It is sometimes referred to as the likelihood of the data, and sometimes referred to as a statistical model. The difference is whether we are looking at \(p(x | \theta)\) as…

a function of \(x\), where \(\theta\) is known πŸ”—

If \(\theta\) is a known model parameter, then \(p_x(x|\theta) = p(x; \theta) = p_\theta(x)\) is the probability of \(x\) according to a model parameterized by \(\theta\) — also known as a model, statistical model, or observation model — measuring uncertainty about \(x\) given \(\theta\).

(If \(\theta\) is instead treated as a random variable whose value is known, \(p(x|\theta)\) is just a conditional probability, \(\frac{p(x, \theta)}{p(\theta)}\).)
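As a minimal sketch of the first reading — \(\theta\) fixed, \(x\) varying — here is a Normal model \(p(x; \theta)\) in plain Python (the Normal density is an illustrative choice, not something the text prescribes):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density p(x; theta) of a Normal(mu, sigma^2) model.
    theta = (mu, sigma) is held fixed; x is the argument."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# With theta known, p(x | theta) is a density over x:
for x in (-1.0, 0.0, 1.0):
    print(x, normal_pdf(x))
```

Here the parameters default to \(\mu = 0, \sigma = 1\), and the function answers "how probable is this \(x\) under the fixed model?".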

a function of \(\theta\), where \(x\) is known πŸ”—

Unlike the above, the emphasis is on investigating the unknown \(\theta\).

\(p(x|\theta)\) is the probability of some fixed, observed data \(x\), evaluated as \(\theta\) ranges over candidate values.

When doing MLE, we find the assignment \(\hat{\theta}\) for \(\theta\) that maximizes the likelihood \(\mathcal L(\theta|x) = p(x|\theta)\); the maximized value \(p(x|\hat{\theta}) = \mathcal L(\hat\theta|x)\) is called the maximum likelihood of \(\theta\) given \(x\).

In other words, the likelihood \(\mathcal L(\theta|x) = p(x|\theta)\) is a function of \(\theta\) that measures the extent to which observed \(x\) supports particular values of \(\theta\) in a parametric model.
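The second reading — \(x\) fixed, \(\theta\) varying — can be sketched the same way. Assuming i.i.d. Normal samples with unknown mean \(\theta\) and known \(\sigma = 1\) (an illustrative model, and a made-up dataset), the log-likelihood is a function of \(\theta\), and maximizing it over a grid recovers the sample mean, the closed-form MLE for a Normal mean:

```python
import math

def log_likelihood(theta, data, sigma=1.0):
    """log p(x | theta) as a function of theta, for fixed observed data x
    (i.i.d. Normal(theta, sigma^2) samples)."""
    return sum(
        -0.5 * ((x - theta) / sigma) ** 2
        - math.log(sigma * math.sqrt(2.0 * math.pi))
        for x in data
    )

data = [1.2, 0.8, 1.1, 0.9]               # fixed observed x (hypothetical)
grid = [i / 100 for i in range(0, 201)]    # candidate theta values in [0, 2]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))
print(theta_hat, sum(data) / len(data))    # grid MLE vs. sample mean
```

A grid search stands in for a real optimizer here; the point is only that the same expression \(p(x|\theta)\) is now being scanned over \(\theta\) while \(x\) stays put.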