EM Algorithm

General formulation

Recall Jensen's inequality:

Let $f$ be a convex function (i.e. $f^{\prime\prime} \ge 0$) and let $X$ be a random variable. Then: $f(E[X]) \le E[f(X)]$

And when $f$ is strictly convex, then:

$$E[f(X)] = f(E[X]) \iff X = E[X]$$

with probability 1.
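As a quick numerical illustration (a toy check of my own, with an arbitrary choice of distribution), we can compare $f(E[X])$ and $E[f(X)]$ for the convex function $f(x) = x^2$:

```python
import numpy as np

# Toy check of Jensen's inequality with the convex function f(x) = x^2
rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=100_000)  # any non-degenerate random variable

print(x.mean() ** 2)     # f(E[X]) -- close to 4
print((x ** 2).mean())   # E[f(X)] -- close to 8, larger as Jensen's inequality predicts
```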

Consider again the joint density $P(x,z|\theta)$, where only $x$ is observed. We want to be able to maximize:

$$\begin{aligned} l(x \,|\, \theta) &= \sum_i \log P(x_i \,|\, \theta) \\ &= \sum_i \log \sum_{z_i} P(x_i, z_i \,|\, \theta) \end{aligned}$$

However, maximizing this directly is difficult when the $\{z_i\}$ are unobserved, because the logarithm of the sum over $z_i$ does not decompose.
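
To make this concrete, here is a minimal sketch (my own, assuming a two-component normal mixture with $\psi = P(z_i = 1)$, i.e. the probability of component $b$) that evaluates the observed-data log-likelihood by summing the joint over the two possible values of $z_i$:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, psi, mu_a, sigma_a, mu_b, sigma_b):
    """Observed-data log-likelihood: sum_i log sum_{z_i} P(x_i, z_i | theta)."""
    joint_a = (1 - psi) * norm.pdf(x, mu_a, sigma_a)  # P(x_i, z_i = 0 | theta)
    joint_b = psi * norm.pdf(x, mu_b, sigma_b)        # P(x_i, z_i = 1 | theta)
    return np.sum(np.log(joint_a + joint_b))

x = np.array([-1.2, 0.3, 4.1, 5.0])
print(log_likelihood(x, psi=0.5, mu_a=0.0, sigma_a=1.0, mu_b=4.5, sigma_b=1.0))
```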

The EM algorithm iteratively calculates lower bounds on the likelihood for the current values of the parameters, then maximizes the lower bound to update the parameters.

Since $z_i$ is a random variable, we can introduce an arbitrary density $Q_i$ over its values and insert it into the marginalized joint likelihood:

$$\sum_i \log \sum_{z_i} P(x_i, z_i \,|\, \theta) = \sum_i \log \sum_{z_i} Q_i(z_i) \frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)}$$

This turns the inner summation into an expectation.

$$\sum_i \log \sum_{z_i} Q_i(z_i) \frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)} = \sum_i \log E_{Q_i} \left[ \frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)} \right]$$

Now, if we apply Jensen's inequality (here with the concave function $\log$, so the direction of the inequality is reversed):

$$\begin{aligned} \sum_i \log E_{Q_i} \left[ \frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)} \right] &\ge \sum_i E_{Q_i} \log \left[ \frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)} \right] \\ &= \sum_i \sum_{z_i} Q_i(z_i) \log \left[ \frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)} \right] \end{aligned}$$
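
As a toy check of this bound for a single observation (numbers of my own choosing), let $z$ take two values with an arbitrary density $Q$, and let `p[z]` stand in for $P(x_i, z \,|\, \theta)$:

```python
import numpy as np

Q = np.array([0.3, 0.7])    # an arbitrary density over the two values of z
p = np.array([0.05, 0.20])  # stand-in values of P(x_i, z | theta)

ratio = p / Q
print(np.log(np.sum(Q * ratio)))  # log E_Q[P/Q] = log P(x_i | theta), about -1.386
print(np.sum(Q * np.log(ratio)))  # E_Q[log(P/Q)], the lower bound,    about -1.415
```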

We want equality to hold in this bound at the current parameter values, which we can arrange by choosing $Q_i$ appropriately. By the equality condition for Jensen's inequality, the ratio inside the expectation must be constant in $z_i$:

$$\frac{P(x_i, z_i \,|\, \theta)}{Q_i(z_i)} = C$$

which implies:

$$Q_i(z_i) \propto P(x_i, z_i \,|\, \theta)$$

Since $Q_i$ is a density,

$$\begin{aligned} Q_i(z_i) &= \frac{P(x_i, z_i \,|\, \theta)}{\sum_{z_i} P(x_i, z_i \,|\, \theta)} \\ &= \frac{P(x_i, z_i \,|\, \theta)}{P(x_i \,|\, \theta)} \\ &= P(z_i \,|\, x_i, \theta) \end{aligned}$$
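
Continuing the toy check above: if we replace the arbitrary $Q_i$ with this posterior, the bound becomes tight, i.e. equal to $\log P(x_i \,|\, \theta)$:

```python
import numpy as np

p = np.array([0.05, 0.20])  # stand-in values of P(x_i, z | theta)
Q = p / p.sum()             # Q_i(z) = P(z | x_i, theta), the posterior

print(np.log(p.sum()))            # log P(x_i | theta),           about -1.386
print(np.sum(Q * np.log(p / Q)))  # the lower bound, now equal to it
```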

Returning to our normal mixture example:

For the E-step, we need to identify $Q_i(z_i)$:

$$Q_i(z_i) = P(z_i \,|\, x_i, \mu, \sigma, \psi)$$

Via Bayes' formula:

$$P(z_i=j \,|\, x_i) = \frac{P(x_i \,|\, z_i=j)P(z_i=j)}{\sum_l P(x_i \,|\, z_i=l)P(z_i=l)}$$

where $P(x_i \,|\, z_i=j)$ is the $j$th normal density of the mixture, and $P(z_i=j)$ is the corresponding multinomial (mixing) probability.

This gives us:

$$P(z_i=1 \,|\, x_i) = \frac{\psi\, N(x_i \,|\, \mu_b, \sigma_b^2)}{(1-\psi)\, N(x_i \,|\, \mu_a, \sigma_a^2) + \psi\, N(x_i \,|\, \mu_b, \sigma_b^2)}$$

(recall that we are encoding a=0 and b=1)
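
A minimal sketch of this E-step (my own, using `scipy.stats.norm` and taking $\psi = P(z_i = 1)$ as above):

```python
import numpy as np
from scipy.stats import norm

def e_step(x, psi, mu_a, sigma_a, mu_b, sigma_b):
    """Posterior probability that each observation belongs to component b (z_i = 1)."""
    weighted_a = (1 - psi) * norm.pdf(x, mu_a, sigma_a)  # P(x_i | z_i = 0) P(z_i = 0)
    weighted_b = psi * norm.pdf(x, mu_b, sigma_b)        # P(x_i | z_i = 1) P(z_i = 1)
    return weighted_b / (weighted_a + weighted_b)

x = np.array([-1.2, 0.3, 4.1, 5.0])
print(e_step(x, psi=0.5, mu_a=0.0, sigma_a=1.0, mu_b=4.5, sigma_b=1.0))
```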
