Abstract

The Expectation-Maximization (EM) algorithm enables maximum-likelihood parameter estimation in probabilistic models with incomplete data, i.e., models with hidden (latent) variables.

Introduction

Background

  • maximum-likelihood estimation (MLE): $\arg\max_\theta L(\theta)=\arg\max_\theta \prod_{i=1}^n p(x_i;\theta)$
  • Jensen's inequality: for a concave function $f$, $f(E[x]) \geq E[f(x)]$; equality holds when $x$ is constant
  • GMM: $p(\mathbf{x}) = \sum_{i=1}^{k} w_i N(\mathbf{x}; \mu_i, \Sigma_i)$ s.t. $\sum_{i=1}^{k} w_i = 1$ (see the sketch after this list)
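For concreteness, here is a minimal NumPy/SciPy sketch of evaluating a GMM density; the component weights, means, and covariances below are made-up illustrative values, not parameters taken from this document.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """Evaluate p(x) = sum_i w_i * N(x; mu_i, Sigma_i)."""
    return sum(w * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for w, mu, cov in zip(weights, means, covs))

# Hypothetical 2-component mixture in 2-D; the weights sum to 1.
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([2.0, 2.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), weights, means, covs))
```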

Problem Settings

We have a model $p(x, z; \theta)$ and draw samples from it, obtaining pairs $(x_i, z_i)$; $x$ is the observable variable, while $z$ is a hidden variable that cannot be observed. Our task is to estimate $\theta$ from the observed samples $\{x_i\}$. Using MLE, we construct the log-likelihood function: $$\begin{aligned} L(\boldsymbol{\theta}) &= \sum_{i=1}^{n} \ln p(\mathbf{x}_i; \boldsymbol{\theta}) \\ &= \sum_{i=1}^{n} \ln \sum_{z_i} p(\mathbf{x}_i, z_i; \boldsymbol{\theta}) \end{aligned}$$ The problem is that we cannot observe the hidden variable $z_i$.
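To make the marginalization over the hidden variable concrete, here is a minimal sketch, assuming a discrete hidden variable with $k$ states and a user-supplied `joint_p(x, z)`, a hypothetical callable standing in for $p(x, z; \theta)$ at the current parameters.

```python
import numpy as np

def incomplete_log_likelihood(xs, joint_p, k):
    """L(theta) = sum_i ln sum_z p(x_i, z; theta), with z in {0, ..., k-1}."""
    return sum(np.log(sum(joint_p(x, z) for z in range(k))) for x in xs)
```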

Algorithm

Let $Q_i$ be any distribution over the hidden variable $z_i$, i.e., $$\begin{array}{l} \sum_{z} Q_i(z) = 1 \\ Q_i(z) \geq 0 \end{array}$$

Then, $$\begin{aligned} \sum_{i=1}^{n} \ln p(\mathbf{x}_i; \boldsymbol{\theta}) &= \sum_{i=1}^{n} \ln \sum_{z_i} p(\mathbf{x}_i, z_i; \boldsymbol{\theta}) \\ &= \sum_{i=1}^{n} \ln \sum_{z_i} Q_i(z_i) \frac{p(\mathbf{x}_i, z_i; \boldsymbol{\theta})}{Q_i(z_i)} \\ &\geq \sum_{i=1}^{n} \sum_{z_i} Q_i(z_i) \ln \frac{p(\mathbf{x}_i, z_i; \boldsymbol{\theta})}{Q_i(z_i)} \end{aligned}$$ where the last step is Jensen's inequality applied to the concave function $\ln$, since $\sum_{z_i} Q_i(z_i) \frac{p(\mathbf{x}_i, z_i; \boldsymbol{\theta})}{Q_i(z_i)}$ is an expectation with respect to $Q_i$.

So we obtain a lower bound on the log-likelihood function that can be differentiated easily, which means we can maximize this lower bound easily.
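As a sketch of this lower bound, again assuming a discrete hidden variable with $k$ states, the hypothetical `joint_p(x, z)` from above, and `Q[i][z]` holding the distribution $Q_i$:

```python
import numpy as np

def lower_bound(xs, Q, joint_p, k):
    """sum_i sum_z Q_i(z) * ln( p(x_i, z; theta) / Q_i(z) )."""
    total = 0.0
    for i, x in enumerate(xs):
        for z in range(k):
            if Q[i][z] > 0:  # a zero-probability state contributes nothing
                total += Q[i][z] * np.log(joint_p(x, z) / Q[i][z])
    return total
```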

Details

while $\Delta > \epsilon$ {
  # E-Step
  $$Q_i(z_i) = p(z_i \mid \mathbf{x}_i; \boldsymbol{\theta}^t) = \frac{p(\mathbf{x}_i, z_i; \boldsymbol{\theta}^t)}{\sum_{z} p(\mathbf{x}_i, z; \boldsymbol{\theta}^t)}$$
  # M-Step
  $$L(\boldsymbol{\theta}) = \sum_{i} \sum_{z_i} Q_i(z_i) \ln \frac{p(\mathbf{x}_i, z_i; \boldsymbol{\theta})}{Q_i(z_i)}, \qquad \boldsymbol{\theta}^{t+1} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta})$$
  $$\Delta = L(\boldsymbol{\theta}^{t+1}) - L(\boldsymbol{\theta}^t)$$
}
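The following is a minimal sketch of this loop for a one-dimensional GMM, where both the E-step responsibilities and the M-step maximizer of the lower bound have closed forms; the initialization and numerical safeguards here are deliberately simplistic.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(xs, k, iters=100, eps=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture, mirroring the E-/M-steps above."""
    xs = np.asarray(xs, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(xs)
    w = np.full(k, 1.0 / k)                      # mixture weights
    mu = rng.choice(xs, size=k, replace=False)   # means picked from the data
    var = np.full(k, xs.var())                   # component variances
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step: responsibilities Q_i(z) = p(z | x_i; theta)
        dens = np.array([w[j] * norm.pdf(xs, mu[j], np.sqrt(var[j]))
                         for j in range(k)]).T   # shape (n, k)
        ll = np.log(dens.sum(axis=1)).sum()      # current log-likelihood
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizers of the lower bound for a GMM
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * xs[:, None]).sum(axis=0) / nk
        var = (resp * (xs[:, None] - mu) ** 2).sum(axis=0) / nk
        # Stop once the improvement Delta drops below epsilon
        if ll - prev_ll < eps:
            break
        prev_ll = ll
    return w, mu, var
```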

Coin-flipping experiment

First, we assign initial values to $\theta_A$ and $\theta_B$, then compute the distribution of the hidden variable based on $\theta_A$, $\theta_B$, and the samples. After that, we maximize the lower bound of the log-likelihood function to obtain new values of $\theta_A$ and $\theta_B$. After 10 iterations, $\theta_A = 0.80$ and $\theta_B = 0.58$.
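Below is a minimal sketch of this two-coin EM update; the toss data and starting values in the usage line are hypothetical placeholders, not the exact figures used in the experiment.

```python
import numpy as np
from scipy.stats import binom

def coin_em(heads, n_tosses, theta_a, theta_b, iters=10):
    """EM for two coins with unknown biases; which coin produced each set of tosses is hidden."""
    heads = np.asarray(heads, dtype=float)
    for _ in range(iters):
        # E-step: posterior probability that each set of tosses came from coin A
        like_a = binom.pmf(heads, n_tosses, theta_a)
        like_b = binom.pmf(heads, n_tosses, theta_b)
        q_a = like_a / (like_a + like_b)
        q_b = 1.0 - q_a
        # M-step: responsibility-weighted maximum-likelihood estimates of the biases
        theta_a = (q_a * heads).sum() / (q_a * n_tosses).sum()
        theta_b = (q_b * heads).sum() / (q_b * n_tosses).sum()
    return theta_a, theta_b

# Hypothetical data: heads observed in four sets of 10 tosses each.
print(coin_em(heads=[9, 4, 8, 3], n_tosses=10, theta_a=0.6, theta_b=0.5))
```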

References

  • Do, C. B., & Batzoglou, S. (2008). What is the expectation maximization algorithm? Nature Biotechnology, 26(8), 897.
  • https://www.zhihu.com/question/27976634/answer/153567695