Iterated Conditional Modes (ICM) as an approximation to the E-step of EM

I want to know the mathematical rationale for using ICM (Iterated Conditional Modes) as an approximation to the E-step of the EM algorithm.

As I understand it, the idea of the E-step is either to find a distribution equal to the posterior distribution of the hidden variables, which guarantees that the likelihood increases, or to find the best possible distribution from some simpler family of distributions, which guarantees that a lower bound on the likelihood increases.
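For concreteness, the decomposition I have in mind is the standard one (writing z for the hidden variables and q for the approximating distribution; the notation is mine):

```latex
\log p(x \mid \theta)
  \;=\; \underbrace{\mathbb{E}_{q(z)}\!\left[ \log \frac{p(x, z \mid \theta)}{q(z)} \right]}_{\mathcal{L}(q,\,\theta)\ \text{(lower bound)}}
  \;+\; \mathrm{KL}\!\left( q(z) \,\Vert\, p(z \mid x, \theta) \right)
```

The exact E-step sets q(z) = p(z | x, \theta), which makes the KL term vanish and the bound tight; a variational E-step instead maximizes \mathcal{L} over q within a restricted family, which still cannot decrease the bound.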

How can the use of ICM in such an E-step be justified mathematically? Any links / derivations / notes would be very helpful.

Thank you for your help.

1 answer

Consider a simple CRF that defines the probability of a labeling y given an observation x, and suppose this probability also depends on a parameter \theta. At inference time you know only x and are trying to infer y. What you do is apply the EM algorithm so that the E-step finds the labeling y (argmax_y P(y | x, \theta)) and the M-step finds the parameter \theta (argmax_\theta P(\theta | x, y)). The M-step can be handled by any optimization algorithm, since \theta is not particularly high-dimensional (at least no more so than y). The E-step is then plain inference in an MRF/CRF with no hidden variables, because \theta is optimized separately in the M-step, and ICM is simply the algorithm used to carry out that inference. If you need a reference, you can just read Murphy's book, http://www.cs.ubc.ca/~murphyk/MLbook/ ; I think chapter 26 is the relevant one.
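To make this concrete, below is a minimal sketch of ICM for the E-step described above, in Python/NumPy. The Potts-style pairwise term, the function name, and all parameter names are illustrative assumptions, not taken from the answer or from Murphy's book. The connection to the question's lower-bound view is that this is "hard" EM: restricting q(y) to point masses turns "increase the lower bound" into "increase log P(x, y | \theta) over y", and each ICM move sets one node to the label that maximizes its conditional probability given its neighbors, so the objective never decreases and the sweeps converge to a local maximum.

```python
import numpy as np

def icm(unary, edges, pairwise_weight=1.0, max_sweeps=20):
    """Iterated Conditional Modes on a pairwise MRF/CRF (sketch).

    unary: (n_nodes, n_labels) array of log-potentials; in a CRF these
        would be computed from the observation x and the current theta.
    edges: iterable of (i, j) node-index pairs.
    pairwise_weight: strength of an assumed Potts smoothness term
        (bonus when neighboring nodes share a label).
    Returns a labeling y that no single-site change can improve,
    i.e. a local maximum of log P(y | x, theta).
    """
    n_nodes, n_labels = unary.shape
    nbrs = [[] for _ in range(n_nodes)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    y = unary.argmax(axis=1)              # init from unaries alone
    for _ in range(max_sweeps):
        changed = False
        for i in range(n_nodes):
            # Conditional log-score of each label for node i, with all
            # neighbors held fixed at their current labels.
            scores = unary[i].copy()
            for j in nbrs[i]:
                scores[y[j]] += pairwise_weight
            best = int(scores.argmax())
            if best != y[i]:              # greedy single-site update
                y[i] = best
                changed = True
        if not changed:                   # no move helps: local maximum
            break
    return y

# Toy usage: a 5-node chain with 2 labels and noisy unaries.
rng = np.random.default_rng(0)
unary = rng.normal(size=(5, 2))
edges = [(i, i + 1) for i in range(4)]
print(icm(unary, edges, pairwise_weight=0.5))
```

In the hard-EM scheme above, a call like this plays the role of the E-step, alternated with an M-step that refits \theta to the resulting labels (argmax_\theta P(\theta | x, y)) using any standard optimizer.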



