Consider a simple CRF that models the probability P(y | x, \theta) of a labeling y given an observation x, parameterized by \theta. At test time you only know x and want to infer y. One approach is an EM-style alternating scheme: the E-step finds the labeling y = argmax_y P(y | x, \theta), and the M-step finds the parameters \theta = argmax_\theta P(\theta | x, y). The M-step can use any off-the-shelf optimizer, since \theta is not especially high-dimensional (at least no higher than the dimension of y). The E-step is then ordinary MAP inference in an MRF/CRF with no hidden variables, since \theta has already been fixed by the M-step. ICM (Iterated Conditional Modes) is one algorithm that can perform that inference. If you want a reference, see Murphy's book http://www.cs.ubc.ca/~murphyk/MLbook/ — I think chapter 26 is the relevant one.
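To make the E-step concrete, here is a minimal sketch of ICM on a chain-structured MRF (all potentials below are illustrative toy values, not from any particular model): each node's label is greedily reset to the value maximizing its local conditional given its current neighbors, until nothing changes.

```python
import numpy as np

def icm_chain(unary, pairwise, n_iters=10):
    """Greedy coordinate-wise MAP inference (ICM) on a chain MRF.

    unary:    (n, k) array, unary[i, s] = score of node i taking label s
    pairwise: (k, k) array, pairwise[s, t] = score of adjacent labels (s, t)
    """
    n, k = unary.shape
    y = unary.argmax(axis=1)  # initialize from the unary terms alone
    for _ in range(n_iters):
        changed = False
        for i in range(n):
            local = unary[i].copy()
            if i > 0:
                local += pairwise[y[i - 1]]     # contribution of edge (i-1, i)
            if i < n - 1:
                local += pairwise[:, y[i + 1]]  # contribution of edge (i, i+1)
            best = local.argmax()
            if best != y[i]:
                y[i] = best
                changed = True
        if not changed:  # a full sweep with no updates: local optimum reached
            break
    return y

# Toy problem: 4 nodes, 2 labels, pairwise term that rewards equal neighbors.
unary = np.array([[2.0, 0.0],
                  [0.1, 0.0],
                  [0.0, 0.1],
                  [0.0, 2.0]])
smooth = np.eye(2)
print(icm_chain(unary, smooth))  # → [0 0 1 1]
```

ICM only reaches a local optimum, which is why it pairs naturally with the alternating scheme above: each E-step only needs a decent labeling under the current \theta, not the global MAP.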