LDA and topic models

I have been studying LDA and topic models for several weeks, but because my math background is weak, I cannot fully follow the internal algorithms. I used the GibbsLDA implementation, fed in a lot of documents, and set the number of topics to 100. I got a file called "final.theta" that stores the proportion of each topic in each document. This result is good; I can use the proportions for many other things. But when I tried Blei's C implementation of LDA, I only got a final.gamma file, and I don't know how to convert it to topic proportions. Can someone help me? I also found that LDA has many improved variants (e.g. CTM, HLDA). If there is a topic model similar to LDA that, given a lot of documents, directly outputs the topic proportions of each document, I would like to know about it. Thank you very much!

+4
2 answers

I think the problem with the Blei implementation is that you are running variational inference:

$ lda inf [args ...]

when what you want is topic estimation:

$ lda est [args ...]

Once this has run, a final.beta file will exist in the current directory or in the directory given by the optional last argument. Then run the Python script "topics.py" included in the tarball. See the readme here: http://www.cs.princeton.edu/~blei/lda-c/readme.txt , especially sections B and D.

(If this still doesn't make sense, let me know)

Regarding improvements such as CTM: I don't know anything about HLDA, but I have used both LDA and CTM in the past, and I can say that neither is strictly better than the other; each works better on different data. CTM assumes that topics are correlated, and uses this assumption to improve results as long as it holds.

Hope this helps!

+1

To get E[θ], simply normalize the gamma values within each line so that they sum to 1. This follows from the properties of the Dirichlet distribution.
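As a rough illustration of that normalization: for a Dirichlet with parameters γ_1..γ_K, the mean of component k is γ_k divided by the sum of all γ_j. The sketch below assumes final.gamma is a plain-text matrix with one document per line and one whitespace-separated value per topic; check this against your own file. The function names `read_gamma` and `gamma_to_theta` are made up for this example and are not part of lda-c.

```python
def gamma_to_theta(gamma_rows):
    """Normalize each row of Dirichlet parameters so that it sums to 1.

    Each output row is the Dirichlet mean E[theta] for one document.
    """
    theta = []
    for row in gamma_rows:
        total = sum(row)
        theta.append([g / total for g in row])
    return theta


def read_gamma(path):
    """Read a whitespace-separated matrix (assumed layout: one document
    per line, one value per topic), skipping blank lines."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


# Toy input with two documents and three topics (made-up numbers).
example = [[2.0, 1.0, 1.0], [0.5, 0.5, 4.0]]
for doc_theta in gamma_to_theta(example):
    print(" ".join(f"{p:.4f}" for p in doc_theta))
```

With a real file, something like `gamma_to_theta(read_gamma("final.gamma"))` should give one row of topic proportions per document, comparable to what GibbsLDA writes to final.theta.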

0

Source: https://habr.com/ru/post/1400209/

