How to print the full distribution of words in an LDA topic in gensim?

The lda.show_topics() method in the following code prints only the 10 most probable words for each topic. How can I print the full distribution over all words in the corpus?

    from gensim import corpora, models

    documents = ["Human machine interface for lab abc computer applications",
                 "A survey of user opinion of computer system response time",
                 "The EPS user interface management system",
                 "System and human system engineering testing of EPS",
                 "Relation of user perceived response time to error measurement",
                 "The generation of random binary unordered trees",
                 "The intersection graph of paths in trees",
                 "Graph minors IV Widths of trees and well quasi ordering",
                 "Graph minors A survey"]

    # Tokenize on whitespace and drop common stop words
    stoplist = set('for a of the and to in'.split())
    texts = [[word for word in document.lower().split() if word not in stoplist]
             for document in documents]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # Train LDA on the bag-of-words corpus
    lda = models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=2)

    for i in lda.show_topics():
        print(i)
3 answers

show_topics() has a parameter called topn that specifies how many of the top N words to return from each topic's word distribution (newer gensim releases name this parameter num_words). See http://radimrehurek.com/gensim/models/ldamodel.html

So, instead of calling lda.show_topics() with its defaults, you can pass len(dictionary) to get the complete word distribution for each topic:

    for i in lda.show_topics(topn=len(dictionary)):
        print(i)

show_topics() takes two parameters, num_topics and num_words: for num_topics topics it returns the num_words most significant words (10 words per topic by default). See http://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.show_topics

So you can pass len(lda.id2word) to get the complete word distribution for each topic, and lda.num_topics to cover all topics in your LDA model.

    for i in lda.show_topics(formatted=False,
                             num_topics=lda.num_topics,
                             num_words=len(lda.id2word)):
        print(i)
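With formatted=False, recent gensim releases return each topic as a (topic_id, [(word, probability), ...]) pair. As a minimal sketch under that assumption, the helper below turns such a list into one word-to-probability dictionary per topic, so you can check that a full distribution sums to roughly 1. The name topics_to_dicts is hypothetical, and the sample data is made up for illustration rather than real model output:

```python
import math


def topics_to_dicts(topics):
    """Map topic_id -> {word: probability} from show_topics(formatted=False)-style output."""
    return {topic_id: dict(word_probs) for topic_id, word_probs in topics}


# Made-up stand-in for:
#   lda.show_topics(formatted=False, num_topics=lda.num_topics, num_words=len(lda.id2word))
sample_topics = [
    (0, [("system", 0.5), ("user", 0.3), ("graph", 0.2)]),
    (1, [("graph", 0.6), ("trees", 0.3), ("system", 0.1)]),
]

distributions = topics_to_dicts(sample_topics)
for topic_id, dist in distributions.items():
    total = sum(dist.values())
    # A loose tolerance, since real model output is typically single-precision floats
    print(topic_id, dist, "sums to 1:", math.isclose(total, 1.0, rel_tol=1e-4))
```

If the sum is noticeably below 1, the call most likely returned only the top words rather than the whole vocabulary.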

The code below prints the words along with their probabilities. It prints the top 10 words; change num_words=10 to print more words for each topic.

    for words in lda.show_topics(formatted=False, num_words=10):
        print(words[0])  # topic id
        print("******************************")
        for word_prob in words[1]:
            # Assuming the model was built with id2word=dictionary (as in the
            # question), word_prob[0] is already the word string
            print("(", word_prob[0], ",", word_prob[1], ")", end="")
        print("")
        print("******************************")
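If the full vocabulary is printed, one aligned line per word may be easier to read than a single long row. The following is only a sketch: format_topic is a hypothetical helper, the sample tuple is invented, and it assumes each topic has the (topic_id, [(word, probability), ...]) shape described above:

```python
def format_topic(topic_id, word_probs, precision=4):
    """Return a header line followed by one aligned 'word  probability' line per word."""
    # Pad every word to the length of the longest one so the numbers line up
    width = max((len(word) for word, _ in word_probs), default=0)
    lines = ["Topic {}".format(topic_id)]
    # Most probable words first
    for word, prob in sorted(word_probs, key=lambda wp: wp[1], reverse=True):
        lines.append("  {:<{w}}  {:.{p}f}".format(word, prob, w=width, p=precision))
    return "\n".join(lines)


# Invented example topic, not real model output
sample = (0, [("user", 0.3), ("system", 0.5), ("graph", 0.2)])
report = format_topic(*sample)
print(report)
```

With a trained model you would call format_topic(*topic) for each topic returned by lda.show_topics(formatted=False, num_words=len(lda.id2word)).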

Source: https://habr.com/ru/post/949497/

