Finding the number of related documents for LDA with scikit-learn

I am following the scikit-learn LDA example here and trying to understand how I can (if possible) find out how many documents have been labeled with each of these topics. I looked through the documentation for the LDA model here, but I don't see where I could get this number. Has anyone been able to do this before using scikit-learn?

1 answer

LDA computes a probability distribution over topics for each document, so you can interpret the topic of a document as the topic with the highest probability for that document.

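Assuming dtm is your document-term matrix and lda is your fitted LDA model, you can get the document-topic matrix using transform() and pandas: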

 
import pandas as pd

# dtm: document-term matrix; lda: fitted LDA model; N_TOPICS: number of topics lda was fitted with
docsVStopics = lda.transform(dtm)
docsVStopics = pd.DataFrame(docsVStopics, columns=["Topic" + str(i + 1) for i in range(N_TOPICS)])
print("Created a (%dx%d) document-topic matrix." % (docsVStopics.shape[0], docsVStopics.shape[1]))
docsVStopics.head()

Now get the most likely topic for each document:

most_likely_topics = docsVStopics.idxmax(axis=1)

Finally, get the counts:

most_likely_topics.groupby(most_likely_topics).count()
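
For anyone who wants to try the steps above from start to finish, here is a minimal sketch. The toy corpus, the choice of two topics, and the names docs, topic_names, docs_vs_topics and topic_counts are assumptions made up for illustration; they are not part of the original answer. It uses value_counts() with reindex() rather than groupby() so that topics with no assigned documents still appear with a count of 0.

```python
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for illustration only.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock markets fell as investors sold shares",
    "the bank raised interest rates again",
    "my dog chased the cat around the garden",
    "shares and bonds rallied after the rate decision",
]

N_TOPICS = 2  # hypothetical choice for this toy corpus

# Build the document-term matrix (dtm) and fit the LDA model on it.
dtm = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0).fit(dtm)

# One row per document, one column per topic.
topic_names = ["Topic" + str(i + 1) for i in range(N_TOPICS)]
docs_vs_topics = pd.DataFrame(lda.transform(dtm), columns=topic_names)

# Hard-assign each document to its highest-probability topic.
most_likely_topics = docs_vs_topics.idxmax(axis=1)

# Count documents per topic; reindex so topics with no documents show 0.
topic_counts = most_likely_topics.value_counts().reindex(topic_names, fill_value=0)
print(topic_counts)
```

Keep in mind that this is a hard assignment: a document whose probabilities are split almost evenly between two topics is still counted under only one of them.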

Source: https://habr.com/ru/post/1627653/

