Finding the number of related documents for LDA with scikit-learn

Question


I am following the scikit-learn LDA example here and trying to understand how (if possible) I can find out how many documents have been assigned to each of these topics. I looked through the documentation for the LDA model here, but I don't see where I could get this number. Has anyone done this before with scikit-learn?


scikit-learn lda

user139014 Feb 07 '16 at 11:15


1 answer

Patrizio Giovannotti · Answer 1 · 2018-04-18T15:01:29+0000

LDA computes a probability distribution over topics for each document, so you can treat the topic of a document as the topic with the highest probability for that document.
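To make the "highest probability" rule concrete, here is a tiny sketch on a hand-made document-topic matrix. The numbers are hypothetical, chosen only for illustration; a real matrix would come from the fitted model.

```python
import numpy as np

# Hypothetical document-topic matrix: 3 documents x 2 topics (each row sums to 1)
doc_topic = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.6, 0.4],
])

# The "topic of a document" is the column index with the highest probability
dominant_topic = doc_topic.argmax(axis=1)
print(dominant_topic)  # [0 1 0]
```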

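Assuming dtm is your document-term matrix and lda is your fitted model, pass dtm to lda.transform() and load the result into a pandas DataFrame: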

import pandas as pd

# Rows are documents, columns are topics; each row is that document's topic distribution
docsVStopics = lda.transform(dtm)
docsVStopics = pd.DataFrame(docsVStopics, columns=["Topic"+str(i+1) for i in range(N_TOPICS)])
print("Created a (%dx%d) document-topic matrix." % (docsVStopics.shape[0], docsVStopics.shape[1]))
docsVStopics.head()
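For readers who want to run the snippet above end to end, here is a minimal self-contained sketch. The toy corpus and the choice N_TOPICS = 2 are made up for illustration. It assumes scikit-learn 0.19 or later, where the number of topics is passed as n_components (older releases used n_topics).

```python
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

N_TOPICS = 2  # hypothetical choice for this toy corpus

# Tiny illustrative corpus (invented for this sketch)
docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors buy stocks",
    "pets need food and care",
    "bonds pay interest to investors",
]

# Document-term matrix of raw word counts
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs)

# Fit LDA; random_state makes the fit reproducible
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)
lda.fit(dtm)

# One row per document, one column per topic
docsVStopics = pd.DataFrame(
    lda.transform(dtm),
    columns=["Topic" + str(i + 1) for i in range(N_TOPICS)],
)
print(docsVStopics)
```

Which documents land in which topic depends on the fitted model, so no particular assignment should be expected from this toy corpus.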

Then, for each document, take the topic column with the highest probability:

most_likely_topics = docsVStopics.idxmax(axis=1)

Finally, count how many documents have each topic as their most likely topic:

 most_likely_topics.groupby(most_likely_topics).count()
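One caveat: the groupby count only lists topics that are the most likely topic for at least one document, so a topic that never "wins" is simply missing from the result. A minimal sketch that also reports zero counts, using numpy's argmax and bincount on a hypothetical document-topic matrix (values invented for illustration):

```python
import numpy as np
import pandas as pd

N_TOPICS = 3

# Hypothetical document-topic matrix (4 documents x 3 topics);
# no document has Topic3 as its most likely topic
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
])

# Index of the most probable topic for each document
dominant = doc_topic.argmax(axis=1)

# minlength makes topics with zero documents still appear in the counts
counts = np.bincount(dominant, minlength=N_TOPICS)
topic_counts = pd.Series(counts, index=["Topic" + str(i + 1) for i in range(N_TOPICS)])
print(topic_counts)
```

Here Topic1 gets 3 documents, Topic2 gets 1 and Topic3 gets 0, whereas a groupby count on the same assignments would omit Topic3 entirely.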
