LDA computes a probability for every topic in each document, so a document's topic can be read off as the topic with the highest probability for that document.
Pass dtm to the fitted lda model's transform() method and wrap the result in a pandas DataFrame:
docsVStopics = lda.transform(dtm)
docsVStopics = pd.DataFrame(docsVStopics, columns=["Topic"+str(i+1) for i in range(N_TOPICS)])
print("Created a (%dx%d) document-topic matrix." % (docsVStopics.shape[0], docsVStopics.shape[1]))
docsVStopics.head()
For each document, pick the topic with the highest probability:
most_likely_topics = docsVStopics.idxmax(axis=1)
Count how many documents were assigned to each topic:
most_likely_topics.groupby(most_likely_topics).count()
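To see what these steps produce without fitting a model, here is a minimal sketch on a tiny hand-written matrix. The probability values are invented for illustration and stand in for the output of lda.transform(dtm); the names doc_topic, row_sums, and topic_counts are not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical document-topic probabilities (4 documents, 3 topics).
# These numbers are made up; in the tutorial they come from lda.transform(dtm).
N_TOPICS = 3
doc_topic = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])

docsVStopics = pd.DataFrame(
    doc_topic, columns=["Topic" + str(i + 1) for i in range(N_TOPICS)]
)

# If each row is a probability distribution, it should sum to (approximately) 1.
row_sums = docsVStopics.sum(axis=1)

# Column label holding the largest value in each row.
most_likely_topics = docsVStopics.idxmax(axis=1)

# Documents per topic; reindex so a topic with no documents still appears with 0.
topic_counts = most_likely_topics.value_counts().reindex(
    docsVStopics.columns, fill_value=0
)
print(topic_counts)
```

Note that idxmax returns the first column when two topics tie for the highest probability, and that value_counts followed by reindex keeps topics that no document was assigned to, which the groupby version omits.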