I use Gensim for large-scale topic modeling. I'm having trouble figuring out how to determine the predicted topics for an unseen (non-indexed) document. For example: I have 25 million documents that I have converted to vectors in LSA (and LDA) space. Now I want to find the topics of a new document; let's call it x.
According to the Gensim documentation, I can use:
topics = lsi[doc(x)]
where doc(x) is a function that converts x to a vector.
The problem, however, is that the variable above, topics, is just a vector. That vector is useful for comparing x with other documents, because it lets me compute the cosine similarity between them, but it does not give me the actual words associated with x itself.
Am I missing something, or does Gensim not offer this capability?
Thanks,
EDIT
Larsmans has the answer.
I managed to show the topics using:
for t in topics: print(lsi.show_topic(t[0]))