Finding the topics of an unseen document with Gensim

I use Gensim for large-scale topic modeling. I am having trouble working out how to determine the predicted topics for an unseen (non-indexed) document. For example: I have 25 million documents that I have converted to vectors in the LSA (and LDA) space. Now I want to find out the topics of a new document, let's call it x.

According to the Gensim documentation, I can use:

topics = lsi[doc(x)] 

where doc(x) is a function that converts x to a vector.

The problem, however, is that the variable above, topics, is a vector. This vector is useful if I want to compare x with other documents, because it lets me compute the cosine similarity between them, but it does not give me the specific words associated with x itself.

Am I missing something, or does Gensim lack this capability?

Thanks,

EDIT

larsmans has the answer.

I managed to show topics using:

 for t in topics: print lsi.show_topics(t[0]) 
2 answers

The vector returned by [] on the LSI model is actually a list of (topic, weight) pairs. You can inspect a topic using the LsiModel.show_topic method.
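As a rough illustration of that idea, here is a minimal sketch that ranks the (topic, weight) pairs and looks up the words for the strongest ones. The helper name describe_topics and its parameters are hypothetical, not part of Gensim. It takes the lookup function as an argument and assumes that function behaves like LsiModel.show_topic, that is, it accepts a topic id and a word count and returns (word, weight) pairs. A stand-in lookup is used so the sketch runs without a trained model.

```python
def describe_topics(doc_topics, show_topic, top_n=3, words_per_topic=5):
    """Return the strongest topics of one document with their top words.

    doc_topics: list of (topic_id, weight) pairs, e.g. the result of lsi[bow].
    show_topic: callable taking (topic_id, topn) and returning a list of
                (word, weight) pairs; with Gensim this is assumed to be
                lsi.show_topic.
    """
    # LSI weights can be negative, so rank by absolute value.
    ranked = sorted(doc_topics, key=lambda pair: abs(pair[1]), reverse=True)
    result = []
    for topic_id, weight in ranked[:top_n]:
        words = [word for word, _ in show_topic(topic_id, words_per_topic)]
        result.append((topic_id, weight, words))
    return result


if __name__ == "__main__":
    # Stand-in for a trained model (made-up words and weights).
    fake_topics = {
        0: [("graph", 0.6), ("tree", 0.5), ("minor", 0.4)],
        1: [("user", 0.7), ("system", 0.5), ("time", 0.3)],
    }

    def fake_show_topic(topic_id, topn):
        return fake_topics[topic_id][:topn]

    doc_topics = [(0, 0.12), (1, -0.85)]
    for row in describe_topics(doc_topics, fake_show_topic, words_per_topic=2):
        print(row)
```

With a real model, the call would presumably look like describe_topics(lsi[doc(x)], lsi.show_topic), where doc(x) is the asker's own conversion function.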


I managed to show topics using:

for t in topics: print lsi.show_topics(t[0])

I just wanted to point out a tiny but important error in your solution code: you need to use the show_topic() function, not the show_topics() function (with a trailing "s").

PS: I know this should be posted as a comment rather than an answer, but my current reputation does not yet allow me to comment!


Source: https://habr.com/ru/post/920416/

