Finding topics of an unseen document with Gensim

Question

I use Gensim for large-scale topic modeling. I'm having trouble figuring out how to determine the predicted topics for an unseen (non-indexed) document. For example, I have 25 million documents that I have converted to vectors in LSA (and LDA) space. Now I want to find the topics of a new document; let's call it x.

According to the Gensim documentation, I can use:

topics = lsi[doc(x)]

where doc(x) is a function that converts x to a vector.

The problem, however, is that the variable above, topics, holds a vector. This vector is useful for comparing x with other documents, because it lets me compute the cosine similarity between them, but it does not give me the actual words associated with x itself.
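For the comparison use case, cosine similarity between two such sparse vectors can be computed directly from their (topic_id, weight) pairs. gensim provides `gensim.matutils.cossim` for this; the pure-Python sketch below spells out essentially the same computation (`sparse_cosine` is a hypothetical name and the example vectors are made up):

```python
import math

def sparse_cosine(vec1, vec2):
    """Cosine similarity of two sparse vectors given as [(topic_id, weight), ...]."""
    d1, d2 = dict(vec1), dict(vec2)
    # Only topics present in both vectors contribute to the dot product.
    dot = sum(w * d2[t] for t, w in d1.items() if t in d2)
    norm1 = math.sqrt(sum(w * w for w in d1.values()))
    norm2 = math.sqrt(sum(w * w for w in d2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm1 * norm2)

# Made-up LSI vectors for two documents.
x_vec = [(0, 0.8), (1, -0.6)]
y_vec = [(0, 0.8), (2, 0.6)]
print(sparse_cosine(x_vec, y_vec))
```

Note that this only ranks documents against each other; it still says nothing about which words describe x.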

Am I missing something, or does Gensim not offer this capability?

Thanks,

EDIT

Larsmans has an answer.

I managed to show topics using:

 for t in topics: print lsi.show_topics(t[0])

python nlp gensim latent-semantic-indexing

Peter Kirby Jul 13 '12 at 13:22

2 answers

I managed to show topics using:
for t in topics: print lsi.show_topics(t[0])

I just wanted to point out a tiny but important error in your solution code: you need to use the show_topic() function, not the show_topics() function.

P.S. I know this should have been posted as a comment rather than an answer, but my reputation is not yet high enough to comment!

Chiraz benabdelkader May 17 '14 at 16:43

Fred foo · Accepted Answer · 2012-07-13T15:36:32+0000

The vector returned by [] on an LSI model is actually a list of (topic, weight) pairs. You can inspect a topic with the LsiModel.show_topic method.
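To decide which of those pairs to inspect, one reasonable heuristic (not the only one) is to rank them by absolute weight, since LSI weights can be negative. The helper `dominant_topics` below is a hypothetical name and the sample vector is made up; each returned topic id could then be passed to `LsiModel.show_topic`:

```python
def dominant_topics(vec, k=3):
    """Return the k (topic_id, weight) pairs with the largest absolute weight.

    LSI weights may be negative, so ranking by magnitude treats a strong
    negative projection as just as informative as a strong positive one.
    For LDA the weights are probabilities, so abs() changes nothing there.
    """
    return sorted(vec, key=lambda pair: abs(pair[1]), reverse=True)[:k]

# Made-up LSI vector for a new document x.
x_topics = [(0, 0.12), (1, -0.85), (2, 0.40), (3, 0.05)]
print(dominant_topics(x_topics, k=2))  # → [(1, -0.85), (2, 0.4)]
```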
