Lucene.NET: Retrieving All Terms Used in a Specific Document

Is there a way to iterate over all the terms that belong to a specific document in a Lucene.NET index?

Basically, I want to get a document from an index based on its identifier, and then find the frequency with which each term is used in this document. Does anyone know a way to do this?

I can find the number of documents that match a specific term, but not the terms contained in a particular document.

Many thanks,

Tim

1 answer

In Lucene Java, at least, one option is to store the term frequency vector when you index the document. The term frequency vector is simply a list of all the terms in a given field of a document, along with how often each of those terms occurs. Retrieving the term frequency vector at runtime involves calling a method on IndexReader with the Lucene identifier of the document in question.
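To make the idea concrete, here is a minimal plain-Java sketch of what a term frequency vector holds for one field of one document. It does not call Lucene at all: the class `TermVectorSketch`, the method `termFrequencies`, and the crude lowercase-and-split tokenization are all hypothetical stand-ins for whatever analyzer your index uses. As far as I recall, in the Lucene 3.x line the real lookup is `IndexReader.getTermFreqVector(docNumber, field)` (`GetTermFreqVector` in Lucene.NET), and it only returns data if the field was indexed with term vectors enabled; treat those names as something to verify against your version, since later releases changed this API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: models the contents of a term frequency vector
// (term -> number of occurrences in one field of one document).
// This is NOT Lucene's API; all names here are hypothetical.
final class TermVectorSketch {

    // Assumed tokenization: lowercase the text, then split on anything
    // that is not a letter or digit. A real analyzer may behave differently.
    static Map<String, Integer> termFrequencies(String fieldText) {
        Map<String, Integer> freqs = new TreeMap<>(); // sorted by term
        String[] tokens = fieldText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {                   // skip the empty leading token, if any
                freqs.merge(token, 1, Integer::sum);  // increment the count for this term
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> vector =
                termFrequencies("The quick fox jumps over the lazy fox");
        // Print each term with its frequency, in term order.
        vector.forEach((term, freq) -> System.out.println(term + " -> " + freq));
    }
}
```

With a stored term vector you get this pairing of terms and counts back from the index directly, without re-tokenizing the original text.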


Source: https://habr.com/ru/post/1303542/

