Solr: create a term vector (like the data returned by TermVectorComponent) from source text

Using http://wiki.apache.org/solr/TermVectorComponent I can get the indexed terms and their frequencies for any document stored in my index. How can I get the same information for arbitrary text without storing that text in my index? I just want Solr to analyze the text and return the information, without saving the document.

+4
4 answers

AFAIK this is not possible without saving the data to Solr.

If you want to do text analysis (which I understand is broader than what you asked), I would recommend the following alternatives:

  • MAUI - keyword and terminology extraction
  • Gensim - topic modeling
  • Kea - keyphrase extraction

I have also come across some Python scripts that do frequency analysis. Take a look at Mincemeat, in particular the example that calculates frequencies.
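
As a rough illustration of the frequency counting mentioned above, here is a minimal Python sketch that counts terms in a string without any index. The tokenizer (lowercasing plus a simple regular expression) and the function name `term_frequencies` are assumptions made for illustration. It does not reproduce a Solr analyzer chain, so the counts may differ from what TermVectorComponent would report.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Assumption: a "term" is a run of word characters; a real Solr
    # analyzer (stemming, stop words, etc.) would behave differently.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    freqs = term_frequencies("Solr indexes text. Lucene indexes text too.")
    for term, count in freqs.most_common():
        print(term, count)
```

Nothing is written anywhere: the text goes in, the counts come out, and the result can be thrown away afterwards.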

+1

From what you ask, I conclude that what you really need is a search library, not a complete search engine (service). That library is Lucene. This may help for starters: How to extract a document's term vector in Lucene 3.5.0. You can keep the index in RAM, compute what you need, and then discard the index.
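
The "build in RAM, compute, discard" idea can be sketched without Lucene. The following is a plain-Python stand-in, not Lucene's API: it builds a throwaway in-memory structure that records frequency, token positions, and character offsets per term, similar in spirit to what a term vector holds. The function name `build_term_vector` and the regex tokenizer are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_term_vector(text):
    """Build a throwaway in-memory term vector for one piece of text.

    Returns {term: {"tf": int, "positions": [...], "offsets": [(start, end), ...]}}.
    """
    vector = defaultdict(lambda: {"tf": 0, "positions": [], "offsets": []})
    # Assumption: simple word tokenization plus lowercasing stands in
    # for a real analyzer.
    for position, match in enumerate(re.finditer(r"\w+", text)):
        entry = vector[match.group().lower()]
        entry["tf"] += 1
        entry["positions"].append(position)
        entry["offsets"].append((match.start(), match.end()))
    return dict(vector)


if __name__ == "__main__":
    tv = build_term_vector("to be or not to be")
    print(tv["to"])
    # Nothing is persisted: the structure is gone once `tv` is released.
```

With Lucene itself the same pattern would use an in-memory directory and the real analyzer for your field type, which is what makes the resulting terms match those in your Solr index.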

+1

Several years ago I wrote a Java application that did heavy text analysis based on Lucene. I had to customize the search functions to find words within a certain distance of each other. You can import text documents into the software and count term frequencies, or you can take the code and adapt it to your needs.

Free download: http://www.minoesoftware.com/download.php

Source: https://github.com/danspiteri/MINOE/blob/master/src/minoe/SearchFiles.java

+1

If you use Solr 4 and the text is not stored, you can use a facet pivot on the text field. Obviously, though, you will get the terms as they are after analyzer processing:

http://192.168.0.202:8080/solr/fr_00_0425_sem/select?q=renault&wt=xml&facet=true&facet.pivot=uniqueKey,yourText 

This is a rather heavy request; I hope you do not have too many matching documents...
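
For reference, the same request can be assembled programmatically. This sketch only builds the URL with Python's standard library and does not send it. The host, core name, and field names are taken from the example URL above, while the helper name `pivot_url` is hypothetical.

```python
from urllib.parse import urlencode


def pivot_url(base, query, key_field, text_field):
    """Build a Solr select URL that facet-pivots key_field over text_field."""
    params = {
        "q": query,
        "wt": "xml",
        "facet": "true",
        "facet.pivot": f"{key_field},{text_field}",
    }
    # safe="," keeps the comma in facet.pivot unescaped, as in the URL above.
    return f"{base}/select?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(pivot_url(
        "http://192.168.0.202:8080/solr/fr_00_0425_sem",
        "renault", "uniqueKey", "yourText",
    ))
```

Narrowing `query` to a single document (for example by its unique key) is one way to keep the pivot small.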

+1

Source: https://habr.com/ru/post/1497126/

