Is there a fast, accurate Highlighter for Lucene?

I have been using the (Java) Highlighter for Lucene (in the Sandbox package) for some time. However, it isn't very accurate when it comes to highlighting the correct terms in search results. It works well for simple queries; for example, searching for two separate words will highlight both words in the results.

However, it does not work well with more complex queries. In the simplest case, a phrase query such as "Stack Overflow" will highlight every occurrence of Stack or Overflow, giving the user the impression that the search doesn't work very well.
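To make the difference concrete, here is a minimal stdlib-only sketch (not Lucene code) contrasting term-by-term highlighting with position-aware phrase highlighting. The class and method names are hypothetical, and whitespace splitting plus `normalize` stand in for a real Lucene Analyzer.

```java
import java.util.Locale;

final class PhraseHighlightSketch {

    // Lowercases a token and strips non-alphanumeric characters;
    // a stand-in for a real Lucene Analyzer.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Naive term highlighting: marks every token equal to ANY query term,
    // which is what produces the stray "Stack" / "Overflow" highlights.
    static String highlightTerms(String text, String[] terms) {
        String[] tokens = text.split(" ");
        boolean[] marked = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            for (String term : terms) {
                if (normalize(tokens[i]).equals(normalize(term))) {
                    marked[i] = true;
                }
            }
        }
        return render(tokens, marked);
    }

    // Phrase-aware highlighting: marks tokens only where the whole phrase
    // occurs at consecutive positions.
    static String highlightPhrase(String text, String[] phrase) {
        String[] tokens = text.split(" ");
        boolean[] marked = new boolean[tokens.length];
        for (int start = 0; start + phrase.length <= tokens.length; start++) {
            boolean match = true;
            for (int j = 0; j < phrase.length && match; j++) {
                match = normalize(tokens[start + j]).equals(normalize(phrase[j]));
            }
            if (match) {
                for (int j = 0; j < phrase.length; j++) {
                    marked[start + j] = true;
                }
            }
        }
        return render(tokens, marked);
    }

    // Rebuilds the text, wrapping each marked token in <B>...</B>.
    static String render(String[] tokens, boolean[] marked) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(marked[i] ? "<B>" + tokens[i] + "</B>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "stack traces and Stack Overflow posts";
        String[] query = {"stack", "overflow"};
        System.out.println(highlightTerms(text, query));
        System.out.println(highlightPhrase(text, query));
    }
}
```

The first line printed highlights the unrelated "stack" as well; the second highlights only the adjacent "Stack Overflow" pair, which is the behaviour a user expects from a phrase query.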

I tried applying the fix here, but it came with a lot of performance caveats and, at the end of the day, was just unusable. Performance is especially bad for wildcard queries. This is due to the way the highlighter works: instead of working only on the query string and the text, it parses the query the way Lucene would and then looks for all the matches Lucene has made. Unfortunately, this means that for certain wildcard queries it can end up looking for matches against 2000+ clauses on large documents, and that just isn't fast enough.
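One way to picture "working on the query string and the text" is to match the wildcard pattern directly against the document's own tokens, so the cost grows with the document length rather than with the number of index terms the wildcard expands to. This is a hypothetical stdlib-only sketch of that idea, not what the Lucene Highlighter does; it uses whitespace tokenization instead of an analyzer and ignores wildcard escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class WildcardHighlightSketch {

    // Translates a wildcard term into a regex meant for Matcher.matches()
    // (whole-token match): '*' = any run of characters, '?' = exactly one
    // character, everything else is literal (no escape handling).
    static Pattern compileWildcard(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the token positions whose text matches the wildcard:
    // one regex test per token, however many terms the wildcard
    // would expand to in the index.
    static List<Integer> matchingPositions(String text, String wildcard) {
        Pattern pattern = compileWildcard(wildcard); // compiled once per query
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (pattern.matcher(tokens[i]).matches()) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Lucene highlighting highlights highly relevant text";
        System.out.println(matchingPositions(text, "high*"));
    }
}
```

The positions returned could then be wrapped in highlight tags. The trade-off is that this only approximates Lucene's matching: stemming, other analyzer behaviour and field-specific rules are not reproduced.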

Is there a faster implementation of an accurate highlighter?

Take a look at Solr: http://lucene.apache.org/solr

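Solr is a general-purpose search server built on Lucene, and it supports highlighting. It may be possible to use Solr's highlighting as an API outside of Solr. Alternatively, you could look at how Solr does it for inspiration.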
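If you try Solr, highlighting is requested through query parameters. Below is a minimal sketch of building such a request URL in Java. `hl`, `hl.fl`, `hl.usePhraseHighlighter` and `hl.highlightMultiTerm` are Solr highlighting parameters, but the core name `mycore` and the field `content` are hypothetical, and whether this is fast enough for wildcard-heavy queries is not something this sketch demonstrates.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SolrHighlightRequestSketch {

    // Builds a select URL that asks Solr to highlight matches in one field.
    static String buildUrl(String selectUrl, String query, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("hl", "true");                      // turn highlighting on
        params.put("hl.fl", field);                    // field(s) to highlight
        params.put("hl.usePhraseHighlighter", "true"); // position-aware phrase highlighting
        params.put("hl.highlightMultiTerm", "true");   // also highlight wildcard/prefix terms
        StringJoiner queryString = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            queryString.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return selectUrl + "?" + queryString;
    }

    public static void main(String[] args) {
        // "mycore" and the "content" field are hypothetical placeholders.
        System.out.println(buildUrl("http://localhost:8983/solr/mycore/select",
                "\"stack overflow\"", "content"));
    }
}
```

A `LinkedHashMap` keeps the parameters in insertion order, which makes the generated URL predictable and easy to compare.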

I was reading up on this subject and stumbled upon SpanQuery, which would give you the spans of the matching term or terms in the corresponding field.
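Because a SpanQuery reports matches as token position ranges rather than just matching terms, the highlighting step reduces to "wrap these positions". Here is a stdlib-only sketch of that step. It assumes the spans arrive as half-open [start, end) token positions (my understanding of how Lucene's Spans report them), uses whitespace tokenization in place of an analyzer, and leaves out collecting the spans from Lucene, since that API differs between Lucene versions. `applySpans` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SpanHighlightSketch {

    // Wraps each token range [start, end) in <B>...</B>. Invalid spans are
    // skipped, ends are clamped to the text, and overlapping or adjacent
    // spans are merged so tags are never nested.
    static String applySpans(String text, List<int[]> spans) {
        String[] tokens = text.split(" ");
        List<int[]> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(s -> s[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] s : sorted) {
            if (s[0] < 0 || s[0] >= s[1] || s[0] >= tokens.length) {
                continue;
            }
            int end = Math.min(s[1], tokens.length);
            if (!merged.isEmpty() && s[0] <= merged.get(merged.size() - 1)[1]) {
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], end);
            } else {
                merged.add(new int[] {s[0], end}); // copy, so input is not mutated
            }
        }
        StringBuilder out = new StringBuilder();
        int next = 0; // index of the next merged span to open or close
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (next < merged.size() && i == merged.get(next)[0]) {
                out.append("<B>");
            }
            out.append(tokens[i]);
            if (next < merged.size() && i == merged.get(next)[1] - 1) {
                out.append("</B>");
                next++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "questions on Stack Overflow about Lucene highlighting";
        List<int[]> spans = new ArrayList<>();
        spans.add(new int[] {2, 4}); // "Stack Overflow" as one span
        System.out.println(applySpans(text, spans));
    }
}
```

Wrapping a whole span in one tag pair, rather than tagging each term separately, is what makes a phrase read as a single highlighted match.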

Source: https://habr.com/ru/post/1697043/

