I am trying to implement Elasticsearch phrase matching to optimize phrase searches over large bodies of text. As suggested in this article, I use a shingle filter to create several shingles per phrase.
Two questions:
In the mentioned article, stopwords are filtered out, and the shingle filter fills the resulting gaps by inserting "_" filler tokens. These tokens must be excluded from the shingles that get indexed by the engine. The reason for removing them is to be able to answer phrase queries that contain these kinds of "useless" words. The standard solution (mentioned in the article) is no longer possible, given that Lucene has deprecated the specific flag (enable_position_increments) needed for this behavior. How can I solve this problem?
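One common workaround (a sketch of my own, not taken from the article; the filter and analyzer names are invented) is to let the shingle filter emit its default "_" filler tokens, then blank out any shingle containing a filler with a `pattern_replace` token filter, and finally drop the resulting empty tokens with a `length` filter:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        },
        "kill_fillers": {
          "type": "pattern_replace",
          "pattern": ".*_.*",
          "replacement": ""
        },
        "drop_empty": {
          "type": "length",
          "min": 1
        }
      },
      "analyzer": {
        "phrase_shingler": {
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "my_shingles", "kill_fillers", "drop_empty"]
        }
      }
    }
  }
}
```

This avoids relying on the deprecated enable_position_increments flag entirely: shingles containing "_" never reach the index, while shingles built from the remaining words are kept. You can check the resulting token stream with the `_analyze` API before indexing.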
Because punctuation is stripped, I regularly see shingles produced by this process that span sentence boundaries. From a search perspective, any result that matches words from two separate sentences is incorrect. How can I avoid (or at least mitigate) this problem?
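One possible mitigation (again my own sketch, not from the article) is to make sentence boundaries visible to the shingle filter: a `mapping` char filter replaces sentence-ending punctuation with a sentinel token that itself contains "_", so any shingle crossing a boundary is caught by the same filler-stripping filters shown for question 1. A fragment to add to the analysis settings:

```json
{
  "char_filter": {
    "sentence_breaks": {
      "type": "mapping",
      "mappings": [". => _SENT_", "! => _SENT_", "? => _SENT_"]
    }
  }
}
```

The analyzer would then list `"char_filter": ["sentence_breaks"]` before the tokenizer. Note this is an assumption worth verifying with the `_analyze` API: the `standard` tokenizer must keep `_SENT_` as a single token for the pattern filter to see it, and abbreviations such as "e.g." will also be treated as sentence breaks unless handled separately.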