In the index I have created, I want to run a query and then, using facets, return the shingles of that query. Here is the analyzer I use for the text:
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "shingleAnalyzer": {
              "tokenizer": "standard",
              "filter": [ "standard", "lowercase", "custom_stop", "custom_shingle", "custom_stemmer" ]
            }
          },
          "filter": {
            "custom_stemmer": { "type": "stemmer", "name": "english" },
            "custom_stop": { "type": "stop", "stopwords": "_english_" },
            "custom_shingle": { "type": "shingle", "min_shingle_size": "2", "max_shingle_size": "3" }
          }
        }
      }
    }
The main problem is that, as of Lucene 4.4, the stop filter no longer supports the enable_position_increments parameter, which I used to get rid of shingles containing stop words. Instead, for the text "red and yellow" I get results like:
    "terms": [
      { "term": "red", "count": 43 },
      { "term": "red _", "count": 43 },
      { "term": "red _ yellow", "count": 43 },
      { "term": "_ yellow", "count": 42 },
      { "term": "yellow", "count": 42 }
    ]
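To make the cause concrete, here is a rough Python simulation (not Lucene's actual code) of what happens: the stop filter removes "and" but leaves a hole in the token positions, and the shingle filter writes its default filler token "_" into that hole. The tiny stop list, whitespace splitting, and the rule of skipping shingles made only of filler are simplifying assumptions; stemming is left out. Setting keep_gaps=False imitates the old enable_position_increments=false behaviour, where surviving tokens were renumbered without gaps.

```python
from typing import List, Tuple

# Assumption: a tiny stand-in for Elasticsearch's much larger "_english_" list.
STOPWORDS = {"a", "an", "and", "of", "the"}


def analyze(text: str, keep_gaps: bool = True) -> List[Tuple[int, str]]:
    """Lowercase, split on whitespace and drop stop words.

    keep_gaps=True: removed words leave a hole in the positions (post-4.4).
    keep_gaps=False: surviving tokens are renumbered consecutively, roughly
    what enable_position_increments=false used to do.
    """
    kept = [(pos, w) for pos, w in enumerate(text.lower().split())
            if w not in STOPWORDS]
    if not keep_gaps:
        kept = [(i, w) for i, (_, w) in enumerate(kept)]
    return kept


def shingle(tokens: List[Tuple[int, str]], min_size: int = 2,
            max_size: int = 3, filler: str = "_") -> List[str]:
    """Emit unigrams plus min_size..max_size shingles, filling holes with `filler`."""
    if not tokens:
        return []
    # Rebuild a position-indexed stream; every empty position becomes the filler.
    stream = [filler] * (tokens[-1][0] + 1)
    for pos, word in tokens:
        stream[pos] = word
    out = []
    for start in range(len(stream)):
        if stream[start] != filler:
            out.append(stream[start])  # unigram, only for real tokens
        for size in range(min_size, max_size + 1):
            window = stream[start:start + size]
            # Skip windows that run off the end or contain nothing but filler.
            if len(window) < size or all(t == filler for t in window):
                continue
            out.append(" ".join(window))
    return out


print(shingle(analyze("red and yellow")))
# ['red', 'red _', 'red _ yellow', '_ yellow', 'yellow']
print(shingle(analyze("red and yellow", keep_gaps=False)))
# ['red', 'red yellow', 'yellow']
```

With gaps kept, the simulation reproduces the same five terms as the facet output above; without gaps, the stop word simply disappears and "red yellow" becomes a shingle.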
Naturally, these filler shingles crowd out the useful ones and greatly reduce the number of meaningful shingles returned. Is there a way, after Lucene 4.4, to handle this without post-processing the results?