Using Stop Words with WhitespaceAnalyzer

Lucene's StandardAnalyzer strips the periods from strings and abbreviations when indexing. I want Lucene to preserve the periods, so I use the WhitespaceAnalyzer class instead.

I can pass the stop word list to StandardAnalyzer ... but how can I pass it to WhitespaceAnalyzer?

Thanks for reading.

1 answer

Create your own analyzer by extending WhitespaceAnalyzer and overriding tokenStream as follows.

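public TokenStream tokenStream(String fieldName, Reader reader) {
    // Let WhitespaceAnalyzer split the input on whitespace only.
    TokenStream result = super.tokenStream(fieldName, reader);
    // Wrap the stream so that tokens found in stopSet are dropped.
    result = new StopFilter(result, stopSet);
    return result;
}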

Here stopSet is a set of stop words that you can get by adding a constructor to your analyzer that accepts a list of stop words.

You may also want to override reusableTokenStream() in a similar way if you plan to reuse the TokenStream.
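To make the intended result concrete, here is a minimal plain-Java sketch of what the combined analyzer is meant to do: split on whitespace only, so that periods survive, then drop any token found in the stop set. It deliberately uses no Lucene classes; the class name WhitespaceStopSketch and the method tokens are hypothetical names made up for this illustration, and matching is assumed to be exact (case-sensitive).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; this is NOT Lucene API.
final class WhitespaceStopSketch {
    private final Set<String> stopSet;

    // Mirrors the suggested constructor: accept a list of stop words
    // and keep them in a set for fast lookup.
    WhitespaceStopSketch(List<String> stopWords) {
        this.stopSet = new HashSet<>(stopWords);
    }

    // Split on whitespace only (so "U.S.A." stays intact),
    // then skip every token that appears in the stop set.
    List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopSet.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        WhitespaceStopSketch sketch =
                new WhitespaceStopSketch(Arrays.asList("the", "is", "a"));
        // Prints [U.S.A., country]
        System.out.println(sketch.tokens("the U.S.A. is a country"));
    }
}
```

The abbreviation keeps its periods because nothing other than whitespace is treated as a separator, which is the behavior the question asks for.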


Source: https://habr.com/ru/post/1707959/

