Using Stop Words with WhitespaceAnalyzer

Question

Lucene's StandardAnalyzer strips the periods from strings and abbreviations when indexing. I want Lucene to preserve the periods, so I am using the WhitespaceAnalyzer class instead.

I can pass the stop word list to StandardAnalyzer ... but how can I pass it to WhitespaceAnalyzer?

Thanks for reading.

lucene lucene.net

Steve chapman May 08, '09 at 17:39

1 answer

Shashikant Kore · Accepted Answer · 2009-05-08T19:20:33+0000

Create your own analyzer by extending WhitespaceAnalyzer and overriding the tokenStream method as follows.

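// Tokenize on whitespace as usual, then wrap the stream in a StopFilter
// that drops every token contained in stopSet.
public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = super.tokenStream(fieldName, reader);
    result = new StopFilter(result, stopSet);
    return result;
}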

Here stopSet is the Set of stop words, which you can populate by adding a constructor to your analyzer that accepts a list of stop words.

You may also want to override reusableTokenStream() in the same way if you plan to reuse the TokenStream.
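
To make the intended effect concrete, here is a minimal, library-free sketch of what whitespace tokenization followed by stop word filtering does to a string. It is not Lucene code: the class name WhitespaceStopDemo and the method analyze are hypothetical names made up for this illustration, and the exact, case-sensitive matching is an assumption about how the filter is configured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; WhitespaceStopDemo and analyze are
// hypothetical names and do not exist in Lucene.
class WhitespaceStopDemo {

    // Split on runs of whitespace only, so punctuation such as the periods
    // in "U.S.A." survives, then drop every token found in stopSet.
    // Matching is exact and case-sensitive (an assumption of this sketch).
    static List<String> analyze(String text, Set<String> stopSet) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopSet.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<String>(Arrays.asList("the", "is", "a"));
        // prints [U.S.A., country]
        System.out.println(analyze("the U.S.A. is a country", stopSet));
    }
}
```

The abbreviation keeps its periods because the text is only ever split on whitespace, while "the", "is" and "a" are dropped by the set lookup. That is the same division of labor as between the whitespace tokenizer and the StopFilter in the analyzer above.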
