Token filters, such as ASCIIFoldingFilter, are themselves TokenStreams, so they are what an Analyzer returns, mainly through the following method:
public abstract TokenStream tokenStream(String fieldName, Reader reader);
As you noticed, filters accept a TokenStream as input. They act like wrappers or, more precisely, like decorators of their input: they extend the behavior of the wrapped TokenStream, performing their own work on top of the work done by the wrapped stream.
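To make the decorator idea concrete, here is a minimal, hypothetical filter written against the attribute-based API of Lucene 3.1+. UpperCaseFilter is not a real Lucene class; it only illustrates how a filter first delegates to the wrapped stream and then adds its own step:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class UpperCaseFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public UpperCaseFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {  // first let the wrapped stream do its work
            return false;               // no more tokens
        }
        // ...then perform this filter's own work on top of it
        String upper = termAtt.toString().toUpperCase();
        termAtt.setEmpty().append(upper);
        return true;
    }
}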
The same principle applies to ASCIIFoldingFilter as to any other token filter. Basically, you create a custom analyzer with something like this in it (a stripped-down example):
import java.io.Reader;
import org.apache.lucene.analysis.*;  // Analyzer, TokenStream, LowerCaseFilter, StopFilter, ASCIIFoldingFilter
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class CustomAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Start with a Tokenizer, then wrap it in one filter after another;
        // each filter decorates the stream produced by the previous step.
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, yourSetOfStopWords); // yourSetOfStopWords is a placeholder
        result = new ASCIIFoldingFilter(result);
        return result;
    }
}
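If it helps, here is a rough sketch of how such an analyzer might be consumed. The class name, the field name "content", and the sample text are made up for illustration; it assumes Lucene 3.1+ (for CharTermAttribute) and a concrete stop-word set plugged into CustomAnalyzer:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        CustomAnalyzer analyzer = new CustomAnalyzer();
        TokenStream stream = analyzer.tokenStream("content",
                new StringReader("Déjà vu, Señor!"));

        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset(); // harmless here; newer Lucene versions require it before iterating
        while (stream.incrementToken()) {
            // With ASCIIFoldingFilter at the end of the chain, the accented
            // terms come out folded, e.g. "deja", "vu", "senor" (minus any stop words).
            System.out.println(term.toString());
        }
        stream.close();
    }
}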
Both TokenFilter and Tokenizer are subclasses of TokenStream.