The problem was in my analyzer; the code I posted to the analyzer question earlier was wrong. The token stream has to be reset for each new text entry that is to be tokenized.
    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        TokenStream stream = (TokenStream) getPreviousTokenStream();
        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens) {
                stream = new SnowballFilter(stream, name);
            }
            setPreviousTokenStream(stream); // ---------------> problem was here
        } else if (stream instanceof Tokenizer) {
            ((Tokenizer) stream).reset(reader);
        }
        return stream;
    }
Every time I set the previous token stream, the next text field that had to be tokenized separately would pick it up again, but it always started from the final offset of the last token stream, which made the term vector offsets incorrect for the new stream. With the call removed it works fine:
    public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
        TokenStream stream = (TokenStream) getPreviousTokenStream();
        if (stream == null) {
            stream = new AttachmentNameTokenizer(reader);
            if (stemmTokens) {
                stream = new SnowballFilter(stream, name);
            }
        } else if (stream instanceof Tokenizer) {
            ((Tokenizer) stream).reset(reader);
        }
        return stream;
    }
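For what it's worth, the failure mode can be reproduced with plain Java and a couple of stand-in classes (these are simplified stand-ins I made up for illustration, not the real Lucene types): once the saved stream is the wrapping filter, the `stream instanceof Tokenizer` reuse branch can never match, so `reset(reader)` is never called and the tokenizer keeps the offset state left over from the previous field.

```java
// Simplified stand-ins for the Lucene classes (assumption: names only,
// none of the real tokenization logic is reproduced here).
abstract class TokenStream {}

class Tokenizer extends TokenStream {
    boolean wasReset = false;
    void reset() { wasReset = true; } // the real Lucene method takes a Reader
}

// A filter that wraps another stream, the way SnowballFilter wraps a Tokenizer.
class WrappingFilter extends TokenStream {
    final TokenStream input;
    WrappingFilter(TokenStream input) { this.input = input; }
}

public class ReuseDemo {
    public static void main(String[] args) {
        Tokenizer tokenizer = new Tokenizer();
        // This is what ended up saved via setPreviousTokenStream: the outer filter.
        TokenStream saved = new WrappingFilter(tokenizer);

        // The reuse branch only resets when the saved stream IS a Tokenizer:
        if (saved instanceof Tokenizer) {
            ((Tokenizer) saved).reset();
        }
        // The instanceof check fails, so reset() never ran.
        System.out.println("reset ran: " + tokenizer.wasReset);
    }
}
```

Running this prints `reset ran: false`, which is exactly why the offsets carried over. Dropping the `setPreviousTokenStream` call sidesteps the problem (at the cost of never reusing streams), because a fresh tokenizer is built for every field.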