Finding TokenStream fields in Lucene

Question

I am just starting out with Lucene, and I feel I must have a fundamental misunderstanding of it, but I could not work this problem out from the samples and documentation.

I can't get Lucene to return results for fields initialized with a TokenStream, while fields initialized with a string work fine. I am using Lucene.NET 2.9.2 RC2.

[Edit] I also tried this with the latest Java version of Lucene (3.0.3) and see the same behavior, so this is not a quirk of the port.

Here is an example:

 Directory index = new RAMDirectory();
 Document doc = new Document();
 doc.Add(new Field("fieldName", new StandardTokenizer(new StringReader("Field Value Goes Here"))));
 IndexWriter iw = new IndexWriter(index, new StandardAnalyzer());
 iw.AddDocument(doc);
 iw.Commit();
 iw.Close();
 Query q = new QueryParser("fieldName", new StandardAnalyzer()).Parse("value");
 IndexSearcher searcher = new IndexSearcher(index, true);
 Console.WriteLine(searcher.Search(q).Length());

(I understand that this uses APIs deprecated since 2.9, but that is just for brevity; pretend there are arguments specifying the version and that I'm using one of the newer Search overloads.)

This does not return results.

However, if I replace the line that adds the field with

 doc.Add(new Field("fieldName", "Field Value Goes Here", Field.Store.NO, Field.Index.ANALYZED));

then the query returns a hit, as you would expect. It also works if I use the TextReader overload.

Both fields are indexed and tokenized, with (I think) the same tokenizer/analyzer (I also tried others), and neither is stored, so my intuition is that they should behave the same way. What am I missing?

c# .net lucene lucene.net

Jacob Mar 01 '11 at 0:06

1 answer

Jacob · Accepted Answer · 2011-03-01T16:58:20+0000

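I found the answer: it is casing.

The token stream created by StandardAnalyzer includes a LowerCaseFilter, whereas a StandardTokenizer constructed directly has no such filter applied. The TokenStream field therefore indexes the term "Value" with its capital letter, while the query parser, which runs the query text through StandardAnalyzer, searches for the lowercased term "value", so nothing matches.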
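For illustration, here is a minimal sketch of the question's example with the tokenizer wrapped in a LowerCaseFilter, so that the indexed terms are lowercased the same way the query is. It assumes the Lucene.NET 2.9.x API and keeps the same deprecated constructors as the question for brevity; the class name TokenStreamFieldDemo is made up for the example. Note that this only adds the lowercasing step; it does not reproduce anything else StandardAnalyzer may do (such as stop-word removal).

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;           // LowerCaseFilter, TokenStream
using Lucene.Net.Analysis.Standard;  // StandardTokenizer, StandardAnalyzer
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical demo class; assumes Lucene.NET 2.9.x as in the question.
class TokenStreamFieldDemo
{
    static void Main()
    {
        // Declared as RAMDirectory to avoid a name clash with System.IO.Directory.
        RAMDirectory index = new RAMDirectory();

        // Wrap the bare tokenizer so its tokens are lowercased before indexing.
        TokenStream stream = new LowerCaseFilter(
            new StandardTokenizer(new StringReader("Field Value Goes Here")));

        Document doc = new Document();
        doc.Add(new Field("fieldName", stream));

        IndexWriter iw = new IndexWriter(index, new StandardAnalyzer());
        iw.AddDocument(doc);
        iw.Commit();
        iw.Close();

        // The query text goes through StandardAnalyzer, which lowercases it.
        Query q = new QueryParser("fieldName", new StandardAnalyzer()).Parse("value");
        IndexSearcher searcher = new IndexSearcher(index, true);
        Console.WriteLine(searcher.Search(q).Length());
        searcher.Close();
    }
}
```

With the filter in place, the search should report a hit instead of zero. An alternative that avoids rebuilding the chain by hand is to ask the analyzer for its own stream, along the lines of new StandardAnalyzer().TokenStream("fieldName", new StringReader("Field Value Goes Here")), and pass that to the Field constructor.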
