How to use StandardAnalyzer with TermQuery?

I am trying to do something similar to what QueryParser does in Lucene, but without the parser: run a line through StandardAnalyzer, tokenize it, and combine the TermQuerys in a BooleanQuery. My problem is that StandardAnalyzer gives me Tokens, not Terms. I can convert a token into a term simply by extracting its string with Token.term(), but that is 2.4.x-only, and it feels backwards because I have to specify the field a second time. What is the correct way to build a TermQuery using StandardAnalyzer?

I am using PyLucene, but I think the answer is the same for Java, etc. Here is the code I came up with:

from lucene import *

def term_match(self, phrase):
    query = BooleanQuery()
    sa = StandardAnalyzer()
    for token in sa.tokenStream("contents", StringReader(phrase)):
        term_query = TermQuery(Term("contents", token.term()))
        query.add(term_query, BooleanClause.Occur.SHOULD)
    return query
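Since PyLucene may not be installed where this runs, here is a minimal pure-Python sketch of the same tokenize-then-OR pattern. The `simple_tokenize` function and the `(field, term)` tuple representation are stand-ins I am assuming for StandardAnalyzer and TermQuery, not real Lucene APIs (the real StandardAnalyzer also strips stop words):

```python
import re

def simple_tokenize(text):
    # Stand-in for StandardAnalyzer: lowercase and split on
    # non-alphanumeric characters. The real analyzer also
    # removes English stop words.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_should_clauses(field, phrase):
    # Each token becomes one (field, term) pair, i.e. one
    # TermQuery added with BooleanClause.Occur.SHOULD.
    return [(field, token) for token in simple_tokenize(phrase)]

clauses = build_should_clauses("contents", "Hello, Lucene World!")
```

The field name is repeated for every clause, which matches the "specify the field a second time" annoyance in the question.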
2 answers

The established way to get the token text is token.termText(); that API has been around forever.

And yes, you need to specify the field name for both the Analyzer and the Term; I think that is considered normal. 8-)


I ran into the same problem, and using the Lucene 2.9 API and Java, my code snippet looks like this:

final TokenStream tokenStream = new StandardAnalyzer( Version.LUCENE_29 )
    .tokenStream( fieldName, new StringReader( value ) );
final List< String > result = new ArrayList< String >();
try {
    while ( tokenStream.incrementToken() ) {
        final TermAttribute term = ( TermAttribute ) tokenStream.getAttribute( TermAttribute.class );
        result.add( term.term() );
    }
} finally {
    tokenStream.close();
}
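To see roughly what the Java snippet collects in `result`, here is a hedged pure-Python approximation (my assumption: lowercase, split on non-word characters, and drop a few common English stop words, since StandardAnalyzer filters stop words by default; the real stop-word list is longer):

```python
import re

# A tiny subset of StandardAnalyzer's default English stop words,
# for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to"}

def analyze(value):
    # Approximates StandardAnalyzer: lowercase, split on non-word
    # characters, drop stop words. Mirrors the 'result' list that
    # the Java loop above builds from the TermAttribute values.
    return [t for t in re.split(r"\W+", value.lower())
            if t and t not in STOP_WORDS]

terms = analyze("The Quick Brown Fox")
```

Each string in `terms` would then become `new TermQuery(new Term(fieldName, term))` in a BooleanQuery, as in the question.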

Source: https://habr.com/ru/post/1717075/