Lucene.Net - How to treat a space-separated phrase as a single token?

I have built a search tool using Lucene.Net. The index includes UK academic qualifications, including A Level.

I would like users to be able to search for the phrase "A Level", but with the standard analyzer "A" is removed as a stop word, so only "Level" is indexed and searched.

What is the best way around this? I assume I need to somehow map "A Level" to "A-Level" or similar by creating my own analyzer.

Is this the best approach?
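The mapping idea above can be sketched as a plain pre-processing step that runs on both the text being indexed and the user's query string before either reaches the analyzer. This is only a sketch under assumptions, not a Lucene.Net analyzer: the class name `QualificationNormalizer`, the regular expression and the replacement token `alevel` are all made up for illustration. A hyphen-free token is used because a tokenizer may split "A-Level" on the hyphen again.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites "A Level" / "A-level" / "A Levels" to one token.
public static class QualificationNormalizer
{
    // "A", then one or more spaces or hyphens, then "Level" or "Levels",
    // matched as whole words and ignoring case.
    private static readonly Regex ALevel = new Regex(
        @"\bA[\s\-]+Levels?\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Apply to document text before indexing and to query text before parsing,
    // so that both sides agree on the single token "alevel".
    public static string Normalize(string text)
    {
        return ALevel.Replace(text, "alevel");
    }
}
```

For example, `QualificationNormalizer.Normalize("an A-level in Physics")` gives `"an alevel in Physics"`. The sketch also rewrites ordinary uses such as "a level playing field", so it trades some precision for keeping "a" as a stop word everywhere else.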

Edit:

Note that I do not want the entire search to be a phrase query. That is, in my search box I want the user to be able to enter <"A Level" and English mathematics physics>, and have it return results that contain "A Level" and any of English, mathematics or physics. The question has been updated to reflect this.

I would like the word "A" to be treated as a stop word in all cases apart from "A Level".

The phrase "A Level" is not in its own dedicated field; it is in a free-text field, which may contain the phrase.

+3
5 answers

Use a PhraseQuery, which can be combined with any other query using a BooleanQuery.

EDITED

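 // "MyField" is a placeholder field name. Terms are lowercase because
 // StandardAnalyzer lowercases tokens at index time; the "a" term is only
 // in the index if the analyzer was created without stop words.
 BooleanQuery rootQuery = new BooleanQuery();
 PhraseQuery q1 = new PhraseQuery();
 q1.Add(new Term("MyField", "a"));
 q1.Add(new Term("MyField", "level"));
 TermQuery q2 = new TermQuery(new Term("MyField", "english"));
 TermQuery q3 = new TermQuery(new Term("MyField", "maths"));
 TermQuery q4 = new TermQuery(new Term("MyField", "physics"));
 rootQuery.Add(q1, BooleanClause.Occur.SHOULD); // or MUST, depending on your needs
 rootQuery.Add(q2, BooleanClause.Occur.SHOULD);
 rootQuery.Add(q3, BooleanClause.Occur.SHOULD);
 rootQuery.Add(q4, BooleanClause.Occur.SHOULD);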
+3

Why not use a StandardAnalyzer with an empty stop-word set, so that no stop words are removed:

Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29, new Hashtable());

Then build the query, either with QueryParser or by hand as follows:

        // Terms are lowercase because StandardAnalyzer lowercases tokens at index time.
        // Phrase query
        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.Add(new Term("MyField", "a"));
        phraseQuery.Add(new Term("MyField", "level"));

        // Or query: any one of the subjects
        BooleanQuery orQuery = new BooleanQuery();
        orQuery.Add(new BooleanClause(new TermQuery(new Term("MyField", "english")), BooleanClause.Occur.SHOULD));
        orQuery.Add(new BooleanClause(new TermQuery(new Term("MyField", "maths")), BooleanClause.Occur.SHOULD));
        orQuery.Add(new BooleanClause(new TermQuery(new Term("MyField", "physics")), BooleanClause.Occur.SHOULD));

        // Main query: the phrase AND at least one subject
        BooleanQuery query = new BooleanQuery();
        query.Add(phraseQuery, BooleanClause.Occur.MUST);
        query.Add(orQuery, BooleanClause.Occur.MUST);

Bye

+1

Use a KeywordAnalyzer for this field instead of a StandardAnalyzer.

Something like this (Java):

private ReusableAnalyzer getReusableAnalyzer(String fieldName, Reader reader) {
    boolean phrase = treatAsPhrase(fieldName);
    ReusableAnalyzer ra = new ReusableAnalyzer();
    TokenStream result = phrase ? new KeywordTokenizer(reader) : new StandardTokenizer(version, reader);

+1

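There are two ways to do this in Lucene.

1) Create an analyzer that keeps stop words. For example, derive from StandardAnalyzer and pass it an empty stop-word set.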

public class PreserveStopWordsAnalyzer : StandardAnalyzer
{
    public PreserveStopWordsAnalyzer() : base(Version.LUCENE_29, new Hashtable())
    {}
}

2) Index the text into a second field that keeps stop words, and query both fields:

+RegularField:English +StopWordField:"A Level"

+1

Source: https://habr.com/ru/post/1785400/