Solr: scoring for exact matches is higher than partial matches

Question


As a simple case, I have three documents with the file names "Lark", "Larker", and "Larking" (no file extensions). In Solr, I index these three documents with the file name mapped to the "title" field. When I search for Lark, all three documents are returned (which is what I want), but they all get the same score. I would prefer "Lark" to rank highest, since it matches my query exactly, with the others behind it.

<field name="title" type="text_general" indexed="true" stored="true" multiValued="false"/>

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

I believe they get the same score because EdgeNGramFilterFactory is applied at index time. Every document is indexed with the grams "la", "lar", and "lark"; two of them ("Larker" and "Larking") are indexed with some additional, longer grams. As a result, every document is an exact match for the query Lark. I would like a query for the term "Lark" to still return all three documents, but with the document titled "Lark" ranked above the others.

Query debug output:

 <lst name="debug">
   <str name="rawquerystring">Lark</str>
   <str name="querystring">Lark</str>
   <str name="parsedquery">text:lark</str>
   <str name="parsedquery_toString">text:lark</str>
   <lst name="explain">
     <str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2892">
 2.7104912 = (MATCH) weight(text:lark in 0) [DefaultSimilarity], result of:
   2.7104912 = fieldWeight in 0, product of:
     1.4142135 = tf(freq=2.0), with freq of:
       2.0 = termFreq=2.0
     3.8332133 = idf(docFreq=3, maxDocs=68)
     0.5 = fieldNorm(doc=0)
     </str>
     <str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2893">
 2.7104912 = (MATCH) weight(text:lark in 1) [DefaultSimilarity], result of:
   2.7104912 = fieldWeight in 1, product of:
     1.4142135 = tf(freq=2.0), with freq of:
       2.0 = termFreq=2.0
     3.8332133 = idf(docFreq=3, maxDocs=68)
     0.5 = fieldNorm(doc=1)
     </str>
     <str name="543d6ee4cbb33c26bbcf288b/xxnullxx/543d6ef9cbb33c26bbcf2894">
 2.7104912 = (MATCH) weight(text:lark in 2) [DefaultSimilarity], result of:
   2.7104912 = fieldWeight in 2, product of:
     1.4142135 = tf(freq=2.0), with freq of:
       2.0 = termFreq=2.0
     3.8332133 = idf(docFreq=3, maxDocs=68)
     0.5 = fieldNorm(doc=2)
     </str>
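
The identical scores can be checked by hand from the explain output. Assuming the classic Lucene TF-IDF formulas behind DefaultSimilarity (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))), every factor comes out the same for all three documents:

```latex
\begin{align*}
\mathrm{tf} &= \sqrt{2.0} \approx 1.4142135 \\
\mathrm{idf} &= 1 + \ln\frac{68}{3 + 1} = 1 + \ln 17 \approx 3.8332133 \\
\mathrm{score} &= \mathrm{tf} \times \mathrm{idf} \times \mathrm{fieldNorm}
  = 1.4142135 \times 3.8332133 \times 0.5 \approx 2.7104912
\end{align*}
```

Since termFreq, docFreq, and fieldNorm are equal across the three documents, nothing in this product can rank "Lark" above the others.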


java lucene solr solrj

Mike nitchie Oct 14 '14 at 15:51


2 answers

It may be late, but you can also use KeywordRepeatFilterFactory, which does not require creating a new field. Here is how the Solr documentation describes it:

A repeated question is "how can I have the original term contribute more to the score than the stemmed version?" In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. This filter emits two tokens for each input token, one of which is marked with the Keyword attribute. Stemmers that respect keyword attributes will pass through the token so marked without change. So the effect of this filter would be to index both the original word and the stemmed version.
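
As a rough sketch of what the quoted passage describes, an analyzer chain using this filter might look like the following. The field type name text_stemmed is hypothetical, and the choice of PorterStemFilterFactory is an assumption for illustration. Note that the quoted documentation only covers stemmers; whether the keyword mark has any effect on EdgeNGramFilterFactory is not stated there and should be verified before relying on it.

```xml
<!-- Hypothetical field type: indexes each token twice, once unchanged and once stemmed -->
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emits every token twice; one copy is marked with the Keyword attribute -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <!-- A stemmer that respects the Keyword attribute leaves the marked copy as-is -->
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- Drops the second copy when stemming did not change the token -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, a document containing the original word matches on two tokens at the same position, while a document containing only a different inflection matches on the stemmed token alone.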


alexf Jul 20 '15 at 16:02


Yann · Accepted Answer · 2014-10-15T11:34:34+0000

To boost exact matches, you can create a new field called "exact_title" with a new field type, "text_exact", that does not use EdgeNGramFilterFactory.
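
A minimal sketch of such a field and type, assuming the same tokenizer and lowercasing as the question's text_general type but with the n-gram filter left out (the exact analyzer chain and the stored="false" setting are assumptions, not part of the original answer):

```xml
<!-- Field that receives a copy of the title, analyzed without edge n-grams -->
<field name="exact_title" type="text_exact" indexed="true" stored="false" multiValued="false"/>

<!-- Assumed analyzer chain: whole lowercased tokens only, used at both index and query time -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, "Larker" is indexed only as the single token larker, so a query for lark matches exact_title only in the document titled "Lark".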

In your schema, you can use the line:

 <copyField source="title" dest="exact_title"/>

to copy the title to exact_title.

Then run your query against both fields, title and exact_title. If the query matches a title exactly, that document also matches in exact_title, so it receives a higher score than the other documents and rises to the top.
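
One way to query both fields is the eDisMax query parser with a qf list. The following solrconfig.xml fragment is a sketch under assumptions: the handler name /select, the use of edismax, and the boost of 10 on exact_title are illustrative choices, not something the original answer specifies.

```xml
<!-- Assumed request handler defaults: search title and exact_title together -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- exact_title is weighted higher so whole-word matches outrank n-gram matches -->
    <str name="qf">title exact_title^10</str>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request, for example q=Lark with defType=edismax and qf set to "title exact_title^10". The boost value would need tuning against real data.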
