Can SOLR enforce a one-to-one match between query terms and matched terms?

I am building a people-search tool on SOLR, which simplifies indexing and fuzzy searching across multiple fields (using edismax). I use various filters such as SynonymFilterFactory and WordDelimiterFilterFactory, and I have disabled TF-IDF.

This works very well, except in a few cases where one query term matches more than one word in the same document. For example, a search for “Martin XXXX” returns “Marvin Martin” as the top result, because the query term “Martin” matches both “Marvin” (fuzzily) and “Martin”.
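
Why “Marvin” matches at all: Martin~ is a fuzzy query, so it also matches indexed terms within a small edit distance, and “marvin” is a single substitution away from “martin”. The 0.8333333 boost on the marvin clause in the debug output is consistent with 1 - 1/6 (one edit over a six-letter term). A minimal Python sketch of that arithmetic, assuming the similarity is 1 - edits / length of the shorter term; it reproduces the numbers here and is not Lucene's implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_similarity(query_term: str, index_term: str) -> float:
    """Assumed formula: 1 - edits / length of the shorter term."""
    edits = levenshtein(query_term, index_term)
    return 1.0 - edits / min(len(query_term), len(index_term))


for term in ("martin", "marvin"):
    print(term, levenshtein("martin", term), round(fuzzy_similarity("martin", term), 7))
# martin 0 1.0
# marvin 1 0.8333333
```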

Matching one search term against several words in a document usually makes a lot of sense. For people search, however, I would like each search term to contribute only its highest score, i.e. each search term should be compared with only one word in the document (the person's name/information).
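
The difference between the current behaviour and the desired one can be shown with the two per-token scores from the debug output for the document “Marvin Martin”. A minimal sketch; the dictionary simply copies those two numbers:

```python
# Per-token scores for the query term "martin~" against "Marvin Martin",
# copied from the debug output in this question.
matches = {
    "martin": 0.15077375,  # exact match
    "marvin": 0.12564479,  # fuzzy match, boost 0.8333333
}

# Current behaviour: every token the term matches adds to the score.
sum_score = sum(matches.values())

# Desired behaviour: the query term contributes only its best match.
max_score = max(matches.values())

print(f"sum: {sum_score:.8f}")  # sum: 0.27641854
print(f"max: {max_score:.8f}")  # max: 0.15077375
```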

Is there a mechanism in SOLR/Lucene that would let me force a one-to-one mapping between search terms and matched terms?
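
Independently of whether SOLR/Lucene offers this natively, the desired scoring can be stated as an assignment problem: pair each query term with at most one token (and each token with at most one query term) so that the total similarity is maximal. The sketch below could, for instance, re-rank the top results on the client. It assumes lowercased, whitespace-tokenized names and uses difflib's ratio as a stand-in for Lucene's fuzzy score; THRESHOLD, similarity, sum_all_matches and one_to_one are hypothetical names:

```python
from difflib import SequenceMatcher
from itertools import permutations

THRESHOLD = 0.75  # hypothetical cut-off, similar in spirit to a fuzzy edit limit


def similarity(query_term, token):
    """Stand-in similarity (difflib ratio), zeroed below THRESHOLD."""
    if token is None:  # padding slot: query term left unmatched
        return 0.0
    ratio = SequenceMatcher(None, query_term, token).ratio()
    return ratio if ratio >= THRESHOLD else 0.0


def sum_all_matches(query, tokens):
    """Roughly the observed behaviour: each term adds every token it matches."""
    return sum(similarity(q, t) for q in query for t in tokens)


def one_to_one(query, tokens):
    """Best total when each query term uses at most one token, and vice versa."""
    # Pad with None so every query term gets a slot even if tokens are fewer.
    padded = list(tokens) + [None] * max(0, len(query) - len(tokens))
    return max(
        sum(similarity(q, t) for q, t in zip(query, perm))
        for perm in permutations(padded, len(query))
    )


query = ["martin", "xxxx"]
for name in ("marvin martin", "martin jones"):
    tokens = name.split()
    print(name, round(sum_all_matches(query, tokens), 4), round(one_to_one(query, tokens), 4))
# marvin martin 1.8333 1.0
# martin jones 1.0 1.0
```

Brute force over permutations is fine for the handful of tokens in a name; for larger inputs, scipy.optimize.linear_sum_assignment solves the same assignment problem in polynomial time.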

The problem is visible in the debug output:

  0.27641854 = (MATCH) sum of:
    0.27641854 = (MATCH) sum of:
      0.15077375 = (MATCH) weight(FullName:martin in 118169) [NoTFIDFSimilarityClass], result of:
        0.15077375 = score(doc=118169,freq=1.0 = termFreq=1.0 ), product of:
          0.15077375 = queryWeight, product of:
            1.0 = idf(docFreq=1619, maxDocs=328317)
            0.15077375 = queryNorm
          1.0 = fieldWeight in 118169, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            1.0 = idf(docFreq=1619, maxDocs=328317)
            1.0 = fieldNorm(doc=118169)
      0.12564479 = (MATCH) weight(FullName:marvin^0.8333333 in 118169) [NoTFIDFSimilarityClass], result of:
        0.12564479 = score(doc=118169,freq=1.0 = termFreq=1.0 ), product of:
          0.12564479 = queryWeight, product of:
            0.8333333 = boost
            1.0 = idf(docFreq=105, maxDocs=328317)
            0.15077375 = queryNorm
          1.0 = fieldWeight in 118169, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            1.0 = idf(docFreq=105, maxDocs=328317)
            1.0 = fieldNorm(doc=118169)
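
Reading the explain tree: with the custom NoTFIDFSimilarityClass, idf, tf and fieldNorm are all 1.0, so each clause reduces to boost × queryNorm, and the document score is the sum of both clauses produced by the single query term:

```latex
\underbrace{0.27641854}_{\text{total}}
  = \underbrace{0.15077375}_{\texttt{martin}}
  + \underbrace{0.8333333 \times 0.15077375}_{\texttt{marvin} \;=\; 0.12564479}
```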

Example request:

  http://domain/solr/peoplefinder/select?q=Martin~&wt=json&indent=true&defType=edismax&qf=FullName&stopwords=true&lowercaseOperators=true&debug=true 
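
To rerun the same debug request for other names (for example the two-word query from the question), it can be easier to build the URL from a parameter list than to edit it by hand. A minimal sketch using only urllib.parse; domain is the placeholder host from the original URL and select_url is a hypothetical helper:

```python
from urllib.parse import urlencode

BASE = "http://domain/solr/peoplefinder/select"  # placeholder host from the question


def select_url(query: str) -> str:
    """Build the debug request above for an arbitrary query string."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("indent", "true"),
        ("defType", "edismax"),
        ("qf", "FullName"),
        ("stopwords", "true"),
        ("lowercaseOperators", "true"),
        ("debug", "true"),
    ]
    return BASE + "?" + urlencode(params)


print(select_url("Martin~"))
print(select_url("Martin XXXX"))
```

urlencode escapes the space as +; on Python 3.7 and later it leaves ~ unescaped, so the first call reproduces the original URL exactly.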

Source: https://habr.com/ru/post/1206861/