I am working on a people search tool built on Solr, which simplifies indexing and fuzzy searching across multiple fields (using edismax). The setup uses various filters such as SynonymFilterFactory and WordDelimiterFilterFactory, and TF-IDF is disabled.
This works very well, except in a few cases where a single search term matches more than one word in a document. For example, a search for “Martin XXXX” returns “Marvin Martin” as the top result because the fuzzy term “Martin” matches both “Marvin” and “Martin”.
Matching a search term to multiple words in a document generally makes a lot of sense. However, for a people search, I would like each search term to contribute only its maximum score, i.e. each search term should be compared with only one word in the document (the person’s name or other information).
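To make the intent concrete, here is a rough Python sketch of the two scoring behaviours. It is not Solr or Lucene code: the function names and the `matches` structure are hypothetical, and the two scores are simply the per-term weights from the debug output in this post.

```python
def sum_score(matches):
    """Current behaviour: every word matched by a query term adds to the score."""
    return sum(
        score
        for matched_words in matches.values()
        for score in matched_words.values()
    )


def max_per_term_score(matches):
    """Desired behaviour: each query term contributes only its best match."""
    return sum(
        max(matched_words.values())
        for matched_words in matches.values()
        if matched_words
    )


# Hypothetical layout: query term -> {matched document word: score}.
# Document "Marvin Martin", fuzzy query term "martin~".
matches = {"martin~": {"martin": 0.15077375, "marvin": 0.12564479}}

print("sum of all matches:", sum_score(matches))
print("max per query term:", max_per_term_score(matches))
```

With the second function, “Marvin Martin” would score no higher for the term “Martin” than a document that contains “Martin” only once.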
Is there a mechanism in Solr / Lucene that would allow me to force a one-to-one mapping between the search term and the matched term?
You can see the problem in the debug output below:
0.27641854 = (MATCH) sum of:
  0.27641854 = (MATCH) sum of:
    0.15077375 = (MATCH) weight(FullName:martin in 118169) [NoTFIDFSimilarityClass], result of:
      0.15077375 = score(doc=118169,freq=1.0 = termFreq=1.0 ), product of:
        0.15077375 = queryWeight, product of:
          1.0 = idf(docFreq=1619, maxDocs=328317)
          0.15077375 = queryNorm
        1.0 = fieldWeight in 118169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=1619, maxDocs=328317)
          1.0 = fieldNorm(doc=118169)
    0.12564479 = (MATCH) weight(FullName:marvin^0.8333333 in 118169) [NoTFIDFSimilarityClass], result of:
      0.12564479 = score(doc=118169,freq=1.0 = termFreq=1.0 ), product of:
        0.12564479 = queryWeight, product of:
          0.8333333 = boost
          1.0 = idf(docFreq=105, maxDocs=328317)
          0.15077375 = queryNorm
        1.0 = fieldWeight in 118169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=105, maxDocs=328317)
          1.0 = fieldNorm(doc=118169)
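As a sanity check on how I read this output, the arithmetic can be reproduced in a few lines of Python. The variable names are my own; the numbers are copied from the debug output above, and small differences in the last digit are expected because Solr reports 32-bit floats.

```python
# Values copied from the debug output above.
query_norm = 0.15077375
idf = 1.0            # constant, since TF-IDF is disabled
field_weight = 1.0   # tf * idf * fieldNorm = 1.0 * 1.0 * 1.0

# Each term's score is queryWeight (boost * idf * queryNorm) times fieldWeight.
martin_score = 1.0 * idf * query_norm * field_weight        # exact term, no boost shown
marvin_score = 0.8333333 * idf * query_norm * field_weight  # fuzzy expansion, boosted down

# The document score is the plain sum of both term scores.
total = martin_score + marvin_score

print(round(martin_score, 8), round(marvin_score, 8), round(total, 8))
```

So the fuzzy expansion “marvin” adds almost as much as the exact match “martin”, and the two are simply summed.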
Example request:
http://domain/solr/peoplefinder/select?q=Martin~&wt=json&indent=true&defType=edismax&qf=FullName&stopwords=true&lowercaseOperators=true&debug=true