How to configure SOLR / Lucene to perform Levenshtein edit distance searching?

Question


I have a long list of words that I put into a very simple SOLR / Lucene database. My goal is to find “similar” words from the list for single-term queries, where “similarity” is specifically understood as (Damerau-)Levenshtein edit distance. I understand that SOLR provides such a distance for spelling suggestions.
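Because the question turns on which edit distance is meant, here is a minimal dynamic-programming sketch (plain Python, not anything Solr itself runs) of Levenshtein distance and its restricted Damerau variant, "optimal string alignment", which also counts swapping two adjacent characters as a single edit. The function name `osa_distance` is illustrative only.

```python
def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between a and b.

    transpositions=False: plain Levenshtein (insert, delete, substitute).
    transpositions=True: adjacent swaps also cost 1 (optimal string
    alignment, a restricted form of Damerau-Levenshtein).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(osa_distance("form", "from"), osa_distance("form", "from", transpositions=False))
```

With the swap rule, "form" vs "from" is one edit; plain Levenshtein needs two substitutions. For "webspace" vs "webpage" both variants give 2 (drop the "s", change "c" to "g").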

In my SOLR schema.xml, I configured the field type string:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

which I use to define the field

<field name='term' type='string' indexed='true' stored='true' required='true'/>

I want to search this field and return results ranked by their Levenshtein edit distance. However, when I run a query such as webspace~0.1 against SOLR with debugging and explanations enabled, the report shows that several other factors go into calculating the scores, for example:

"1582":"
1.1353534 = (MATCH) sum of:
  1.1353534 = (MATCH) weight(term:webpage^0.8148148 in 1581), product of:
    0.08618848 = queryWeight(term:webpage^0.8148148), product of:
      0.8148148 = boost
      13.172914 = idf(docFreq=1, maxDocs=386954)
      0.008029869 = queryNorm
    13.172914 = (MATCH) fieldWeight(term:webpage in 1581), product of:
      1.0 = tf(termFreq(term:webpage)=1)
      13.172914 = idf(docFreq=1, maxDocs=386954)
      1.0 = fieldNorm(field=term, doc=1581)

Clearly, for my application, term frequency, idf, etc. are meaningless, since each document contains only one term. I tried to use the spellchecker component, but could not get it to return the actual similarity scores.
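To make that concrete, the explain tree above multiplies out as queryWeight × fieldWeight. With tf and fieldNorm both 1, idf enters the score twice, and as far as I can tell the ^0.8148148 boost on the rewritten term:webpage clause is the only factor that depends on how close the term is to the query. This sketch just redoes the arithmetic with the numbers copied from the output; it does not call Solr.

```python
# Numbers copied verbatim from the debug explain output for doc 1581 ("webpage").
boost = 0.8148148          # boost on the rewritten term:webpage clause
idf = 13.172914            # idf(docFreq=1, maxDocs=386954)
query_norm = 0.008029869   # queryNorm
tf = 1.0                   # tf(termFreq(term:webpage)=1)
field_norm = 1.0           # fieldNorm(field=term, doc=1581)

# queryWeight = boost * idf * queryNorm
query_weight = boost * idf * query_norm
# fieldWeight = tf * idf * fieldNorm
field_weight = tf * idf * field_norm
# final score = queryWeight * fieldWeight, so idf effectively appears squared
score = query_weight * field_weight

print(f"queryWeight={query_weight:.8f} fieldWeight={field_weight:.6f} score={score:.7f}")
```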

Is there a way to configure SOLR to perform Levenshtein/Jaro-Winkler/n-gram matching without the tf, idf, and boost calculations?


lucene solr levenshtein-distance

flow 01 . '10 15:39


Karl Johansson · Answer 1 · 2010-09-01T18:09:57+0000

To order the results by Levenshtein distance, use the strdist function:

q=term:webspace~0.1&sort=strdist("webspace", term, edit) desc
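strdist returns a similarity in [0, 1] (1 meaning identical) rather than a raw distance, which is why the sort is `desc`. My understanding is that the `edit` option uses Lucene's LevensteinDistance, which normalizes as 1 - distance / max(len(a), len(b)); the sketch below assumes that normalization and ranks a few made-up candidate terms the way the sort would. The candidate list and both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed strdist(..., edit) normalization: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical candidate terms standing in for the indexed "term" field.
candidates = ["workspace", "webpage", "webspace"]
ranked = sorted(candidates, key=lambda t: edit_similarity("webspace", t), reverse=True)
for term in ranked:
    print(f"{term:10s} {edit_similarity('webspace', term):.3f}")
```

Normalizing by the longer string keeps scores comparable across word lengths: "webpage" (distance 2 over 8 characters) scores 0.75, ahead of "workspace" (distance 3 over 9).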

Mikos · Answer 2 · 2010-09-02T01:16:00+0000

Solr/Lucene […]. […] SimMetrics. […]

ilinca · Answer 3 · 2014-11-24T15:59:37+0000

“Is there a way to configure SOLR to perform Levenshtein/Jaro-Winkler/n-gram matching without the tf, idf, and boost calculations?”


Use q={!func}strdist("webspace",term,edit) to score by Levenshtein distance, or q={!func}strdist("webspace",term,jw) to score by Jaro-Winkler distance.
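For comparison with `edit`, here is a textbook Jaro-Winkler sketch (prefix scale 0.1, common prefix capped at 4 characters, prefix boost applied only when the Jaro score exceeds 0.7). I believe `jw` maps to Lucene's JaroWinklerDistance, which may differ from this in small details, so treat it as an illustration of the measure rather than a reimplementation of Solr.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # max distance between matching chars
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (half-transpositions).
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, boost_threshold: float = 0.7) -> float:
    """Raise the Jaro score for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    if j <= boost_threshold:
        return j
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scale * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Unlike edit distance, Jaro-Winkler rewards a shared prefix, which tends to suit short words and names where typos cluster near the end.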

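The function-query approach can be sent as an ordinary request to the select handler. This sketch only builds the query string. The field name `term` comes from the question; the endpoint URL, the helper name `strdist_params`, the `rows`/`fl` choices and the optional `fq={!frange l=...}` cutoff are my own assumptions to adapt. The cutoff is there because, as far as I know, a `{!func}` query scores every document in the index.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port and core/handler path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def strdist_params(word, field="term", measure="edit", rows=10, min_similarity=None):
    """Query string that scores documents purely by strdist(word, field, measure).

    measure is passed through unchanged (e.g. "edit" or "jw").
    Assumes `word` contains no double quotes.
    """
    func = f'strdist("{word}",{field},{measure})'
    params = {
        "q": "{!func}" + func,   # score = similarity, so the default score sort ranks by it
        "fl": f"{field},score",  # return the term and its similarity
        "rows": rows,
    }
    if min_similarity is not None:
        # Optional cutoff: keep only documents with similarity >= min_similarity.
        params["fq"] = "{!frange l=%s}%s" % (min_similarity, func)
    return urlencode(params)


print(SOLR_SELECT + "?" + strdist_params("webspace", measure="jw", min_similarity=0.8))
```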
