Can Elasticsearch use the "best match" of ngram terms instead of treating them as synonyms?

Is it possible to tell Elasticsearch to use the “best match” of all grams instead of treating the grams as synonyms?

By default, Elasticsearch treats the grams as synonyms and returns poorly matched documents. An example shows it best. Say we have two people in the index:

    alice wang
    sarah kerry

We search for ali12345:

    {
      query: {
        bool: {
          should: {
            match: { name: 'ali12345' }
          }
        }
      }
    }

and it returns alice wang.

How is this possible? Because Elasticsearch treats the grams as synonyms by default, a document matches even if only one of its grams matches.

If you look at the explanation of the query, you will see that it treats the grams as synonyms:

    ...
    "explanation": {
      "value": 5.274891,
      "description": "weight(Synonym(name:ali name:li1 name:i12 name:123 name:234 name:345) in 0) [PerFieldSimilarity], result of:",
      ...
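The gram list in the explanation can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: `trigrams` is a hypothetical helper that imitates a keyword tokenizer followed by a 3-character ngram filter, so that the overlap between the search string and the stored name can be counted.

```python
def trigrams(text, n=3):
    """Return every contiguous n-character substring of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

query_grams = trigrams("ali12345")
doc_grams = set(trigrams("alice wang"))

# Grams of the search string that also occur in the stored name.
shared = [g for g in query_grams if g in doc_grams]

print(query_grams)  # ['ali', 'li1', 'i12', '123', '234', '345']
print(shared)       # ['ali']
```

Only one of the six grams is shared, yet under the synonym interpretation that single hit is enough for the document to be returned.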

I wonder whether Elasticsearch can be told to use a “best match” query instead, to achieve something like:

    {
      query: {
        bool: {
          should: [
            { term: { name: 'ali' } },
            { term: { name: 'li1' } },
            { term: { name: 'i12' } },
            { term: { name: '123' } },
            { term: { name: '234' } },
            { term: { name: '345' } }
          ],
          minimum_should_match: '75%'
        }
      }
    }
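To see what the 75% threshold would mean for the six grams of ali12345, here is a small arithmetic sketch in Python. `required_matches` is a hypothetical helper, and it assumes the documented behaviour that a percentage given to `minimum_should_match` is rounded down to a whole number of clauses.

```python
def required_matches(num_clauses, percent):
    """Number of should-clauses that must match for a percentage threshold.

    Assumption: the value computed from the percentage is rounded down.
    """
    return num_clauses * percent // 100

needed = required_matches(6, 75)   # six trigram clauses, 75% threshold
matched_by_alice = 1               # "alice wang" shares only the gram "ali"

print(needed)                      # 4
print(matched_by_alice >= needed)  # False
```

With this threshold, alice wang would no longer be returned for ali12345, which is the behaviour being asked for.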

Questions:

  • Perhaps this query could be generated manually, but then you would have to apply the ngram tokenization and the rest of the analyzer pipeline by hand. So I wonder whether Elasticsearch can do this itself?

  • How efficient is such a query for a long string, when there are tens of grams/terms? Will it apply some clever optimization, as it does when searching for similar documents (see more_like_this), where it uses not all terms but only the terms with the highest tf-idf?
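The manual route mentioned in the first question could look roughly like the Python sketch below. It assumes the grams are obtained from the index's `_analyze` API, so that the analyzer pipeline does not have to be re-implemented by hand, and it only handles a dict shaped like that API's response (a `tokens` list whose items carry a `token` string). `grams_from_analyze_response`, `build_best_match_query` and `sample_response` are hypothetical names introduced for illustration; no request is actually sent.

```python
import json

def grams_from_analyze_response(response):
    """Extract token strings from a dict shaped like an _analyze response."""
    return [tok["token"] for tok in response.get("tokens", [])]

def build_best_match_query(grams, field="name", minimum_should_match="75%"):
    """Build a bool query with one term clause per distinct gram."""
    # Drop repeated grams (keeping order) so they do not inflate the clause count.
    unique = list(dict.fromkeys(grams))
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: g}} for g in unique],
                "minimum_should_match": minimum_should_match,
            }
        }
    }

# Hand-written stand-in for the response to analyzing "ali12345".
sample_response = {
    "tokens": [{"token": g} for g in ["ali", "li1", "i12", "123", "234", "345"]]
}

query = build_best_match_query(grams_from_analyze_response(sample_response))
print(json.dumps(query, indent=2))
```

This costs an extra round trip per search and does nothing about the second question: every distinct gram still becomes its own clause, so a long search string produces a correspondingly long query.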

PS

Index configuration

    {
      mappings: {
        object: {
          properties: {
            name: {
              type: 'text',
              analyzer: 'trigram_analyzer'
            }
          }
        }
      },
      settings: {
        analysis: {
          filter: {
            trigram_filter: { type: 'ngram', min_gram: 3, max_gram: 3 }
          },
          analyzer: {
            trigram_analyzer: {
              type: 'custom',
              tokenizer: 'keyword',
              filter: [ 'trigram_filter' ]
            }
          }
        }
      }
    }

Source: https://habr.com/ru/post/1273950/
