Elasticsearch does not return the exact match first

I have an Elasticsearch index with a field for exact matches, and somehow I get exact matches as well as similar results (which I don’t mind), but the similar results are sorted before the exact matches (which I do mind).

Can someone explain what is happening and how to fix it?

My mapping looks like this:

"exact":{ "type":"string", "boost":10.0, "analyzer":"keyword" }, 

My query looking for "AAPL P JAN 2014 885,00" is as follows:

 {
   "size" : 21,
   "query" : { "field" : { "exact" : "AAPL P JAN 2014 885,00" } },
   "explain" : true,
   "sort" : [ { "_score" : { "order" : "desc" } } ],
   "facets" : { "category" : { "terms" : { "field" : "category", "size" : 10 } } }
 }

And the returned documents come back in the following order:

  • {"exact": ["APPLE INC", "US0378331005", "AAPL", "73773"], "id-compound": "AAPL"}
  • {"exact": ["AAPL", "73773", "AAPL P JAN 2014 675,00"], "id-compound": "AAPL*PUT*20140118*675"}
  • {"exact": ["AAPL", "73773", "AAPL C JAN 2014 500,00"], "id-compound": "AAPL*CALL*20140118*500"}

and so on, with the exact match far down the result list.

Can someone explain to me why the exact match does not end up on top?

The search results with the full explanation are below, in case that helps to understand things.

 "hits" : [
 { "_shard" : 0, "_node" : "1", "_index" : "instruments", "_type" : "instrument", "_id" : "AAPL", "_score" : 1306.8339, "_source" : {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"}, "_explanation" : { "value" : 1306.8339, "description" : "product of:", "details" : [ { "value" : 6534.169, "description" : "sum of:", "details" : [ { "value" : 6534.169, "description" : "weight(exact:AAPL in 9096), product of:", "details" : [ { "value" : 0.25854474, "description" : "queryWeight(exact:AAPL), product of:", "details" : [ { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 0.0419026, "description" : "queryNorm" } ] }, { "value" : 25272.875, "description" : "fieldWeight(exact:AAPL in 9096), product of:", "details" : [ { "value" : 1.0, "description" : "tf(termFreq(exact:AAPL)=1)" }, { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 4096.0, "description" : "fieldNorm(field=exact, doc=9096)" } ] } ] } ] }, { "value" : 0.2, "description" : "coord(1/5)" } ] } },
 { "_shard" : 0, "_node" : "1", "_index" : "instruments", "_type" : "instrument", "_id" : "AAPL*PUT*20140118*675", "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"}, "_explanation" : { "value" : 163.35423, "description" : "product of:", "details" : [ { "value" : 816.7711, "description" : "sum of:", "details" : [ { "value" : 816.7711, "description" : "weight(exact:AAPL in 18), product of:", "details" : [ { "value" : 0.25854474, "description" : "queryWeight(exact:AAPL), product of:", "details" : [ { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 0.0419026, "description" : "queryNorm" } ] }, { "value" : 3159.1094, "description" : "fieldWeight(exact:AAPL in 18), product of:", "details" : [ { "value" : 1.0, "description" : "tf(termFreq(exact:AAPL)=1)" }, { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 512.0, "description" : "fieldNorm(field=exact, doc=18)" } ] } ] } ] }, { "value" : 0.2, "description" : "coord(1/5)" } ] } },
 { "_shard" : 0, "_node" : "1", "_index" : "instruments", "_type" : "instrument", "_id" : "AAPL*CALL*20140118*500", "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"}, "_explanation" : { "value" : 163.35423, "description" : "product of:", "details" : [ { "value" : 816.7711, "description" : "sum of:", "details" : [ { "value" : 816.7711, "description" : "weight(exact:AAPL in 383), product of:", "details" : [ { "value" : 0.25854474, "description" : "queryWeight(exact:AAPL), product of:", "details" : [ { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 0.0419026, "description" : "queryNorm" } ] }, { "value" : 3159.1094, "description" : "fieldWeight(exact:AAPL in 383), product of:", "details" : [ { "value" : 1.0, "description" : "tf(termFreq(exact:AAPL)=1)" }, { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 512.0, "description" : "fieldNorm(field=exact, doc=383)" } ] } ] } ] }, { "value" : 0.2, "description" : "coord(1/5)" } ] } },
 { "_id" : "AAPL*PUT*20140118*940", "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 940,00"],"id-compound":"AAPL*PUT*20140118*940"}, "_explanation" : { "value" : 163.35423, "description" : "product of:", "details" : [ { "value" : 816.7711, "description" : "sum of:", "details" : [ { "value" : 816.7711, "description" : "weight(exact:AAPL in 794), product of:", "details" : [ { "value" : 0.25854474, "description" : "queryWeight(exact:AAPL), product of:", "details" : [ { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 0.0419026, "description" : "queryNorm" } ] }, { "value" : 3159.1094, "description" : "fieldWeight(exact:AAPL in 794), product of:", "details" : [ { "value" : 1.0, "description" : "tf(termFreq(exact:AAPL)=1)" }, { "value" : 6.1701355, "description" : "idf(docFreq=211, maxDocs=37299)" }, { "value" : 512.0, "description" : "fieldNorm(field=exact, doc=794)" } ] } ] } ] }, { "value" : 0.2, "description" : "coord(1/5)" } ] } }

And here is what happens when I analyze the data I'm trying to store:

 curl -XGET 'localhost:9200/instruments/_analyze?field=exact&pretty=true' -d 'ING P JUN 2013 6.00'
 { "tokens" : [ { "token" : "ING P JUN 2013 6.00", "start_offset" : 0, "end_offset" : 20, "type" : "word", "position" : 1 } ] }
+6
5 answers

All three documents match for the same reason: as you can see from the explain output, each of them matches only the term "AAPL". The term appears once in each document (tf = 1) and occurs in 211 of 37299 documents (idf = 6.1701355), so the scores differ only through the field norm (4096 for the first document, 512 for the others). The field norm is that high because you are using index-time boosting (boost 10 in your mapping); that hardly matters here, since the match is always on the same field. It simply means that if you also had matches on other fields, this field would always win, which may make sense in your case.
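The numbers in the explain output can be checked by hand. The following is a minimal Python sketch that multiplies the factors the way the explanation nests them (coord × queryWeight × fieldWeight). The helper name `explain_score` is made up for illustration, and the only inputs are values copied from the explain output in the question.

```python
# Values copied from the explain output in the question.
IDF = 6.1701355         # idf(docFreq=211, maxDocs=37299)
QUERY_NORM = 0.0419026  # queryNorm
TF = 1.0                # tf(termFreq(exact:AAPL)=1)
COORD = 1 / 5           # coord(1/5)


def explain_score(field_norm):
    """Recombine the factors as nested in the explanation (hypothetical helper)."""
    query_weight = IDF * QUERY_NORM        # queryWeight(exact:AAPL)
    field_weight = TF * IDF * field_norm   # fieldWeight(exact:AAPL in <doc>)
    return COORD * query_weight * field_weight


if __name__ == "__main__":
    print(explain_score(4096.0))  # first hit, fieldNorm 4096
    print(explain_score(512.0))   # the other hits, fieldNorm 512
```

The two results come out at roughly 1306.8 and 163.35, which agrees with the reported `_score` values up to floating-point rounding. The only factor that differs between the hits is the field norm.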

But the problem is that "AAPL P JAN 2014 885,00" is not an exact match for any of the documents you show. Of the 5 terms in your query, only one matches, which is also confirmed by the coord factor in your explanation: coord(1/5).

The keyword analyzer does seem to be used, but as you can see from the returned documents, you are not sending the content of the exact field as a single value but as an array of values. Each element will not be tokenized, since you are using the keyword analyzer, but you still end up with several tokens in the field. I think you need to check how you index your documents.
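To make coord(1/5) concrete, here is a toy Python sketch. It is not how Elasticsearch is implemented. It assumes that the query string is split on whitespace into five terms (which is what coord(1/5) suggests) and that each element of the `exact` array is indexed as one unmodified token, as the keyword analyzer would do. The helper name `matching_terms` is hypothetical.

```python
def matching_terms(query, exact_values):
    """Return the query terms that equal one of the indexed tokens (toy model)."""
    indexed_tokens = set(exact_values)  # keyword analyzer: one token per array element
    query_terms = query.split()         # assumption: the query is split on whitespace
    return [term for term in query_terms if term in indexed_tokens]


query = "AAPL P JAN 2014 885,00"
doc = ["AAPL", "73773", "AAPL P JAN 2014 675,00"]  # values from the second hit

matched = matching_terms(query, doc)
coord = len(matched) / len(query.split())
print(matched, coord)
```

Under these assumptions only the term "AAPL" matches, so one of five query terms is found, which is the ratio the explanation reports.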

0

I'm not sure this is technically the best approach, but if you are only after one specific result from Elasticsearch, you can just use a filter with a script that looks for an exact match.

 {
   "from" : 0,
   "size" : 1,
   "query" : { "text_phrase" : { "title" : "AAPL P JAN 2014 885,00" } },
   "filter" : { "script" : { "script" : "_source.exact.contains(x)", "params" : { "x" : "AAPL P JAN 2014 885,00" } } }
 }

I used this to get one known entry from Elasticsearch, and it worked for me.

+2

I think you have already found the answer; I just want to give a little more information to others with the same problem.

You are using the field query, which the Elasticsearch documentation describes as follows:

Field query:

A query that executes a query string against a specific field. It is a simplified version of the query_string query (by setting the default_field to the field this query is executed against).

I believe the query_string query is meant for free text, i.e. it does a lot of processing on the query (making it fuzzy, etc.).

What you want to use (and I think you found it) is a term query, which does nothing to the search phrase and therefore gives you exact matches.
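As a rough sketch of what such a request body could look like, here it is built as a Python dictionary and serialized to JSON. The field name `exact`, the size, and the search value are taken from the question. Whether the term query actually finds the document still depends on the value having been indexed as a single unmodified token.

```python
import json

# Term query: the value is compared with the indexed tokens as-is, without analysis.
term_query = {
    "size": 21,
    "query": {
        "term": {
            "exact": "AAPL P JAN 2014 885,00"
        }
    },
}

body = json.dumps(term_query, indent=2)
print(body)
```

The printed JSON can be sent as the body of a `_search` request in place of the field query from the question.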

NOTE: Analysis happens at two different times: index time and query time. The setting "analyzer": "keyword" apparently only applies on the search side "when searching using a query string", according to the Elasticsearch docs. I have to admit I don't know exactly what that means (I would guess the query_string query, but it could also mean queries like http://../_search?q=exact:{query here}).

+1

DO NOT USE your id field.

Define your field as:

 "exact":{ "type":"string", "index":"not_analyzed" } 

See Finding Exact Values

+1

The reason your keyword analyzer seems to be ignored in the search query is that ES tokenizes the string twice: first it runs its query DSL tokenizer, and then it runs the tokenizer specified in the mapping on the result. This is explained in more detail in this article: http://paulsabou.com/blog/2012/03/25/advanced-exact-matching-with-elastic-search/

0

Source: https://habr.com/ru/post/945179/

