How can I improve Elasticsearch performance for sorted and geographically filtered queries across large datasets?

I have a large dataset of relatively short documents that include, among other fields, given name, last name, tenantId, geographic location, and skill set.

We have approximately 7 million records distributed across three nodes, and searches are unbearably slow (on the order of ten seconds) when a term has a sizable number of matches. We usually sort the result set alphabetically by name, chronologically by createdDate, or by urgency. We also need to calculate deadlines and results. We use the REST API to communicate with ES.

I have read that sorting can be a major search bottleneck. What strategies have worked in production to address this kind of requirement?

I am using a mapping similar to the following:

"candidate": {
  "dynamic": "true",
  "properties": {
    "accountId": { "type": "string", "store": "true", "index": "not_analyzed" },
    "tenant": { "type": "string", "store": "true", "index": "not_analyzed" },
    "givenName": {
      "type": "string",
      "store": "true",
      "index": "analyzed",
      "analyzer": "sortable",
      "term_vector": "with_positions_offsets"
    },
    ...
    "locations": {
      "properties": {
        "name": {
          "type": "string",
          "store": "true",
          "index": "analyzed",
          "term_vector": "with_positions_offsets"
        },
        "point": { "type": "geo_point", "store": "true", "lat_lon": "true" }
      }
    },
    "skills": {
      "type": "string",
      "store": "true",
      "index": "analyzed",
      "term_vector": "with_positions_offsets"
    },
    "createdDate": { "type": "long", "store": "true", "index": "not_analyzed" },
    "updatedDate": { "type": "long", "store": "true", "index": "not_analyzed" }
  }
}

And the queries are structured as follows:

{
  "from": 0,
  "size": 40,
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": "query text",
                "fields": [ "givenName", "familyName", "email", "locations.name", "skills" ],
                "type": "cross_fields"
              }
            },
            { "prefix": { "email": { "prefix": "query text" } } }
          ]
        }
      }
    }
  },
  "post_filter": {
    "bool": {
      "must": {
        "geo_polygon": {
          "point": {
            "points": [
              [ -75.06681499999999, 40.536544 ],
              ... many more long/lat points ...
              [ -75.06681499999999, 40.536544 ]
            ]
          }
        }
      }
    }
  },
  "sort": [ { "createdDate": { "order": "asc" } } ],
  "highlight": {
    "fields": {
      "givenName": { },
      "familyName": { },
      "email": { },
      "locations.name": { },
      "skills": { }
    }
  }
}
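For anyone who wants to reproduce the request shape, here is a minimal Python sketch that assembles the same body as a plain dictionary. The function name `build_search_body`, its parameters, and the example triangle are hypothetical and only for illustration; the closed-ring check simply mirrors the polygon above, where the first and last points are identical, and is my own assumption rather than a documented Elasticsearch requirement.

```python
import json

# Fields searched and highlighted in the query shown above.
SEARCH_FIELDS = ["givenName", "familyName", "email", "locations.name", "skills"]


def build_search_body(text, polygon, sort_field="createdDate", order="asc", size=40):
    """Assemble the request body from the question as a plain dict.

    `polygon` is a list of [longitude, latitude] pairs.
    """
    # Assumption: keep the ring closed, as in the example polygon above.
    if len(polygon) < 4 or polygon[0] != polygon[-1]:
        raise ValueError("polygon must be closed: first and last point must match")
    return {
        "from": 0,
        "size": size,
        "query": {"bool": {"must": {"bool": {"should": [
            {"multi_match": {
                "query": text,
                "fields": SEARCH_FIELDS,
                "type": "cross_fields",
            }},
            {"prefix": {"email": {"prefix": text}}},
        ]}}}},
        # The geo restriction is applied as a post_filter, as in the question.
        "post_filter": {"bool": {"must": {
            "geo_polygon": {"point": {"points": polygon}}
        }}},
        "sort": [{sort_field: {"order": order}}],
        "highlight": {"fields": {name: {} for name in SEARCH_FIELDS}},
    }


# Hypothetical triangle, not the real polygon from our data.
example_polygon = [[-75.07, 40.54], [-75.00, 40.60], [-74.95, 40.50], [-75.07, 40.54]]
body = build_search_body("query text", example_polygon)
print(json.dumps(body, indent=2))
```

Having the body in one function makes it easy to swap the sort field or drop the highlight section when timing individual parts of the request.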

Is there some kind of range-based query solution that others have found useful to solve similar sorting / search requirements?


Source: https://habr.com/ru/post/1200448/