Elasticsearch Query Performance

I use Elasticsearch to index two types of objects:

Data Details

Contract object: ~60 properties (object size: 120 bytes)
Risk item object: ~125 properties (object size: 250 bytes)

The contract is the parent of the risk item (_parent).

I store 240 million such objects in one index (210 million risk items, 30 million contracts).

Index size: 322 GB

Cluster information

11 m2.4xlarge EC2 boxes [68 GB memory, 1.6 TB storage, 8 cores] (1 box is a load-balancer node with node.data = false), 50 shards, 1 replica

elasticsearch.yml

    node.data: true
    http.enabled: false
    index.number_of_shards: 50
    index.number_of_replicas: 1
    index.translog.flush_threshold_ops: 10000
    index.merge.policy.use_compound_files: false
    indices.memory.index_buffer_size: 30%
    index.refresh_interval: 30s
    index.store.type: mmapfs
    path.data: /data-xvdf,/data-xvdg

I start the Elasticsearch nodes with the following command:

    /home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g

My problem is that when I run the following query against the riskitem type, it takes about 10-15 seconds to return 20 records.

I run it under a load of 50 concurrent users, with a bulk indexing load of about 5,000 risk items running in parallel.

Query (with parent-child join)

    http://:9200/contractindex/riskitem/_search

    {
      "query": {
        "has_parent": {
          "parent_type": "contract",
          "query": {
            "range": {
              "ContractDate": { "gte": "2010-01-01" }
            }
          }
        }
      },
      "filter": {
        "and": [
          {
            "query": {
              "bool": {
                "must": [
                  { "query_string": { "fields": ["RiskItemProperty1"], "query": "abc" } },
                  { "query_string": { "fields": ["RiskItemProperty2"], "query": "xyz" } }
                ]
              }
            }
          }
        ]
      }
    }

Single table queries

Query1 (This query takes about 8 seconds.)

    {
      "query": {
        "constant_score": {
          "filter": {
            "and": [
              { "term": { "CommonCharacteristic_BuildingScheme": "BuildingScheme1" } },
              { "term": { "Address_Admin2Name": "Admin2Name1" } }
            ]
          }
        }
      }
    }

Query2 (This query takes about 6.5 seconds for the top 10 records, but it also has a sort on top of it.)

    {
      "query": {
        "constant_score": {
          "filter": {
            "and": [
              { "term": { "Insurer": "Insurer1" } },
              { "term": { "Status": "Status1" } }
            ]
          }
        }
      }
    }

Can someone please advise how I can improve the performance of this query?

2 answers

Have you tried setting up custom routing? Without custom routing, every query has to search all 50 shards. With custom routing, the query knows which shards to search, which makes queries more efficient. More details here.

You can assign custom routing to each bulk item by specifying the routing value in the _routing field, as described in the bulk API docs.
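As a rough illustration of the routing idea, here is a minimal Python sketch that builds a bulk request body with a routing value per item and a search URL carrying the same routing value. The helper names (`bulk_body`, `routed_search_url`), the dictionary keys (`id`, `routing`, `parent`, `source`), the host `localhost`, and the choice of routing value are all assumptions made for the example. The `_routing` metadata key follows the answer above, so check the bulk API docs for your Elasticsearch version. Note that with parent-child documents, a parent and all of its children need the same routing value so that they land on the same shard.

```python
import json
from urllib.parse import urlencode


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk body with a routing value per item.

    Each entry in `docs` is a dict with hypothetical keys:
    "id", "routing", "source", and optionally "parent".
    """
    lines = []
    for doc in docs:
        meta = {
            "_index": index,
            "_type": doc_type,
            "_id": doc["id"],
            "_routing": doc["routing"],  # key name as given in the answer
        }
        if "parent" in doc:
            meta["_parent"] = doc["parent"]
        # One action line, followed by one document source line.
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc["source"]))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def routed_search_url(base_url, index, doc_type, routing):
    """Build a _search URL that restricts the search to the routed shard(s)."""
    query_string = urlencode({"routing": routing})
    return "{}/{}/{}/_search?{}".format(base_url, index, doc_type, query_string)


# Hypothetical data: one risk item routed by an assumed key "Insurer1".
risk_items = [
    {
        "id": "r1",
        "parent": "c1",
        "routing": "Insurer1",
        "source": {"Status": "Status1"},
    }
]
body = bulk_body("contractindex", "riskitem", risk_items)
url = routed_search_url(
    "http://localhost:9200", "contractindex", "riskitem", "Insurer1"
)
```

The body would be sent to the `_bulk` endpoint and the query to the routed URL. Routing only helps if every query can supply the routing value, so the key should be something the queries already filter on.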


We made changes to use bitsets.

We ran 50 concurrent users (read-only) for an hour. All our queries now run 4-5 times faster, except for the parent-child query (the one in the question), which dropped from 7 seconds to 3 seconds.

I have another query with has_child. If anyone has other feedback, we can improve this and the other queries further.
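The answer does not show what was changed in the queries. One common change of this kind in Elasticsearch 0.90 is replacing `and` filters with `bool` filters, which combine cached bitsets for filters such as `term`. Assuming that is what was meant (a guess, not the poster's confirmed method), the rewrite could be sketched in Python as follows. `and_to_bool` is a hypothetical helper that only handles the list form of the `and` filter and would also rewrite a field that happened to be named `and`.

```python
def and_to_bool(node):
    """Recursively rewrite {"and": [f1, f2]} into {"bool": {"must": [f1, f2]}}."""
    if isinstance(node, list):
        return [and_to_bool(item) for item in node]
    if isinstance(node, dict):
        rewritten = {}
        for key, value in node.items():
            if key == "and" and isinstance(value, list):
                # Same clauses, expressed as a bool filter.
                rewritten["bool"] = {"must": and_to_bool(value)}
            else:
                rewritten[key] = and_to_bool(value)
        return rewritten
    # Scalars (strings, numbers, booleans) are returned unchanged.
    return node


# Query1 from the question, as a Python dict.
query1 = {
    "query": {
        "constant_score": {
            "filter": {
                "and": [
                    {"term": {"CommonCharacteristic_BuildingScheme": "BuildingScheme1"}},
                    {"term": {"Address_Admin2Name": "Admin2Name1"}},
                ]
            }
        }
    }
}
converted = and_to_bool(query1)
```

After conversion, the filter in `converted` is a `bool` filter whose `must` list holds the two original `term` filters. The input dict is left unmodified.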

    {
      "query": {
        "filtered": {
          "query": {
            "bool": {
              "must": [
                { "match": { "LineOfBusiness": "LOBValue1" } }
              ]
            }
          },
          "filter": {
            "has_child": {
              "type": "riskitem",
              "filter": {
                "bool": {
                  "must": [
                    { "term": { "Address_Admin1Name": "Admin1Name1" } }
                  ]
                }
              }
            }
          }
        }
      }
    }

Source: https://habr.com/ru/post/951850/

