I use Elasticsearch to index two types of objects.

**Data Details**

- Contract object: ~60 properties (object size: 120 bytes)
- Risk item object: ~125 properties (object size: 250 bytes)

The contract is the parent of the risk item (`_parent`).

I store 240 million such objects in one index (210 million risk items, 30 million contracts).

Index size: 322 GB

**Cluster information**

- 11 m2.4xlarge EC2 boxes [68 GB memory, 1.6 TB storage, 8 cores]; 1 box is a load-balancer node with node.data = false
- 50 shards
- 1 replica
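
For reference, here is a rough back-of-the-envelope sketch of what these numbers imply per shard and per node. It assumes the 322 GB figure covers primary shards only and that shards are spread evenly over the 10 data nodes; neither is confirmed above, and the function and key names are made up for illustration.

```python
# Figures quoted in the post.
TOTAL_DOCS = 240_000_000   # 210M risk items + 30M contracts
INDEX_SIZE_GB = 322        # ASSUMPTION: primary shards only, 1 GB = 10**9 bytes
PRIMARY_SHARDS = 50
REPLICAS = 1
DATA_NODES = 10            # 11 boxes minus the non-data load-balancer node


def shard_layout(total_docs, index_size_gb, primary_shards, replicas, data_nodes):
    """Return rough per-shard and per-node figures for an evenly balanced index."""
    # Every primary shard has `replicas` additional copies.
    shard_copies = primary_shards * (1 + replicas)
    return {
        "docs_per_primary_shard": total_docs / primary_shards,
        "gb_per_primary_shard": index_size_gb / primary_shards,
        "shard_copies_per_data_node": shard_copies / data_nodes,
        "bytes_per_doc": index_size_gb * 10**9 / total_docs,
    }


layout = shard_layout(TOTAL_DOCS, INDEX_SIZE_GB, PRIMARY_SHARDS, REPLICAS, DATA_NODES)
for name, value in layout.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions each primary shard holds about 4.8 million documents and 6.44 GB, and each data node hosts about 10 shard copies.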

**elasticsearch.yml**

    node.data: true
    http.enabled: false
    index.number_of_shards: 50
    index.number_of_replicas: 1
    index.translog.flush_threshold_ops: 10000
    index.merge.policy.use_compound_files: false
    indices.memory.index_buffer_size: 30%
    index.refresh_interval: 30s
    index.store.type: mmapfs
    path.data: /data-xvdf,/data-xvdg

I start the Elasticsearch nodes with the following command:

    /home/ec2-user/elasticsearch-0.90.2/bin/elasticsearch -f -Xms30g -Xmx30g

My problem is that the following query against the riskitem type takes about 10-15 seconds to return 20 records.

I run it under a load of 50 concurrent users, with a bulk indexing load of about 5,000 risk items running in parallel.

**Query (with parent join)**

http://:9200/contractindex/riskitem/_search

<!-- language: lang-json -->

    {
      "query": {
        "has_parent": {
          "parent_type": "contract",
          "query": {
            "range": {
              "ContractDate": {
                "gte": "2010-01-01"
              }
            }
          }
        }
      },
      "filter": {
        "and": [
          {
            "query": {
              "bool": {
                "must": [
                  { "query_string": { "fields": ["RiskItemProperty1"], "query": "abc" } },
                  { "query_string": { "fields": ["RiskItemProperty2"], "query": "xyz" } }
                ]
              }
            }
          }
        ]
      }
    }
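
To check how much of the 10-15 seconds is spent inside Elasticsearch rather than in the client or on the network, the query can be timed from a small script. The sketch below is only an illustration: `build_risk_query`, `timed_search` and the `base_url` argument are hypothetical names, and it assumes the HTTP endpoint is reachable on the load-balancer node. It compares the `took` value reported in the search response with the wall-clock time seen by the client.

```python
import json
import time
import urllib.request


def build_risk_query(contract_date_from, prop1, prop2):
    """Build the has_parent request body from the post as a Python dict."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "contract",
                "query": {"range": {"ContractDate": {"gte": contract_date_from}}},
            }
        },
        "filter": {
            "and": [{
                "query": {
                    "bool": {
                        "must": [
                            {"query_string": {"fields": ["RiskItemProperty1"], "query": prop1}},
                            {"query_string": {"fields": ["RiskItemProperty2"], "query": prop2}},
                        ]
                    }
                }
            }]
        },
    }


def timed_search(base_url, body, size=20):
    """POST the body to the riskitem _search endpoint.

    Returns (took_ms, wall_ms): the server-side time reported by
    Elasticsearch and the wall-clock time measured by this client.
    base_url is a hypothetical argument such as "http://<host>:9200".
    """
    url = f"{base_url}/contractindex/riskitem/_search?size={size}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    wall_ms = (time.perf_counter() - start) * 1000
    return result["took"], wall_ms
```

Calling `timed_search("http://<host>:9200", build_risk_query("2010-01-01", "abc", "xyz"))` returns a `(took_ms, wall_ms)` pair; a large gap between the two values would point at the client or the network rather than at the query itself.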

**Single table queries**

**Query1** (this query takes about 8 seconds)

<!-- language: lang-json -->

    {
      "query": {
        "constant_score": {
          "filter": {
            "and": [
              { "term": { "CommonCharacteristic_BuildingScheme": "BuildingScheme1" } },
              { "term": { "Address_Admin2Name": "Admin2Name1" } }
            ]
          }
        }
      }
    }

**Query2** (this query takes around 6.5 seconds for the top 10 records, but has a sort on top of it)

<!-- language: lang-json -->

    {
      "query": {
        "constant_score": {
          "filter": {
            "and": [
              { "term": { "Insurer": "Insurer1" } },
              { "term": { "Status": "Status1" } }
            ]
          }
        }
      }
    }

Can someone please help me improve the performance of these queries?