Some background on the Elasticsearch instance:
- One node, on one machine
- The specific index consists of 2.6 billion documents of 1.23 TB in size.
- The index is divided into 4 shards.
- Heap size set to 30 GB
- The server has 256 GB of RAM and 40 cores.
- Elasticsearch (version 1.4.3) is the only thing running on this server.
I want to return all documents with a specific name. The name attribute is mapped as follows:
"name": {
    "type": "string",
    "index": "not_analyzed"
}
I have tried several kinds of search: filter, query_string, and term. All give the same result. The current request looks like this:
{
    "query": {
        "query_string": {
            "default_field": "name",
            "query": "test_run_435_tc"
        }
    },
    "size": 10000000
}
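For comparison, the term variant mentioned above is not shown in the post. A minimal sketch of what it might look like is given here, assuming the Elasticsearch 1.x `filtered` query with a `term` filter, which matches the exact stored value of a not_analyzed field. The helper name `build_term_filter_request` is hypothetical, and the sketch only builds the request body; it does not contact a cluster.

```python
import json


def build_term_filter_request(field, value, size):
    """Build a search body that matches `value` exactly on `field`.

    Assumes the Elasticsearch 1.x `filtered` query with a `term` filter.
    The helper name and signature are hypothetical, for illustration only.
    """
    return {
        "query": {
            "filtered": {
                "filter": {"term": {field: value}}
            }
        },
        "size": size,
    }


# Same field, value and size as the query_string request in the post.
body = build_term_filter_request("name", "test_run_435_tc", 10000000)
print(json.dumps(body, indent=2))
```

The resulting dict could be passed as the request body of a search call in whichever client is in use.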
The problem is that the request does not return the correct number of documents on the first try. I know that the index contains about 45,000 documents with the name "test_run_435_tc".
However, the first request returns only about 5000.
, . 3-4 .
I am using elasticsearch-py.
, elasticsearch , .
elasticsearch ? elasticsearch - ? , .
, :
"": 10000000 , , .
"size": 0 :
{u'_shards': {u'failed': 0, u'successful': 4, u'total': 4},
u'hits': {u'hits': [], u'max_score': 0.0, u'total': 28754},
u'timed_out': True,
u'took': 130}
"size": 0 again:
{u'_shards': {u'failed': 0, u'successful': 4, u'total': 4},
u'hits': {u'hits': [], u'max_score': 0.0, u'total': 39223},
u'timed_out': True,
u'took': 134}
So even with "size": 0, the request times out.....? With timeout = 100000 & search_type = count:
{
    "took": 525,
    "timed_out": false,
    "_shards": {
        "total": 4,
        "successful": 4,
        "failed": 0
    },
    "hits": {
        "total": 49501,
        "max_score": 0,
        "hits": []
    }
}
So, 49501 in "hits_total"!
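The three responses above differ in their timed_out flag: the totals 28754 and 39223 come from responses with timed_out set to True, while 49501 comes from a response with timed_out false. A search that times out returns whatever was collected up to that point, so its total is partial. A minimal sketch of a check that flags such responses follows; the helper name `is_complete` is hypothetical, and the sample dicts are copied from the output in this post.

```python
def is_complete(response):
    """Return True only if the search did not time out and no shard failed.

    Hypothetical helper: works on the plain dict returned by a search call.
    """
    shards = response.get("_shards", {})
    return not response.get("timed_out", False) and shards.get("failed", 0) == 0


# First "size": 0 response from the post (timed out).
first = {
    "_shards": {"failed": 0, "successful": 4, "total": 4},
    "hits": {"hits": [], "max_score": 0.0, "total": 28754},
    "timed_out": True,
    "took": 130,
}

# Response obtained with timeout = 100000 & search_type = count.
final = {
    "took": 525,
    "timed_out": False,
    "_shards": {"total": 4, "successful": 4, "failed": 0},
    "hits": {"total": 49501, "max_score": 0, "hits": []},
}

for label, resp in (("first", first), ("final", final)):
    status = "complete" if is_complete(resp) else "partial"
    print(label, resp["hits"]["total"], status)
```

A caller could use such a check to retry with a longer timeout instead of trusting the hit count of a timed-out response.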