We found several duplicate documents in one of our Elasticsearch indices, and we have not been able to determine the cause. There are two copies of each affected document, with exactly the same _id, _type, and _uid.
A GET request to /index-name/document-type/document-id returns just one copy, but searching for the document with a request like the following returns two results, which is pretty surprising:
POST /index-name/document-type/_search
{
  "filter": {
    "term": {
      "_id": "document-id"
    }
  }
}
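For completeness, the single-document lookup that returns only one copy is the plain GET by ID (same hypothetical index, type, and ID names as above):

```json
GET /index-name/document-type/document-id
```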
Aggregating on the _uid field also identifies the duplicate documents:
POST /index-name/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "_uid",
        "min_doc_count": 2
      }
    }
  }
}
The duplicates are on different shards. For example, a document can have one copy on primary shard 0 and one copy on primary shard 1. We verified this by running the aggregation query above against each shard in turn, using the preference parameter: it finds no duplicates within a single shard.
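A sketch of how such a per-shard check can be written, using the _shards preference value (we repeated it for shards 0, 1, and 2; index name as above):

```json
POST /index-name/_search?preference=_shards:0
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "_uid",
        "min_doc_count": 2
      }
    }
  }
}
```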
Our best guess is that something went wrong with routing, but we don't understand how these copies could have been routed to different shards. According to the routing documentation, the default routing is based on the document ID and should consistently route a document to the same shard.
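Our understanding of the default routing scheme, as a minimal sketch. This is not the real implementation (Elasticsearch 2.x hashes the _routing value, which defaults to the _id, with Murmur3; crc32 stands in here purely for illustration), but it shows why the same _id should always land on the same shard:

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # as in our index

def shard_for(doc_id, routing=None):
    # Default routing: use the document _id unless a custom
    # _routing value is supplied (we do not supply one).
    routing_value = routing if routing is not None else doc_id
    # Illustrative hash; the real ES hash function is Murmur3.
    return zlib.crc32(routing_value.encode("utf-8")) % NUM_PRIMARY_SHARDS

# The same _id always maps to the same shard, so two copies of one
# document on different shards should be impossible under this scheme.
assert shard_for("document-id") == shard_for("document-id")
```

This determinism is exactly what makes our situation puzzling: absent a custom _routing value, indexing the same _id twice should overwrite the document in place on one shard.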
We do not use custom routing, which would override the default. We double-checked this by making sure the duplicate documents have no _routing field.
We also do not use parent/child relationships, which likewise affect routing. (See this question on the Elasticsearch forum, for example, which has the same symptoms as our problem. We don't think the cause is the same, because we don't set parents on our documents.)
We fixed the immediate problem by reindexing into a new index, which collapsed the duplicate documents. We kept the old index for debugging.
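A toy illustration of why reindexing collapses the duplicates: when the old index's documents are re-indexed by _id, a second copy with the same _id overwrites the first instead of coexisting with it (field names here are made up for the example):

```python
def reindex(hits):
    # Model the new index as a mapping from _id to _source: indexing a
    # document whose _id already exists overwrites the earlier copy.
    new_index = {}
    for hit in hits:
        new_index[hit["_id"]] = hit["_source"]
    return new_index

# Two identical copies in the old index, as in our duplicate problem.
old_index_hits = [
    {"_id": "document-id", "_source": {"field": "value"}},
    {"_id": "document-id", "_source": {"field": "value"}},  # duplicate
]
assert len(reindex(old_index_hits)) == 1
```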
We have not found a way to reproduce the problem. The new index indexes documents correctly, and re-running the nightly processing job that updates the documents did not create any more duplicates.
The cluster has 3 nodes, 3 primary shards, and 1 replica (i.e. 3 replica shards). minimum_master_nodes is set to 2, which should prevent split-brain. We are running Elasticsearch 2.4 (which, we know, is old; we plan to upgrade soon).
Does anyone know what might cause these duplicates? Do you have any suggestions for debugging it?