What is the fastest way to index data into Elasticsearch?

We have been working with Elasticsearch 2.x for quite some time. It meets our requirements in every respect except one: write (indexing) performance on our Elasticsearch cluster is poor.

In our case, we have an 8-node ES cluster, and the documents we put into ES have 100 fields each. Indexing runs at around 50,000 documents per minute, which is too slow for our scenario. We have tried all the tuning methods recommended by www.elastic.co. The fastest way we have found is to build the JSON payload as files, then upload them to ES using the bulk API. Even so, the indexing speed is too slow.
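For readers unfamiliar with the approach described above, here is a minimal sketch of how such a bulk payload might be built. The bulk API expects newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body. The function name `build_bulk_payload`, the index name `events`, the type name `event`, and the sample documents are hypothetical and not taken from the question.

```python
import json


def build_bulk_payload(docs, index, doc_type):
    """Serialize documents into the newline-delimited body used by the bulk API.

    `index` and `doc_type` are placeholder names chosen by the caller.
    """
    lines = []
    for doc in docs:
        # Action line: ask ES to index the document that follows it.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents, for illustration only.
payload = build_bulk_payload(
    [{"user": "a", "n": 1}, {"user": "b", "n": 2}],
    index="events",
    doc_type="event",
)
```

Written to a file, a payload like this can be posted to the `_bulk` endpoint, for example with `curl --data-binary @file`. The `_type` field matches the 2.x line mentioned in the question; later major versions handle types differently.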

I have seen several ES-Hadoop connectors, and Elasticsearch also has Spark support, where you can use saveToEs() to save an RDD to ES. I suspect they all use the ES API underneath. Can anyone share their experience with them? What is the fastest way to index data into Elasticsearch?

+6
1 answer

Regardless of which third-party tool you use outside of ES, everything has to go through ES's own ingestion methods. Whether it is Spark, Logstash, or your own application, it still has to use the bulk or index API one way or another. There is no backdoor magic.
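To illustrate the point, here is a rough sketch of what a custom application ends up doing: splitting documents into batches and posting each batch to the `_bulk` endpoint. Everything named here (`chunked`, `post_bulk`, `index_in_batches`, the `send` callback, the base URL) is a hypothetical illustration, not something from the answer, and the right batch size is an assumption that would need to be found by measurement.

```python
import json
import urllib.request


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_bulk(base_url, body):
    """POST one newline-delimited body to <base_url>/_bulk and return the parsed reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        # Assumption: newer ES versions expect this header; older ones are more lenient.
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def index_in_batches(docs, index, doc_type, batch_size, send):
    """Send `docs` through `send` in batches; return the number of documents sent.

    `send` takes a bulk body string and returns the parsed bulk response.
    """
    sent = 0
    for batch in chunked(docs, batch_size):
        lines = []
        for doc in batch:
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(doc))
        result = send("\n".join(lines) + "\n")
        # The bulk response carries a top-level "errors" flag for item failures.
        if result.get("errors"):
            raise RuntimeError("bulk request reported item failures")
        sent += len(batch)
    return sent
```

Against a real cluster, the callback could be something like `lambda body: post_bulk("http://localhost:9200", body)`. Keeping `send` as a parameter makes it possible to exercise the batching logic without a running cluster.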

+5

Source: https://habr.com/ru/post/1016297/

