Elasticsearch indexing using BulkRequestBuilder slows down

Hello to all elasticsearch masters.

I have millions of data that need to be indexed using the elasticsearch Java API. The number of cluster nodes for elastics search is three (1 as master + 2 nodes).

The following is a snippet of code.

Settings settings = ImmutableSettings.settingsBuilder()
     .put("cluster.name", "MyClusterName").build();

TransportClient client = new TransportClient(settings);
String hostname = "myhost ip";
int port = 9300; 
client.addTransportAddress(new InetSocketTransportAddress(hostname, port));

BulkRequestBuilder bulkBuilder = client.prepareBulk();
BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("my_file_path"))));
long bulkBuilderLength = 0;
String readLine = "";
String index = "my_index_name";
String type = "my_type_name";
String id = "";

while((readLine = br.readLine()) != null){

    id = somefunction(readLine);
    String json = new ObjectMapper().writeValueAsString(readLine);
    bulkBuilder.add(client.prepareIndex(index, type, id)
        .setSource(json));
    bulkBuilderLength++;
    if(bulkBuilderLength % 1000== 0){
        logger.info("##### " + bulkBuilderLength + " data indexed.");
        BulkResponse bulkRes = bulkBuilder.execute().actionGet();
        if(bulkRes.hasFailures()){
            logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
        }
    }
}

br.close();

if(bulkBuilder.numberOfActions() > 0){
    logger.info("##### " + bulkBuilderLength + " data indexed.");
    BulkResponse bulkRes = bulkBuilder.execute().actionGet();
    if(bulkRes.hasFailures()){
        logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
    }
    bulkBuilder = client.prepareBulk();
}

It works great, but performance gets slower than thousands of documents .

I have tried to change the value refresh_interval both -1 and number_of_replicas "as a 0 . However, the situation is the same performance degradation.

bigdesk, GC 1 , .

- ?

.

enter image description here

=================== ========================= ===

, . (. ).

, BulkRequestBuilder. , , .

.

Settings settings = ImmutableSettings.settingsBuilder()
     .put("cluster.name", "MyClusterName").build();

TransportClient client = new TransportClient(settings);
String hostname = "myhost ip";
int port = 9300; 
client.addTransportAddress(new InetSocketTransportAddress(hostname, port));

BulkRequestBuilder bulkBuilder = client.prepareBulk();
BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("my_file_path"))));
long bulkBuilderLength = 0;
String readLine = "";
String index = "my_index_name";
String type = "my_type_name";
String id = "";

while((readLine = br.readLine()) != null){

    id = somefunction(readLine);
    String json = new ObjectMapper().writeValueAsString(readLine);
    bulkBuilder.add(client.prepareIndex(index, type, id)
        .setSource(json));
    bulkBuilderLength++;
    if(bulkBuilderLength % 1000== 0){
        logger.info("##### " + bulkBuilderLength + " data indexed.");
        BulkResponse bulkRes = bulkBuilder.execute().actionGet();
        if(bulkRes.hasFailures()){
            logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
        }
        bulkBuilder = client.prepareBulk();  // This line is my mistake and the solution !!!
    }
}

br.close();

if(bulkBuilder.numberOfActions() > 0){
    logger.info("##### " + bulkBuilderLength + " data indexed.");
    BulkResponse bulkRes = bulkBuilder.execute().actionGet();
    if(bulkRes.hasFailures()){
        logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
    }
    bulkBuilder = client.prepareBulk();
}
+4
1

, Bulk.

, .

, BulkProcessor. .

+8

Source: https://habr.com/ru/post/1535659/


All Articles