Hello to all elasticsearch masters.
I have millions of data that need to be indexed using the elasticsearch Java API. The number of cluster nodes for elastics search is three (1 as master + 2 nodes).
The following is a snippet of code.
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "MyClusterName").build();
TransportClient client = new TransportClient(settings);
String hostname = "myhost ip";
int port = 9300;
client.addTransportAddress(new InetSocketTransportAddress(hostname, port));
BulkRequestBuilder bulkBuilder = client.prepareBulk();
BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("my_file_path"))));
long bulkBuilderLength = 0;
String readLine = "";
String index = "my_index_name";
String type = "my_type_name";
String id = "";
while((readLine = br.readLine()) != null){
id = somefunction(readLine);
String json = new ObjectMapper().writeValueAsString(readLine);
bulkBuilder.add(client.prepareIndex(index, type, id)
.setSource(json));
bulkBuilderLength++;
if(bulkBuilderLength % 1000== 0){
logger.info("##### " + bulkBuilderLength + " data indexed.");
BulkResponse bulkRes = bulkBuilder.execute().actionGet();
if(bulkRes.hasFailures()){
logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
}
}
}
br.close();
if(bulkBuilder.numberOfActions() > 0){
logger.info("##### " + bulkBuilderLength + " data indexed.");
BulkResponse bulkRes = bulkBuilder.execute().actionGet();
if(bulkRes.hasFailures()){
logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
}
bulkBuilder = client.prepareBulk();
}
It works great, but performance gets slower than thousands of documents .
I have tried to change the value refresh_interval both -1 and number_of_replicas "as a 0 .
However, the situation is the same performance degradation.
bigdesk, GC 1 , .
- ?
.

=================== ========================= ===
, . (. ).
, BulkRequestBuilder.
, , .
.
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "MyClusterName").build();
TransportClient client = new TransportClient(settings);
String hostname = "myhost ip";
int port = 9300;
client.addTransportAddress(new InetSocketTransportAddress(hostname, port));
BulkRequestBuilder bulkBuilder = client.prepareBulk();
BufferedReader br = new BufferedReader(new InputStreamReader(new DataInputStream(new FileInputStream("my_file_path"))));
long bulkBuilderLength = 0;
String readLine = "";
String index = "my_index_name";
String type = "my_type_name";
String id = "";
while((readLine = br.readLine()) != null){
id = somefunction(readLine);
String json = new ObjectMapper().writeValueAsString(readLine);
bulkBuilder.add(client.prepareIndex(index, type, id)
.setSource(json));
bulkBuilderLength++;
if(bulkBuilderLength % 1000== 0){
logger.info("##### " + bulkBuilderLength + " data indexed.");
BulkResponse bulkRes = bulkBuilder.execute().actionGet();
if(bulkRes.hasFailures()){
logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
}
bulkBuilder = client.prepareBulk();
}
}
br.close();
if(bulkBuilder.numberOfActions() > 0){
logger.info("##### " + bulkBuilderLength + " data indexed.");
BulkResponse bulkRes = bulkBuilder.execute().actionGet();
if(bulkRes.hasFailures()){
logger.error("##### Bulk Request failure with error: " + bulkRes.buildFailureMessage());
}
bulkBuilder = client.prepareBulk();
}