Elasticsearch cannot recover after a crash

We ran out of disk space, and this brought the Elasticsearch cluster down. Three nodes are now red; two of them have recovered and their status is yellow. ES is running at 150% CPU and high memory usage trying to recover them, but there appears to be a version conflict.
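For reference, this is roughly how the cluster and shard state can be checked (default localhost:9200 assumed):

 # Overall cluster status (red/yellow/green)
 curl -s 'http://localhost:9200/_cluster/health?pretty'
 # Per-shard view: shows which shards are UNASSIGNED or INITIALIZING
 curl -s 'http://localhost:9200/_cat/shards?v'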

I cleaned up the disk space and deleted the shard's translog to stop it replaying from the translog. But surprisingly, the translog just gets created again!

Please share how I can stop this translog recovery attempt and resume normal index operations. I do not want to delete the shard data.

 [2014-10-31 03:11:43,742][WARN ][cluster.action.shard ] [Angela Cairn] [western_europe][4] sending failed shard for [western_europe][4], node[x5M73qVXS5eZIBdz40boEg], [P], s[INITIALIZING], indexUUID [wy-tIJqdQiynz5SGQ2IrGA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[western_europe][4] failed to recover shard]; nested: ElasticsearchException[failed to read [tweet][527924645014818817]]; nested: ElasticsearchIllegalArgumentException[No version type match [101]]; ]]
 [2014-10-31 03:11:43,742][WARN ][cluster.action.shard ] [Angela Cairn] [western_europe][4] received shard failed for [western_europe][4], node[x5M73qVXS5eZIBdz40boEg], [P], s[INITIALIZING], indexUUID [wy-tIJqdQiynz5SGQ2IrGA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[western_europe][4] failed to recover shard]; nested: ElasticsearchException[failed to read [tweet][527924645014818817]]; nested: ElasticsearchIllegalArgumentException[No version type match [101]]; ]]
 [2014-10-31 03:11:43,859][WARN ][indices.cluster ] [Angela Cairn] [western_europe][2] failed to start shard
 org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [western_europe][2] failed to recover shard
     at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:269)
     at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:744)
 Caused by: org.elasticsearch.ElasticsearchException: failed to read [tweet][527936245440065536]
     at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:511)
     at org.elasticsearch.index.translog.TranslogStreams.readTranslogOperation(TranslogStreams.java:52)
     at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:241)
     ... 4 more
 Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No version type match [116]
     at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
     at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:508)
1 answer

First, check whether there really are any problems with the shards themselves. cd to your /usr/share/elasticsearch/lib directory (or its equivalent) and use Lucene's CheckIndex as follows:

 java -cp "*" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /var/lib/elasticsearch/<ES-NAME>/nodes/<NODE-NUMBER>/indices/<INDEX-NAME>/<SHARD-NUMBER>/index/

This will check the shard for problems; it can take some time if your shards are large.
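If the placeholders in that path are not obvious for your install, you can locate the shard index directories on disk first, e.g. (assuming the default data path and the index name from the logs above):

 # Lists the on-disk index directories of the affected index's shards
 find /var/lib/elasticsearch -type d -path '*indices/western_europe/*/index'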

Keep in mind that if you get the Java classpath wrong, some required jar files will be missing and CheckIndex may wrongly report every segment in the shard as broken, so read the output carefully.
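A quick sanity check before trusting a scary report is to confirm the Lucene core jar is actually in the directory you ran the command from (path assumes a standard package install):

 # If this matches nothing, CheckIndex was run with an incomplete classpath
 ls /usr/share/elasticsearch/lib/lucene-core-*.jar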

If a shard does have problems and you have no other way to restore it, running the same command with the -fix argument will repair the shard, but you will lose data. CheckIndex will warn you how many documents (if any) you will lose from the shard.
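As a sketch, the destructive run only differs by the -fix flag (same placeholder path as above; again, this may drop documents):

 cd /usr/share/elasticsearch/lib
 java -cp "*" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex -fix /var/lib/elasticsearch/<ES-NAME>/nodes/<NODE-NUMBER>/indices/<INDEX-NAME>/<SHARD-NUMBER>/index/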

If CheckIndex reports that the shard is fine, then hopefully your problem is only with the translog. The translog is a transaction log that Elasticsearch uses for durability: after a crash, ES replays it while recovering the shard so that operations which have not yet been flushed into the shard's Lucene index are not lost. Those operations exist only in the translog, so you will lose them if you delete it. That is still much better than losing the whole shard. In your case the translog itself appears to be corrupted, and I do not know of a way to repair it.
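For context only (this will not fix an already-corrupted translog): on a healthy index, a flush is what writes pending translog operations into the Lucene segments and empties the log, e.g.:

 # Flush the index so its translog no longer holds un-persisted operations
 curl -XPOST 'http://localhost:9200/western_europe/_flush'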

To get rid of the damaged transaction log used for recovery, simply delete the translog files in /var/lib/elasticsearch/<ES-NAME>/nodes/<NODE-NUMBER>/indices/<INDEX-NAME>/<SHARD-NUMBER>/translog/ for each affected shard, on each affected node. The last part is important: otherwise you may see the cluster recreate the shard's translog from another node after you have removed it on one.
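A minimal sketch of that cleanup, assuming a service-style install and that you stop the node before touching its files (run on each affected node, for each affected shard):

 # assumption: init-script / service install; stop the node first
 sudo service elasticsearch stop
 rm /var/lib/elasticsearch/<ES-NAME>/nodes/<NODE-NUMBER>/indices/<INDEX-NAME>/<SHARD-NUMBER>/translog/*
 sudo service elasticsearch start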

The shards should then initialize correctly, although as usual this may take some time.
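Recovery progress can be watched with the health and cat recovery endpoints (default host/port assumed):

 # Per-shard recovery progress
 curl -s 'http://localhost:9200/_cat/recovery?v'
 # Overall state should move from red towards yellow/green as shards initialize
 curl -s 'http://localhost:9200/_cluster/health?pretty'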
