SearchContextMissingException: Failed to execute fetch phase [search/phase/fetch/id]


Cluster: I am using Elasticsearch 1.3.1 with 6 nodes on different servers, all connected over a LAN. The bandwidth is high, and each of them has 45 GB of RAM.

Configuration: the heap size we have allocated per node is 10g. We run the default Elasticsearch configuration, apart from unicast discovery, the cluster name, the node names and the zone: there are 2 zones, 3 nodes belong to one zone and the other 3 to the other zone.

Indexes: 15; total index size is 76 GB.

For the last few days I have been hitting a SearchContextMissingException, as shown in the DEBUG log below. It looks as though some search query is taking a very long time to fetch, but I checked the requests and there was no request that would put a large load on the cluster... I wonder why this is happening.

Problem: because of this, one by one all the nodes start heavy GC activity, which eventually leads to OOM :(

Here is my exception. Please kindly explain 2 things to me.

  • What is a SearchContextMissingException? Why does it happen?
  • How can we protect the cluster from this kind of request?

Error:

 [YYYY-MM-DD HH:mm:ss,039][DEBUG][action.search.type ] [es_node_01] [5031530] Failed to execute fetch phase
 org.elasticsearch.transport.RemoteTransportException: [es_node_02][inet[/1x.x.xx.xx:9300]][search/phase/fetch/id]
 Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [5031530]
     at org.elasticsearch.search.SearchService.findContext(SearchService.java:480)
     at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:450)
     at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:793)
     at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:782)
     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
     at java.lang.Thread.run(Thread.java:722)
2 answers

If you can, upgrade to 1.4.2. It fixes some known resiliency issues, including cascading failures like the one you describe.

Regardless, the default configuration will certainly cause you problems. At a minimum, you should look at configuring the circuit breakers, e.g. for the field data caches.

Here is a snippet taken from our production configuration. I assume you have also configured the Linux file limits correctly: see here

 # prevent swapping
 bootstrap.mlockall: true

 indices.breaker.total.limit: 70%
 indices.fielddata.cache.size: 70%

 # make elasticsearch work harder to migrate/allocate indices on startup
 # (we have a lot of shards due to logstash); default was 2
 cluster.routing.allocation.node_concurrent_recoveries: 8

 # enable cors
 http.cors.enabled: true
 http.cors.allow-origin: /https?:\/\/(localhost|kibana.*\.linko\.io)(:[0-9]+)?/

 index.query.bool.max_clause_count: 4096
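On 1.4.x the circuit-breaker limits are also dynamic cluster settings, so they can be tightened at runtime without a rolling restart. Below is a minimal, hedged sketch with the 1.x Java client; the client variable is assumed to be an already-connected Client, and the percentage values are placeholders, not recommendations. Note that indices.fielddata.cache.size itself is not dynamic and still belongs in elasticsearch.yml.

 import org.elasticsearch.client.Client;
 import org.elasticsearch.common.settings.ImmutableSettings;

 public class BreakerSettings {

     // Pushes breaker limits like those in the YAML above as transient cluster
     // settings (applied immediately, lost on a full cluster restart).
     // The exact values are placeholders - tune them for your own heap.
     public static void applyBreakerLimits(Client client) {
         client.admin().cluster().prepareUpdateSettings()
                 .setTransientSettings(ImmutableSettings.settingsBuilder()
                         .put("indices.breaker.total.limit", "70%")
                         .put("indices.breaker.fielddata.limit", "60%")
                         .build())
                 .execute().actionGet();
     }
 }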

The same error (or rather, debug statement) still occurs in version 1.6.0, and it is not a bug.

When you create a new scroll request:

 SearchResponse scrollResponse = client.prepareSearch(index)
         .setTypes(types)
         .setSearchType(SearchType.SCAN)
         .setScroll(new TimeValue(60000))
         .setSize(maxItemsPerScrollRequest)
         .setQuery(ElasticSearchQueryBuilder.createMatchAllQuery())
         .execute().actionGet();
 String scrollId = scrollResponse.getScrollId();

a new scroll id is created (apart from the scroll id, this first response is empty when the SCAN search type is used). To retrieve the results:

 long resultCounter = 0L;   // to keep track of the number of results retrieved
 Long nResultsTotal = null; // total number of items we will be expecting
 do {
     final SearchResponse response = client.prepareSearchScroll(scrollId)
             .setScroll(new TimeValue(600000))
             .execute().actionGet();
     // handle result
     if (nResultsTotal == null)                              // if not initialized
         nResultsTotal = response.getHits().getTotalHits();  // set total number of documents
     resultCounter += response.getHits().getHits().length;   // keep track of the items retrieved
 } while (resultCounter < nResultsTotal);

This approach works regardless of how many shards you have. Another option is to add a break statement when:

 boolean breakIf = response.getHits().getHits().length < (nShards * maxItemsPerScrollRequest); 

The number of items returned per request is maxItemsPerScrollRequest (per shard!), so we expect the requested number of items multiplied by the number of shards. But when we have multiple shards and one of them runs out of documents while the others have not, the first method will still give us all the available documents, whereas the latter may stop prematurely - I expect (I have not tried it!).
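For completeness, here is a rough, untested sketch (assuming the same 1.x Java client as above, with nShards and maxItemsPerScrollRequest supplied by the caller) that wires the early-break check into the loop and then releases the scroll context explicitly, so it does not sit on the data nodes until the scroll timeout expires - an already-removed context is exactly what the fetch phase complains about with SearchContextMissingException.

 import org.elasticsearch.action.search.SearchResponse;
 import org.elasticsearch.client.Client;
 import org.elasticsearch.common.unit.TimeValue;

 public class ScrollConsumer {

     /**
      * Drains a scroll opened as in the snippet further up and releases the
      * search context when finished. Returns the number of hits consumed.
      */
     public static long drainScroll(Client client, String scrollId,
                                    int nShards, int maxItemsPerScrollRequest) {
         long resultCounter = 0L;
         Long nResultsTotal = null;
         boolean done = false;
         while (!done) {
             SearchResponse response = client.prepareSearchScroll(scrollId)
                     .setScroll(new TimeValue(600000))
                     .execute().actionGet();
             scrollId = response.getScrollId();  // always pass the most recent scroll id
             if (nResultsTotal == null)
                 nResultsTotal = response.getHits().getTotalHits();
             int batchSize = response.getHits().getHits().length;
             resultCounter += batchSize;
             // ... handle the hits of this batch here ...

             // stop when everything has been counted, or (early-break variant)
             // when a batch is smaller than the theoretical per-request maximum
             done = resultCounter >= nResultsTotal
                     || batchSize < (nShards * maxItemsPerScrollRequest);
         }
         // free the search context instead of letting it linger until the timeout
         client.prepareClearScroll().addScrollId(scrollId).execute().actionGet();
         return resultCounter;
     }
 }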

Another way to stop seeing this exception (since it is "only" DEBUG) is to open the logging.yml file in the config directory of Elasticsearch and change:

 action: DEBUG 

to

 action: INFO 

Source: https://habr.com/ru/post/1209546/

