How to optimize Elasticsearch percolator index memory performance

Question


Is there a way to improve memory performance when using an Elasticsearch percolator index?

I created a separate index for my percolator. I have about 1,000,000 user saved searches (for email alerts). After creating this percolator index, my heap usage grew to 100% and the server stopped responding to requests. I have limited resources, and I can’t just throw more RAM at the problem. The only solution was to remove the index containing my saved searches.

From what I read, the percolator index is constantly in memory. Is this absolutely necessary? Is there a way to throttle this behavior, but still keep the functionality? Is there a way to optimize my data / queries / index structure in order to circumvent this behavior while maintaining the desired result?


elasticsearch

richardpj Feb 03 '15 at 7:43


1 answer

richardpj · Accepted Answer · 2015-04-24T08:06:25+0000

There is no resolution to this question from Elasticsearch's side, and one is not likely. I talked to the Elasticsearch guys directly, and their answer was: “throw more hardware at it”.

However, I found a way to work around the problem by reducing how heavily I rely on this feature. When I analyzed my saved search data, I found that the searches consisted of about 100,000 unique keyword searches combined with various permutations of filters, producing more than 1,000,000 saved searches.

If I look at the filters, these are things like:

Location - 300+
Industry - 50+
etc...

This gives a solution space of:

100,000 * 300 * 50 * ... => ~1,500,000,000

However, if I decompose the searches and index the keywords and the filters separately in the percolator index, I end up with far fewer queries:

100,000 + 300 + 50 + ... => ~100,350
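
The two counts can be checked with a few lines of Python. This is only an illustration of the scaling: it uses the rounded lower bounds quoted in this answer (100,000 keyword searches, 300 locations, 50 industries) and leaves out the other filter dimensions, so the real numbers would be larger.

```python
from math import prod

# Rounded lower bounds from the answer; other filter dimensions are omitted.
component_counts = {"keywords": 100_000, "locations": 300, "industries": 50}

# Size of the space of possible combined searches (keyword x location x industry).
combined = prod(component_counts.values())

# Number of queries if each keyword search and each filter value is indexed once.
decomposed = sum(component_counts.values())

print(f"{combined:,}")    # 1,500,000,000
print(f"{decomposed:,}")  # 100,350
```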

And these queries are themselves smaller and less complicated than the original searches.

Now I am creating a second (non-percolator) index that lists all 1,000,000 saved searches, each including the identifiers of its search components from the percolator index.
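
A minimal sketch of that decomposition, in plain Python with no Elasticsearch calls. The field names (`keywords`, `location`, `industry`, `component_ids`) and the sample searches are assumptions made up for illustration; the answer does not describe the actual schema.

```python
def decompose(saved_searches):
    """Split combined saved searches into unique components.

    Returns (components, search_docs):
      components  maps (kind, value) -> component id; these would be the
                  queries registered in the percolator index.
      search_docs lists each saved search as component ids only; these
                  would be the documents in the second, ordinary index.
    """
    components = {}
    search_docs = []
    for search in saved_searches:
        component_ids = []
        for kind in ("keywords", "location", "industry"):
            value = search.get(kind)
            if value is None:
                continue  # this saved search does not use that filter
            key = (kind, value)
            # Reuse the id if another saved search already has this component.
            component_ids.append(components.setdefault(key, len(components)))
        search_docs.append({"search_id": search["id"], "component_ids": component_ids})
    return components, search_docs


# Hypothetical sample data.
saved_searches = [
    {"id": 1, "keywords": "python developer", "location": "London", "industry": "IT"},
    {"id": 2, "keywords": "python developer", "location": "Leeds", "industry": "IT"},
    {"id": 3, "keywords": "nurse", "location": "London"},
]
components, search_docs = decompose(saved_searches)
print(len(components))  # 5 unique components for 3 saved searches
```

Shared keywords and filter values are stored once, which is where the reduction in percolator queries comes from.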

Then I percolate the document, and then I execute a second query against the saved-search index, which filters the saved searches by the keyword and filter matches returned by the percolator. I can even keep the relevance score, since it comes solely from the keyword search.
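
The two-stage lookup could look roughly like this. It is a plain-Python simulation rather than real Elasticsearch code: the percolator is stood in for by simple predicates, the second index by a dictionary, and all ids and field names are hypothetical.

```python
# Stage 1 stand-in: the percolator reports which component queries match
# the incoming document. Here each component is a plain Python predicate.
components = {
    "kw:python": lambda doc: "python" in doc["title"].lower(),
    "kw:nurse": lambda doc: "nurse" in doc["title"].lower(),
    "loc:london": lambda doc: doc["location"] == "London",
    "loc:leeds": lambda doc: doc["location"] == "Leeds",
}

# Second (non-percolator) index: saved searches as lists of component ids.
saved_searches = {
    "alert-1": ["kw:python", "loc:london"],
    "alert-2": ["kw:python", "loc:leeds"],
    "alert-3": ["kw:nurse", "loc:london"],
}


def percolate(doc):
    """Return the ids of all components that match the document."""
    return {cid for cid, matches in components.items() if matches(doc)}


def matching_searches(doc):
    """Return the saved searches whose components all matched the document."""
    matched = percolate(doc)
    # Stage 2: keep only saved searches fully covered by the matched components.
    return sorted(
        sid for sid, cids in saved_searches.items() if matched.issuperset(cids)
    )


doc = {"title": "Senior Python Developer", "location": "London"}
print(matching_searches(doc))  # ['alert-1']
```

In a real deployment, stage 2 would presumably be an ordinary filtered query on the second index rather than an in-memory loop.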

This approach should significantly reduce the memory footprint of the percolator index while serving the same purpose.

I would welcome feedback on this approach (I have not tried it yet, but I will keep you posted).

Likewise, if my approach is successful, do you think it is worth a feature request?
