I just started indexing PDF and Word documents, and some of them are quite large. To make matters worse, I use n-grams in my index analyzer. I have 8 GB of RAM dedicated to Elasticsearch, but our index is currently around 45 GB (only about 6 GB without the documents), which leads me to my problem.
After we added the documents to our index, a full reindex started taking much longer, which was to be expected. However, it also started failing intermittently with a generic timeout error. I traced the problem to our HTTP client (the HTTParty Ruby gem), which had a timeout of 10 seconds. I increased it to 480 seconds, and the reindex now gets through more documents, but it still times out eventually.
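For context on where such a timeout actually lives: HTTParty is built on Ruby's `Net::HTTP`, and as far as I understand its `default_timeout` setting (or a per-request `timeout:` option) is applied to the underlying connection's open and read timeouts. The minimal sketch below uses only the standard library to show those two settings; `build_es_connection` is a hypothetical helper and `http://localhost:9200` is an assumed node address, not our real setup.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not open) a Net::HTTP connection to an
# Elasticsearch node with explicit timeouts, both in seconds.
def build_es_connection(url, open_timeout: 10, read_timeout: 480)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout # max wait to establish the TCP connection
  http.read_timeout = read_timeout # max wait for each read of the response
  http
end

# Roughly what the HTTParty configuration corresponds to (assumed equivalent):
#   class EsClient
#     include HTTParty
#     base_uri 'http://localhost:9200'
#     default_timeout 480
#   end

conn = build_es_connection('http://localhost:9200')
```

Note that raising the read timeout only changes how long the client is willing to wait for Elasticsearch to answer; it does not make a slow index request any faster.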
My questions are:
- Should Elasticsearch take so long to respond to an index request?
- What can I do to fix this problem?