Refresh vs. flush

Question

When a new document is indexed into an Elasticsearch index, it becomes searchable roughly 1 second after the index operation. However, it can be made searchable immediately by calling _flush or _refresh on the index. What is the difference between these two operations? The result appears to be the same for both: the document is immediately available for search.

What exactly does each of these operations do?

The ES documentation does not seem to cover this topic in depth.

+68

elasticsearch

scdmb Nov 13 '13 at 20:13

2 answers

A refresh causes a new segment to be written, so that it becomes available for search.

A flush causes a Lucene commit to happen. This is much more expensive.

For more information, I wrote an article that covers some of this: Elasticsearch from the Bottom Up :)

+24

Alex Brasetvik Nov 13 '13 at 22:04

javanna · Accepted Answer · 2013-11-14 09:18

The answer you received is correct, but I think it is worth going into more detail.

A refresh effectively reopens the Lucene index reader, so that the point-in-time snapshot of the data you can search on is updated. This Lucene feature is part of the Lucene near-real-time API.

An Elasticsearch refresh makes your documents searchable, but it does not ensure that they are written to disk in persistent storage, because it does not call fsync and therefore does not guarantee durability. What makes your data durable is a Lucene commit, which is much more expensive.

While you can call a Lucene reopen every second, you cannot do the same with a Lucene commit.

With Lucene you can make new documents searchable in near real time by calling reopen often, but you still need to call commit to ensure that the data is written to disk and fsynced, and therefore safe.

Elasticsearch solves this "problem" by adding a transaction log per shard (a shard is effectively a Lucene index), where write operations that have not yet been committed are stored. The transaction log is fsynced and safe, so you get durability at any point in time, even for documents that have not been committed yet. You can search documents in near real time because a refresh happens automatically every second, and you can also be sure that if something bad happens, the transaction log can be replayed to recover documents that would otherwise be lost. The nice thing about the transaction log is that it can be used for other purposes too, for example to provide real-time get by id.
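The interplay between the transaction log, refresh, and commit can be illustrated with a small toy model. This is only a sketch in plain Python, not Elasticsearch or Lucene code: the class `ToyShard` and all of its attributes and methods are hypothetical names, and the model assumes, as the question observes, that a flush also makes documents searchable.

```python
class ToyShard:
    """Toy in-memory model of one shard (hypothetical, for illustration only)."""

    def __init__(self):
        self.committed = []   # documents made durable by a simulated Lucene commit
        self.translog = []    # simulated fsynced log of operations not yet committed
        self.buffer = []      # indexed documents not yet visible to search
        self.searchable = []  # snapshot seen by the currently open reader

    def index(self, doc):
        # A write is recorded in the transaction log and in the in-memory buffer.
        self.translog.append(doc)
        self.buffer.append(doc)

    def refresh(self):
        # Simulated reader reopen: buffered documents become searchable.
        # Nothing is committed here, so durability still relies on the translog.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def flush(self):
        # Simulated commit: everything logged since the last commit becomes
        # durable, after which the transaction log can be emptied.
        self.refresh()  # modelling assumption: flush also exposes the documents
        self.committed.extend(self.translog)
        self.translog.clear()

    def crash_and_recover(self):
        # In-memory state is lost; committed data and the translog survive.
        # Recovery replays the translog on top of the last commit.
        self.buffer = list(self.translog)
        self.searchable = list(self.committed)
        self.refresh()

    def search(self):
        return list(self.searchable)


shard = ToyShard()
shard.index("doc-1")
before_refresh = shard.search()   # not visible yet
shard.refresh()
after_refresh = shard.search()    # visible, but not committed
shard.crash_and_recover()
after_crash = shard.search()      # recovered by replaying the translog
```

In this model a refresh only changes what `search()` returns, while `flush()` is the only step that moves documents into `committed` and shrinks the log.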

An Elasticsearch flush effectively triggers a Lucene commit and also empties the transaction log, because once the data is committed at the Lucene level, durability can be guaranteed by Lucene itself. Flush is also exposed as an API and can be tuned, although this is usually not necessary. A flush happens automatically depending on how many operations have been added to the transaction log, how big they are, and when the last flush happened.
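For reference, the two operations named in the question are plain HTTP calls on the index. The sketch below only builds the requests with the Python standard library. The host `http://localhost:9200`, the index name `my-index`, and the helper `build_request` are assumptions made for illustration, and nothing is sent unless you call `urlopen` yourself against a running cluster.

```python
import urllib.request


def build_request(base_url, index, operation):
    """Build a POST request for the _refresh or _flush endpoint of an index."""
    if operation not in ("_refresh", "_flush"):
        raise ValueError("operation must be '_refresh' or '_flush'")
    url = f"{base_url.rstrip('/')}/{index}/{operation}"
    return urllib.request.Request(url, method="POST")


# Hypothetical host and index name; adjust to your own cluster.
refresh_req = build_request("http://localhost:9200", "my-index", "_refresh")
flush_req = build_request("http://localhost:9200", "my-index", "_flush")

# To actually send one of them (requires a reachable cluster):
# urllib.request.urlopen(refresh_req)
```

In normal operation neither call is needed, since refresh and flush are both triggered automatically as described above.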
