HBase - What is the difference between WAL and MemStore?

I am trying to understand the HBase architecture. I see two different terms used for what looks like the same purpose.

The Write Ahead Log (WAL) and the MemStore both seem to hold new data that has not yet been persisted to permanent storage.

What is the difference between WAL and MemStore?

Update:

WAL: used to recover data that has not yet been persisted if a server fails. MemStore: keeps updates in memory as sorted key-values.

There seems to be a lot of data duplication before writing data to disk.

Answer:

The WAL is designed for recovery, NOT for duplicating data.

Read on to understand more:

  • In HBase, a Store hosts a MemStore and zero or more StoreFiles (HFiles). Each Store corresponds to one column family of a table within a given region.

  • The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.

  • With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be written sequentially. This makes the WAL a performance bottleneck.

  • The WAL can be disabled to remove this bottleneck. This is done per mutation in the HBase client:

mutation.setDurability(Durability.SKIP_WAL);   // HBase 0.96+; older 0.94 clients used mutation.setWriteToWAL(false)
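To make the split between the two concrete, here is a toy, in-memory model of a RegionServer write path. It is not HBase code: `ToyRegionServer`, `put`, `flush`, `crash` and `recover` are hypothetical names, the WAL is just a list standing in for a file on HDFS, and the MemStore is a `TreeMap` standing in for HBase's sorted in-memory store. It also simplifies WAL cleanup (real HBase archives whole WAL files once every edit in them has been flushed). The point it illustrates: the WAL is append-only and only read back on recovery, the MemStore serves sorted reads, and the "duplicate" copy disappears once a flush makes the data durable in a store file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: all names are hypothetical, this is not the HBase API.
public class WalMemStoreDemo {

    /** One logged edit: an append-only record in the simulated WAL. */
    record Edit(String row, String value) {}

    static class ToyRegionServer {
        // Stand-in for the WAL file on HDFS: append-only, survives a crash.
        final List<Edit> wal = new ArrayList<>();
        // Stand-in for the MemStore: sorted by row key, lost on a crash.
        TreeMap<String, String> memStore = new TreeMap<>();
        // Stand-ins for flushed HFiles: durable, sorted, never modified.
        final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

        void put(String row, String value) {
            wal.add(new Edit(row, value));   // 1. log the edit sequentially
            memStore.put(row, value);        // 2. apply it to the sorted in-memory store
        }

        void flush() {
            storeFiles.add(memStore);        // MemStore contents become a store file
            memStore = new TreeMap<>();
            wal.clear();                     // simplification: flushed edits need no replay
        }

        void crash() {
            memStore = new TreeMap<>();      // process dies: memory is gone, WAL and files remain
        }

        void recover() {
            for (Edit e : wal) {             // replay un-flushed edits in log order
                memStore.put(e.row(), e.value());
            }
        }

        String get(String row) {
            if (memStore.containsKey(row)) return memStore.get(row);  // newest data first
            for (int i = storeFiles.size() - 1; i >= 0; i--) {       // then newest file first
                String v = storeFiles.get(i).get(row);
                if (v != null) return v;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        rs.put("row2", "b");
        rs.put("row1", "a");
        System.out.println("MemStore keys (sorted): " + rs.memStore.keySet());
        rs.flush();                          // row1 and row2 are now in a store file
        rs.put("row3", "c");                 // row3 exists only in the WAL and the MemStore
        rs.crash();
        System.out.println("after crash row3=" + rs.get("row3"));
        rs.recover();
        System.out.println("after replay row3=" + rs.get("row3") + ", row1=" + rs.get("row1"));
    }
}
```

Running `main` prints `MemStore keys (sorted): [row1, row2]`, then `after crash row3=null`, then `after replay row3=c, row1=a`: the flushed rows come from the store file, and the un-flushed row comes back only because it was in the WAL.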

General note: a common practice is to disable the WAL during bulk loading to gain speed. The side effect is that with the WAL turned off there is nothing to replay, so if the RegionServer fails before the MemStore is flushed, those edits are lost.
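The trade-off can be shown with a second toy model. Again this is a hypothetical sketch, not HBase code: the `skipWal` flag on `put` only mimics the effect of `Durability.SKIP_WAL`, and `crashAndRecover` simulates a RegionServer dying before any flush and then replaying its WAL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: "skipWal" mimics the effect of Durability.SKIP_WAL; this is not HBase code.
public class SkipWalDemo {

    record Edit(String row, String value) {}

    static class ToyRegionServer {
        final List<Edit> wal = new ArrayList<>();           // survives a crash
        TreeMap<String, String> memStore = new TreeMap<>();  // lost on a crash

        void put(String row, String value, boolean skipWal) {
            if (!skipWal) {
                wal.add(new Edit(row, value));  // durable path: log the edit first
            }
            memStore.put(row, value);           // both paths update memory
        }

        void crashAndRecover() {
            memStore = new TreeMap<>();         // memory wiped before any flush
            for (Edit e : wal) {                // only logged edits can be replayed
                memStore.put(e.row(), e.value());
            }
        }
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        rs.put("logged", "1", false);
        rs.put("bulk", "2", true);              // "bulk load" edit written without the WAL
        rs.crashAndRecover();
        System.out.println(rs.memStore);        // prints {logged=1}
    }
}
```

The faster write path skips one sequential append, and the price is exactly the row that vanishes after the simulated crash.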

Moreover, if you use Solr + HBase + Lily, that is, Lily Morphline NRT indexing of HBase, the indexer works off the WAL. If you disable the WAL for performance reasons, Solr NRT indexing will not work, since Lily depends on the WAL.

Please browse the architecture section of the HBase reference guide.



Source: https://habr.com/ru/post/1258282/

