HBase WAL files and how data gets written to HDFS

As I understand it, the WAL (Write-Ahead Log) records put/delete operations sequentially. Every operation is written to the WAL before the change is applied to the region, so if something goes wrong with the region server, the information can be restored from the WAL.

What I do not understand is how the WAL is implemented on top of HDFS.

From the HDFS documentation:

A client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy and allocates a data block for it. The NameNode responds to the client request with the identity of the DataNode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified DataNode. When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the DataNode.
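To make that staging behaviour concrete, here is a minimal sketch (my own, not from the documentation) of writing a file through the Hadoop FileSystem Java API; the path and payload are made up for illustration. Until the stream is flushed or closed, the written bytes may exist only in a client-side buffer:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsStagingSketch {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS etc. from core-site.xml on the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // /tmp/staging-demo.txt is a made-up path for illustration
            try (FSDataOutputStream out = fs.create(new Path("/tmp/staging-demo.txt"))) {
                out.writeBytes("a small edit\n");
                // At this point the bytes may only exist in a client-side
                // buffer; they are not guaranteed to be on any DataNode yet.
            } // close() transfers the remaining buffered data to the DataNodes
        }
    }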

So, is it possible to lose the contents of the WAL if I make a small change and its contents have not yet been sent to HDFS?

EDIT: As I understand it from http://hadoop-hbase.blogspot.com.by/2012/05/hbase-hdfs-and-durable-sync.html

we can force the HDFS client to sync the data to the DataNodes without waiting for a full block to accumulate.
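A minimal sketch of what that looks like with the FSDataOutputStream API (the file path and record format are invented for illustration). hflush() and hsync() are the calls that bypass the block-sized staging described above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DurableSyncSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // /tmp/wal-demo.log is a made-up path for illustration
            try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-demo.log"))) {
                out.writeBytes("put row1 cf:col=value\n");

                // hflush(): push the client-side buffer to the DataNode
                // pipeline immediately, without waiting for a full block;
                // the data survives a client crash and is visible to readers.
                out.hflush();

                // hsync(): like hflush(), but also asks the DataNodes to
                // sync the data to disk, surviving machine failures as well.
                out.hsync();
            }
        }
    }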

2 answers

The data you write to HBase goes through the following steps: Put → WAL → MemStore → HFile. The HFile is the actual file stored in HDFS, and this is where the NameNode and DataNodes come into play. HFiles are sorted.

The sorting is done in the MemStore; once the MemStore reaches a certain buffer size, it is flushed to an HFile.

Now, to avoid losing the data that is still only in the MemStore, the WAL is used.
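As an illustration (my own sketch, not part of the original answer), an HBase client can control how durably each Put is written to the WAL via the Durability setting; the table name, column family, and values below are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalDurabilitySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 // "demo" with family "cf" is a made-up table for illustration
                 Table table = conn.getTable(TableName.valueOf("demo"))) {

                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                              Bytes.toBytes("value"));

                // SYNC_WAL asks the RegionServer to flush the WAL entry to
                // the DataNode pipeline (hflush) before acknowledging the
                // write, so the edit is recoverable even while it lives
                // only in the MemStore and not yet in an HFile.
                put.setDurability(Durability.SYNC_WAL);
                table.put(put);
            }
        }
    }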


Source: https://habr.com/ru/post/1608774/
