As I understand it, the Write Ahead Log (WAL) records put/delete operations sequentially: before a change is applied to the region, the operation is written to the log. If something goes wrong with the region server, the lost changes can be restored by replaying the WAL.
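For example (a minimal sketch using the standard HBase client API; the table, family, and qualifier names are made up), every Put goes through the WAL, and the durability level can be set per mutation:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurabilityExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test_table"))) {
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                          Bytes.toBytes("value"));
            // SYNC_WAL: write the mutation to the WAL and flush it to the
            // HDFS pipeline before the call returns.
            put.setDurability(Durability.SYNC_WAL);
            table.put(put);
        }
    }
}
```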
What I don't understand is how the WAL is implemented on top of HDFS.
From the HDFS documentation:
A client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth one HDFS block size, the client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy and allocates a data block for it. The NameNode responds to the client request with the identity of the DataNode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified DataNode. When the file is closed, the remaining un-flushed data in the temporary local file is transferred to the DataNode.
So, is it possible to lose the contents of the WAL if I make a small change and its contents have not yet been flushed to HDFS?
EDIT: As I understand it from
http://hadoop-hbase.blogspot.com.by/2012/05/hbase-hdfs-and-durable-sync.html
we can force the HDFS client to sync the data to the DataNodes without waiting for it to accumulate a full block.
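Something like the following sketch is what I mean (plain HDFS client API; the file path is made up): write a small amount of data, then call hflush()/hsync() explicitly so the bytes reach the DataNodes even though the block is far from full:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSyncExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-test"))) {
            out.write("one small edit".getBytes("UTF-8"));
            // hflush(): push the buffered bytes to all DataNodes in the
            // pipeline; new readers can see the data, but it may still sit
            // in the DataNodes' OS buffers.
            out.hflush();
            // hsync(): additionally ask each DataNode to fsync the data to
            // disk, so it survives a power failure of the whole pipeline.
            out.hsync();
        }
    }
}
```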