In an HDFS cluster, data is stored on commodity machines. These are not high-end servers with large amounts of RAM, so there is a real chance of losing a DataNode (d1, d2, d3) or a block (b1, b2, b3). To guard against this, HDFS splits each file into blocks (64 MB or 128 MB, depending on the version) and keeps three replicas of each block (the default replication factor), with each replica stored on a separate DataNode (d1, d2, d3). Now suppose block b1 is damaged on DataNode d1. Copies of b1 still exist on d2 and d3, so the client can request the block from d2 and get the result just as if the copy on d1 had never been lost.
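You can see this replication in practice with a minimal sketch using the Hadoop FileSystem Java API, which asks the NameNode where each block of a file lives. The path /data/input.txt is a placeholder; everything else uses the standard Hadoop client API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS in core-site.xml points at the NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file path; replace with a real file in your cluster.
        Path file = new Path("/data/input.txt");
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block; getHosts() lists the DataNodes
        // holding a replica of that block (3 by default).
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("block at offset %d (%d bytes): %s%n",
                    block.getOffset(), block.getLength(),
                    String.join(", ", block.getHosts()));
        }
        fs.close();
    }
}
```

With a replication factor of 3, each printed line should list three hosts; if a DataNode goes down, the NameNode simply directs reads to one of the remaining replicas, which is exactly the d1/d2/d3 scenario above.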
I hope this clears things up.