The NameNode holds the metadata for the entire cluster: information about each directory and file, the replication factor, block names, etc. The NameNode also keeps the block location information for each file in memory (this information is built from the block reports sent by DataNodes).
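A minimal Python sketch of the two kinds of state described above: the namespace (files, replication factor, block IDs) and the in-memory block-location map filled in from block reports. The class and method names are hypothetical and this is not how HDFS implements it internally; it only shows which piece of information lives where.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Hypothetical, simplified file entry in the NameNode namespace."""
    path: str
    replication: int
    block_ids: list[int] = field(default_factory=list)


@dataclass
class NameNodeMetadata:
    """Hypothetical sketch of the NameNode's in-memory state (not the HDFS classes)."""
    # Namespace: path -> file entry (replication factor, ordered block IDs).
    files: dict[str, FileEntry] = field(default_factory=dict)
    # Block locations: block ID -> IDs of DataNodes holding a replica.
    # Kept only in memory and rebuilt from DataNode block reports.
    block_locations: dict[int, set[str]] = field(default_factory=dict)

    def add_file(self, path: str, replication: int, block_ids: list[int]) -> None:
        self.files[path] = FileEntry(path, replication, list(block_ids))

    def record_replica(self, block_id: int, datanode_id: str) -> None:
        # Called for each replica listed in a DataNode's block report.
        self.block_locations.setdefault(block_id, set()).add(datanode_id)

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        # For each block of the file (in order), the DataNodes known to hold it.
        return [
            (block_id, sorted(self.block_locations.get(block_id, set())))
            for block_id in self.files[path].block_ids
        ]


if __name__ == "__main__":
    meta = NameNodeMetadata()
    meta.add_file("/data/events.log", replication=3, block_ids=[1001, 1002])
    meta.record_replica(1001, "dn1")
    meta.record_replica(1001, "dn2")
    meta.record_replica(1002, "dn3")
    print(meta.locate("/data/events.log"))  # [(1001, ['dn1', 'dn2']), (1002, ['dn3'])]
```

Note that a freshly added file has no known locations until DataNodes report its blocks, which is why the location map depends on block reports.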
DataNodes store the following information for each block:
- The actual data stored in the block
- Metadata for the data stored in the block, mainly checksums of that data.
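To make the checksum metadata concrete, here is a minimal Python sketch assuming the HDFS default of one checksum per 512 bytes of data (dfs.bytes-per-checksum). HDFS defaults to CRC32C; the sketch uses zlib.crc32 (plain CRC-32) as a stand-in, and the function names are hypothetical.

```python
from __future__ import annotations

import zlib

# Default dfs.bytes-per-checksum in HDFS. HDFS's default algorithm is CRC32C;
# zlib.crc32 (plain CRC-32) is used here only as a stand-in.
BYTES_PER_CHECKSUM = 512


def chunk_checksums(block_data: bytes, bytes_per_checksum: int = BYTES_PER_CHECKSUM) -> list[int]:
    """One checksum per fixed-size chunk; the last chunk may be shorter."""
    return [
        zlib.crc32(block_data[i:i + bytes_per_checksum])
        for i in range(0, len(block_data), bytes_per_checksum)
    ]


def corrupt_chunks(block_data: bytes, stored: list[int],
                   bytes_per_checksum: int = BYTES_PER_CHECKSUM) -> list[int]:
    """Indexes of chunks whose current checksum differs from the stored one."""
    current = chunk_checksums(block_data, bytes_per_checksum)
    if len(current) != len(stored):
        raise ValueError("block length does not match the stored checksum metadata")
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]


if __name__ == "__main__":
    data = bytes(range(256)) * 5          # 1280 bytes -> chunks of 512, 512, 256
    stored = chunk_checksums(data)        # what the block's metadata would hold
    damaged = bytearray(data)
    damaged[600] ^= 0xFF                  # flip one byte inside chunk 1
    print(len(stored), corrupt_chunks(bytes(damaged), stored))  # 3 [1]
```

Checksumming per chunk rather than per block lets a reader pinpoint which part of a block is damaged without rereading the whole block.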
DataNodes periodically send heartbeats and block reports to the NameNode.
Heartbeat:
- The heartbeat interval is determined by the configuration property dfs.heartbeat.interval (in hdfs-site.xml). The default value is 3 seconds.
- Some of the information contained in a heartbeat:
- Registration : DataNode registration information
- Capacity : total storage capacity available on the DataNode
- dfsUsed : storage used by HDFS
- : remaining storage for HDFS
- blockPoolUsed : storage used by the block pool
- xmitsInProgress : number of block transfers in progress from this DataNode to other DataNodes
- xceiverCount : number of active transceiver streams
- cacheCapacity : total cache capacity available on the DataNode
- cacheUsed : amount of cache in use
- The NameNode uses this information in the following ways:
- DataNode health: should this DataNode be marked as dead or alive?
- Registering new DataNodes: if this is a newly added DataNode, its information is registered.
- Updating DataNode metrics: heartbeat information is used to update the DataNode's metrics.
- Issuing DataNode commands: based on the information received in a heartbeat, the NameNode can send the following commands to the DataNode:
BlockRecoveryCommand (to recover specific blocks), BlockCommand (to transfer blocks to another DataNode, or to invalidate specific blocks), Cache/Uncache (commands to cache or uncache blocks)
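The heartbeat handling above can be sketched as follows. This is a simplified illustration, not the HDFS code: Heartbeat, HeartbeatTracker and their methods are hypothetical names, and only a few of the listed fields are modeled. The dead-node rule is the one HDFS uses: a DataNode is considered dead once nothing has been heard from it for 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, which with the defaults (300000 ms and 3 s) is 630000 ms, or 10.5 minutes.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval
DEFAULT_RECHECK_INTERVAL_MS = 300_000   # dfs.namenode.heartbeat.recheck-interval


def heartbeat_expire_interval_ms(heartbeat_interval_s: int = DEFAULT_HEARTBEAT_INTERVAL_S,
                                 recheck_interval_ms: int = DEFAULT_RECHECK_INTERVAL_MS) -> int:
    # How long the NameNode waits without a heartbeat before declaring a DataNode dead.
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


@dataclass
class Heartbeat:
    """Hypothetical, trimmed-down heartbeat payload (only a few of the listed fields)."""
    datanode_id: str
    capacity: int
    dfs_used: int
    xceiver_count: int


class HeartbeatTracker:
    """Hypothetical NameNode-side bookkeeping for heartbeats (not the HDFS API)."""

    def __init__(self, expire_ms: int | None = None) -> None:
        self.expire_ms = expire_ms if expire_ms is not None else heartbeat_expire_interval_ms()
        self.last_seen_ms: dict[str, int] = {}
        self.metrics: dict[str, Heartbeat] = {}

    def on_heartbeat(self, hb: Heartbeat, now_ms: int) -> bool:
        """Record a heartbeat; return True if this DataNode was not known before."""
        is_new = hb.datanode_id not in self.last_seen_ms
        self.last_seen_ms[hb.datanode_id] = now_ms  # liveness
        self.metrics[hb.datanode_id] = hb           # latest metrics
        return is_new

    def dead_nodes(self, now_ms: int) -> list[str]:
        # A node is dead once its last heartbeat is older than the expiry interval.
        return sorted(dn for dn, t in self.last_seen_ms.items() if now_ms - t > self.expire_ms)


if __name__ == "__main__":
    print(heartbeat_expire_interval_ms())  # 630000
    tracker = HeartbeatTracker()
    tracker.on_heartbeat(Heartbeat("dn1", 100, 10, 2), now_ms=0)
    tracker.on_heartbeat(Heartbeat("dn2", 200, 5, 1), now_ms=600_000)
    print(tracker.dead_nodes(now_ms=640_000))  # ['dn1']
```

Issuing commands (BlockRecoveryCommand, BlockCommand, Cache/Uncache) would hang off on_heartbeat as a return value; it is left out to keep the sketch short.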
Block reports:
- The block report interval is determined by the configuration property dfs.blockreport.intervalMsec (in hdfs-site.xml). The default value is 21600000 milliseconds (6 hours).
- Some of the information contained in a block report:
- Registration : DataNode registration information
- blocks : information about blocks, including the block ID, block length, generation stamp, and state of the block replica (for example, the replica is finalized or waiting to be recovered, etc.)
- The NameNode uses this information to:
- Process the first block report: if this is the first report from a newly registered DataNode, the NameNode simply adds all valid replicas. It ignores all invalid blocks until the next block report.
- Update block information: the (DataNode → blocks) map in the NameNode is updated. The new block report is compared with the previous one, and the information about successful blocks, corrupt blocks, invalid blocks, etc. is updated.
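A hedged sketch of the block report processing described above. ReportedBlock, ReportDiff, process_block_report and the boolean valid flag are hypothetical simplifications (real replica states are richer than valid/invalid). The constants spell out the default interval: 21600000 ms is 6 hours, i.e. 7200 heartbeats at the default 3-second heartbeat interval.

```python
from __future__ import annotations

from dataclasses import dataclass

BLOCK_REPORT_INTERVAL_MS = 21_600_000  # default dfs.blockreport.intervalMsec (6 hours)
HEARTBEAT_INTERVAL_MS = 3_000          # default dfs.heartbeat.interval (3 seconds)


@dataclass(frozen=True)
class ReportedBlock:
    """Hypothetical, simplified block report entry."""
    block_id: int
    length: int
    generation_stamp: int
    valid: bool  # simplification of the replica state


@dataclass
class ReportDiff:
    added: set[int]    # replicas newly reported by this DataNode
    removed: set[int]  # replicas in the previous report but missing now
    invalid: set[int]  # replicas reported in an invalid state


def process_block_report(
    previous: dict[int, ReportedBlock] | None,
    report: list[ReportedBlock],
) -> tuple[ReportDiff, dict[int, ReportedBlock]]:
    """Compare a DataNode's new block report with its previous one.

    Returns the differences and the valid replicas to remember for next time.
    """
    valid = {b.block_id: b for b in report if b.valid}
    invalid = {b.block_id for b in report if not b.valid}
    if previous is None:
        # First report from a newly registered DataNode: add every valid
        # replica and ignore invalid ones until the next report.
        return ReportDiff(added=set(valid), removed=set(), invalid=set()), valid
    diff = ReportDiff(
        added=set(valid) - set(previous),
        removed=set(previous) - set(valid),
        invalid=invalid,
    )
    return diff, valid


if __name__ == "__main__":
    print(BLOCK_REPORT_INTERVAL_MS // HEARTBEAT_INTERVAL_MS)  # 7200
    first = [ReportedBlock(1, 128, 1001, True), ReportedBlock(2, 64, 1002, False)]
    diff, state = process_block_report(None, first)
    print(diff)  # ReportDiff(added={1}, removed=set(), invalid=set())
    second = [ReportedBlock(3, 32, 1003, True), ReportedBlock(4, 16, 1004, False)]
    diff, state = process_block_report(state, second)
    print(diff)  # ReportDiff(added={3}, removed={1}, invalid={4})
```

Storing only the valid replicas as the new baseline mirrors the rule that invalid blocks are ignored until a later report shows them in a valid state.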