I am getting wildly different reports about under-replicated blocks, and I wonder what exactly is going on. hadoop dfsadmin -metasave reports ~232,000 MISSING blocks waiting for replication. How can I fix this? Jobs run fine and there seems to be no missing data.
Please see the output of hadoop fsck / , hadoop dfsadmin -report , hadoop dfsadmin -metasave and the NameNode web UI below:
hadoop fsck / :
Total size: 6066860793495 B (Total open files size: 47000701003 B)
Total dirs: 1801
Total files: 230828 (Files currently being written: 493)
Total blocks (validated): 242592 (avg. block size 25008494 B) (Total open file blocks (not validated): 681)
Minimally replicated blocks: 242592 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 932 (0.38418415 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.9945753
Corrupt blocks: 0
Missing replicas: 1851 (0.25479725 %)
Number of data-nodes: 20
Number of racks: 1
FSCK ended at Thu Nov 03 10:17:47 CDT 2011 in 7359 milliseconds
hadoop dfsadmin -report :
Configured Capacity: 59070545264640 (53.72 TB)
Present Capacity: 56867905841329 (51.72 TB)
DFS Remaining: 37637696475136 (34.23 TB)
DFS Used: 19230209366193 (17.49 TB)
DFS Used%: 33.82%
Under replicated blocks: 245346
Blocks with corrupt replicas: 73
Missing blocks: 0
hadoop dfsadmin -metasave (excerpt from the output):
232461 files and directories, 243290 blocks = 475751 total
Live Datanodes: 20
Dead Datanodes: 0
Metasave: Blocks waiting for replication: 242747
About 1,000 of these entries are real files that are being replicated (or waiting to be); the remaining ~232,000 are "MISSING" entries that all look like this:
: blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_2565249812869117144_12480431 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_2950011510944289339_12480413 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12468869 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12474511 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12440928 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12449396 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12462184 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12465792 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12472905 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3812070171484751861_12436051 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3815454413870879906_12441243 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
Metasave: Blocks being replicated: 0
Metasave: Blocks 29 waiting deletion from 17 datanodes.
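One thing that stands out in the excerpt is that some block IDs appear several times, each with a different trailing number (which I take to be the generation stamp in the blk_<id>_<generation stamp> naming). A rough Python sketch for summarizing this is below; the function name summarize_missing and the regular expression are my own, and the sample is just the excerpt from above, not the full metasave file.

```python
import re
from collections import defaultdict

# Matches metasave entries such as:
#   ": blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)"
# Assumption: the name is blk_<block id>_<generation stamp>.
ENTRY = re.compile(r"blk_(-?\d+)_(\d+)\s+MISSING\b")


def summarize_missing(text):
    """Group MISSING entries by block ID and collect their generation stamps."""
    stamps_by_block = defaultdict(list)
    for block_id, gen_stamp in ENTRY.findall(text):
        stamps_by_block[block_id].append(int(gen_stamp))
    return dict(stamps_by_block)


# The excerpt shown above, copied verbatim (only the block names matter here).
sample = """
: blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_2565249812869117144_12480431 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_2950011510944289339_12480413 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12468869 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12474511 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12440928 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12449396 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12462184 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12465792 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3811560762593023914_12472905 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3812070171484751861_12436051 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3815454413870879906_12441243 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
"""

summary = summarize_missing(sample)
entries = sum(len(stamps) for stamps in summary.values())
repeated = {b: s for b, s in summary.items() if len(s) > 1}

print("MISSING entries:", entries)
print("distinct block IDs:", len(summary))
for block_id, stamps in repeated.items():
    print(block_id, "appears", len(stamps), "times with stamps", stamps)
```

If the same pattern holds across the whole file, the number of MISSING entries would be noticeably larger than the number of distinct blocks behind them, which might explain part of the inflated count. To check, the same function could be run on the full text of the metasave output file instead of the sample.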
NameNode web UI:
Cluster Summary
232390 files and directories, 243235 blocks = 475625 total. Heap Size is 1.84 GB / 8.68 GB (21%)
Configured Capacity : 53.72 TB
DFS Used : 17.46 TB
Non DFS Used : 2 TB
DFS Remaining : 34.26 TB
DFS Used% : 32.51 %
DFS Remaining% : 63.77 %
Live Nodes : 20
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 242532
!! Update: !!
I believe this must be a bug, since the number of "under-replicated" blocks is approaching a million. We do not have that many actual blocks in the cluster, so the count has to be wrong.
The web interface now displays the following:
Cluster Summary
234877 files and directories, 250074 blocks = 484951 total. Heap Size is 706.5 MB/8.68 GB (7%)
Configured Capacity : 53.72 TB
DFS Used : 20.71 TB
Non DFS Used : 1.54 TB
DFS Remaining : 31.47 TB
DFS Used% : 38.56 %
DFS Remaining% : 58.58 %
Live Nodes : 20
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 451014
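To make the discrepancy concrete, here is a small Python sketch that puts the under-replicated counts from each report next to the total block counts quoted in this post. All numbers are copied from the outputs above. The one pairing that is my own assumption is dfsadmin -report, which does not print a block total, so I compare it against the fsck total.

```python
# (source, under-replicated blocks, total blocks), all taken from this post.
reports = [
    ("hadoop fsck /", 932, 242592),
    # Assumption: dfsadmin -report prints no block total; fsck's total is used.
    ("hadoop dfsadmin -report", 245346, 242592),
    ("hadoop dfsadmin -metasave", 242747, 243290),
    ("NameNode web UI", 242532, 243235),
    ("NameNode web UI (update)", 451014, 250074),
]


def percent_under_replicated(under_replicated, total_blocks):
    """Under-replicated blocks as a percentage of all blocks."""
    return 100.0 * under_replicated / total_blocks


# Reports whose under-replicated count is larger than the block total itself.
exceeds_total = [name for name, under, total in reports if under > total]

for name, under, total in reports:
    pct = percent_under_replicated(under, total)
    print(f"{name}: {under} of {total} blocks ({pct:.2f} %)")

print("count exceeds total blocks in:", exceeds_total)
```

Only the fsck figure is a small fraction of the cluster; the other reports put nearly every block, or more than every block, in the under-replicated state, which is why I suspect the counter rather than the data.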