Under-replicated block counts are inconsistent, but why?

I am getting wildly different counts of under-replicated blocks and I wonder which one is correct. hadoop dfsadmin -metasave reports ~232,000 MISSING blocks awaiting replication. How do I fix this? Jobs run fine and no data appears to be missing.

Please see the output of hadoop fsck / , hadoop dfsadmin -report , hadoop dfsadmin -metasave , and the NameNode web UI below:

hadoop fsck / :

 Total size:    6066860793495 B (Total open files size: 47000701003 B)
 Total dirs:    1801
 Total files:   230828 (Files currently being written: 493)
 Total blocks (validated):      242592 (avg. block size 25008494 B) (Total open file blocks (not validated): 681)
 Minimally replicated blocks:   242592 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       932 (0.38418415 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.9945753
 Corrupt blocks:                0
 Missing replicas:              1851 (0.25479725 %)
 Number of data-nodes:          20
 Number of racks:               1
 FSCK ended at Thu Nov 03 10:17:47 CDT 2011 in 7359 milliseconds
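As an aside, fsck can also list which files own the genuinely under-replicated blocks, which helps separate them from the bogus MISSING entries. A minimal sketch, assuming the stock 0.20 fsck message format:

 # List the files fsck flags as under-replicated and count them.
 # The "Under replicated" message text assumes the stock 0.20 fsck output.
 hadoop fsck / -files -blocks | grep "Under replicated" > under_replicated.txt
 wc -l under_replicated.txt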

hadoop dfsadmin -report :

 Configured Capacity: 59070545264640 (53.72 TB)
 Present Capacity: 56867905841329 (51.72 TB)
 DFS Remaining: 37637696475136 (34.23 TB)
 DFS Used: 19230209366193 (17.49 TB)
 DFS Used%: 33.82%
 Under replicated blocks: 245346
 Blocks with corrupt replicas: 73
 Missing blocks: 0

Excerpt from the hadoop dfsadmin -metasave output:

 232461 files and directories, 243290 blocks = 475751 total
 Live Datanodes: 20
 Dead Datanodes: 0
 Metasave: Blocks waiting for replication: 242747

There are about 1,000 real blocks that are being replicated (or waiting to be), and then ~232,000 "MISSING" entries that all look like this:

 : blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_2565249812869117144_12480431 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_2950011510944289339_12480413 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3809337797233614456_12468869 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3809337797233614456_12474511 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3811560762593023914_12440928 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3811560762593023914_12449396 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3811560762593023914_12462184 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3811560762593023914_12465792 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3811560762593023914_12472905 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3812070171484751861_12436051 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 : blk_3815454413870879906_12441243 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
 Metasave: Blocks being replicated: 0
 Metasave: Blocks 29 waiting deletion from 17 datanodes.
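A quick way to tally those MISSING entries is to grep the metasave dump. A minimal sketch, assuming the dump lands in the NameNode's hadoop.log.dir (the /var/log/hadoop path is only an assumption for this install):

 # Ask the NameNode to write a metasave dump, then count the MISSING lines.
 # -metasave writes relative to hadoop.log.dir on the NameNode host;
 # /var/log/hadoop is an assumed location, adjust for your install.
 hadoop dfsadmin -metasave meta-dump.txt
 grep -c "MISSING" /var/log/hadoop/meta-dump.txt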

Namenode User Interface:

 Cluster Summary
 232390 files and directories, 243235 blocks = 475625 total.
 Heap Size is 1.84 GB / 8.68 GB (21%)
 Configured Capacity               : 53.72 TB
 DFS Used                          : 17.46 TB
 Non DFS Used                      : 2 TB
 DFS Remaining                     : 34.26 TB
 DFS Used%                         : 32.51 %
 DFS Remaining%                    : 63.77 %
 Live Nodes                        : 20
 Dead Nodes                        : 0
 Decommissioning Nodes             : 0
 Number of Under-Replicated Blocks : 242532

Update:

I believe this must be a bug, since the number of "under-replicated" blocks keeps climbing toward a million while the cluster does not actually contain anywhere near that many blocks.

The web interface now displays the following:

 Cluster Summary
 234877 files and directories, 250074 blocks = 484951 total.
 Heap Size is 706.5 MB / 8.68 GB (7%)
 Configured Capacity               : 53.72 TB
 DFS Used                          : 20.71 TB
 Non DFS Used                      : 1.54 TB
 DFS Remaining                     : 31.47 TB
 DFS Used%                         : 38.56 %
 DFS Remaining%                    : 58.58 %
 Live Nodes                        : 20
 Dead Nodes                        : 0
 Decommissioning Nodes             : 0
 Number of Under-Replicated Blocks : 451014
1 answer

I got a response from Todd Lipcon of Cloudera, and I want to update this question in case others hit the same problem. I saw this on CDH3u1, and this was the answer:

"It is known that the append function is broken in CDH3 and there are probably such errors. We recommend that you recommend that users do not use this. This applies to all releases of Hadoop 0.20.x (CDH and otherwise) and will be committed to CDH4 ( version higher than 0.23 or higher).

Sorry for the bad news. I will review this specific error to make sure it is not in the upper trunk, but it is unlikely to be fixed in the CDH3 release. "
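For anyone who wants to check whether append is enabled on their own cluster, the CDH3-era switch is the dfs.support.append property in hdfs-site.xml. A minimal sketch, assuming the CDH default config directory /etc/hadoop/conf:

 # Check whether append is explicitly enabled in the active HDFS config.
 # /etc/hadoop/conf is the CDH default config directory (an assumption here).
 grep -A 1 "dfs.support.append" /etc/hadoop/conf/hdfs-site.xml \
   || echo "dfs.support.append not set in hdfs-site.xml"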


Source: https://habr.com/ru/post/1379501/

