I have a newly commissioned cluster (HDP-2.3.0.0-2557) consisting of 10 physical servers: 2 management servers and 8 data nodes, all healthy. About a month ago the cluster (HDFS) was loaded with an initial dataset of approximately 4 TB. Most importantly, after that load completed there were no reports of any missing or corrupt blocks!
A month later I opened the Ambari dashboard, and under HDFS Summary > Block Errors I now see "28 missing / 28 under replicated". The servers were not used at all in the meantime: no MapReduce jobs ran, and no files were read from or written to HDFS. How is it possible that 28 blocks are now reported as damaged?
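For what it's worth, the Ambari figure matches what the NameNode itself reports from the command line (a minimal sketch of the check I did with the stock HDFS tools):

# NameNode's view of the cluster; the summary section includes a "Missing blocks" counter
hdfs dfsadmin -report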
The original data source, which sits on a single 4 TB drive, has no missing blocks, no corrupted files, nothing of the sort, and it works just fine! Storing the data in triplicate with HDFS should surely protect me from lost/corrupted files.
I ran all the suggested fsck commands (exact invocations below) and I see lines like:
/user/ambari-qa/examples/input-data/rawLogs/2010/01/01/01/40/log05.txt: MISSING 1 blocks of total size 15 B...........
/user/ambari-qa/examples/src/org/apache/oozie/example/DemoMapper.java: CORRUPT blockpool BP-277908767-10.13.70.142-1443449015470 block blk_1073742397
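For reference, these are the invocations I used; both are standard hdfs fsck options, nothing exotic:

# Full walk of the namespace: every file, its blocks, and the datanodes holding each replica
hdfs fsck / -files -blocks -locations

# Compact view: only the files that currently contain corrupt blocks
hdfs fsck / -list-corruptfileblocks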
I sold Hadoop to my manager on the strength of its impressive resilience claims, but doesn't this example prove (at least to me) that HDFS is broken? Maybe I'm doing something wrong, but surely I shouldn't have to scan the file system for missing blocks. I need to go back to my manager with an explanation: if one of the 28 missing files had been critical, HDFS would have landed me in hot water! As it stands, my manager believes HDFS is not fit for purpose!
I must be missing something or doing something wrong; surely files/blocks stored in triplicate are far less likely to disappear?! My understanding of the concept is that if one data node goes down, the affected blocks are marked as under-replicated and are eventually re-copied to another data node, as sketched below.
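Here is that mechanism in command form (a sketch; the path is one of the affected files from my fsck output, and of course this only helps while at least one healthy replica still exists):

# The second column of the listing shows the recorded replication factor (should be 3)
hdfs dfs -ls /user/ambari-qa/examples/input-data/rawLogs/2010/01/01/01/40/log05.txt

# Re-request 3 replicas; -w blocks until re-replication actually finishes
hdfs dfs -setrep -w 3 /user/ambari-qa/examples/input-data/rawLogs/2010/01/01/01/40/log05.txt

With all replicas of a block gone, though, there is nothing left to copy from, which appears to be exactly my situation.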
Summary: a fresh HDP cluster, ~4 TB loaded into HDFS (triple replication), left completely idle for 1 month; HDFS now reports 28 missing blocks.
How can this happen?
Output of hdfs fsck /:
Total size: 462105508821 B (Total open files size: 1143 B)
Total dirs: 4389
Total files: 39951
Total symlinks: 0 (Files currently being written: 13)
Total blocks (validated): 41889 (avg. block size 11031667 B) (Total open file blocks (not validated): 12)
********************************
UNDER MIN REPL'D BLOCKS: 40 (0.09549046 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 40
MISSING BLOCKS: 40
MISSING SIZE: 156470223 B
CORRUPT BLOCKS: 28
********************************
Minimally replicated blocks: 41861 (99.93316 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.998138
Corrupt blocks: 28
Missing replicas: 0 (0.0 %)
Number of data-nodes: 8
Number of racks: 1
FSCK ended at Thu Dec 24 03:18:32 CST 2015 in 979 milliseconds
The filesystem under path '/' is CORRUPT
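Since the original source data is intact, I assume the way forward is to purge the damaged files and re-load them; these are the standard fsck options for that (the re-upload paths below are illustrative, not my real ones):

# Option A: quarantine the damaged files under /lost+found for later inspection
hdfs fsck / -move

# Option B: remove the damaged files outright
hdfs fsck / -delete

# Then re-load the affected data from the source drive (illustrative paths)
hdfs dfs -put /mnt/source/rawLogs /user/ambari-qa/examples/input-data/rawLogs

But that only cleans up the symptom; it does not explain why blocks vanished on an idle cluster in the first place.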
Thanks!