How to count the number of files in a specific directory in Hadoop?

I am new to the map-reduce framework. I want to find the number of files in a specific directory by specifying the name of that directory. For example, suppose we have 3 directories A, B, and C, containing 20, 30, and 40 part-r files, respectively. I'm interested in writing a Hadoop job that counts the files/records in each directory, and I want to get the result in .txt format:

A has 20 entries

B has 30 entries

C has 40 entries

All of these directories are present in HDFS.

1 answer

If you only need the number of files, there is no need for a MapReduce job at all — the HDFS shell has a -count option, which prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME:

hdfs dfs -count /path/to/your/dir  >> output.txt

Alternatively, you can pipe a listing through standard Linux tools:

hadoop fs -ls /path/to/your/dir/*  | wc -l >> output.txt
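To produce output in exactly the format the question asks for ("A has 20 entries"), you could wrap the same -count command in a small shell function and loop over the directories. A sketch, assuming the directories live under a hypothetical /data path — adjust to your layout:

```shell
# count_entries DIR -> prints "NAME has N entries", where N is the number
# of files under DIR (FILE_COUNT, the second column of `hdfs dfs -count`).
count_entries() {
  dir=$1
  n=$(hdfs dfs -count "$dir" | awk '{print $2}')
  echo "$(basename "$dir") has $n entries"
}

# Hypothetical usage, writing the summary to a local text file:
# for d in /data/A /data/B /data/C; do count_entries "$d"; done > output.txt
```

Note that FILE_COUNT is recursive, so files inside subdirectories are included in the total.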

And if you would rather do this from Java code than from the shell (why run a MapReduce job for a simple HDFS metadata operation?), you can use the FileSystem API directly:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

int count = 0;
FileSystem fs = FileSystem.get(getConf());
// Set to true to descend into subdirectories as well.
boolean recursive = false;
RemoteIterator<LocatedFileStatus> ri =
        fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
System.out.println("The count is: " + count);

Source: https://habr.com/ru/post/1650257/
