How to count the number of files in HDFS from an MR job?

I'm new to Hadoop, and to Java for that matter. I am trying to count the number of files in an HDFS folder from the MapReduce driver that I am writing. I would like to do this without invoking the HDFS shell, because I want to be able to pass in the directory that I use when I start the MapReduce job. I have tried a number of methods, but could not get any of them to work because of my inexperience with Java.

Any help would be greatly appreciated.

Thanks,

Nomad.

1 answer

You can simply use FileSystem and iterate over the files inside the path. Here is some sample code:

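import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Inside the driver; getConf() assumes the driver extends Configured (e.g. implements Tool).
int count = 0;
Path dir = new Path("hdfs://my/path"); // placeholder URI; "my" is parsed as the NameNode authority
FileSystem fs = dir.getFileSystem(getConf()); // resolve the file system from the path's scheme
boolean recursive = false; // true also counts files in subdirectories
// listFiles returns files only; directories are not counted.
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(dir, recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}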
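
Since the question asks about passing the directory in when the job is started, here is a minimal sketch of how the same loop might sit inside a complete driver. The class name FileCountDriver, the helper countFiles, and the usage message are hypothetical names chosen for illustration; the Hadoop calls (Path.getFileSystem, FileSystem.listFiles, ToolRunner.run) are the standard ones. It assumes the Hadoop client libraries are on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/** Hypothetical driver: counts the files under the directory given as args[0]. */
public class FileCountDriver extends Configured implements Tool {

    /** Counts regular files under dir; directories themselves are not counted. */
    public static long countFiles(Configuration conf, Path dir, boolean recursive)
            throws IOException {
        // Resolve the file system from the path itself, so hdfs:// and file:// URIs both work.
        FileSystem fs = dir.getFileSystem(conf);
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(dir, recursive);
        long count = 0;
        while (files.hasNext()) {
            files.next();
            count++;
        }
        return count;
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: FileCountDriver <input-dir>");
            return 2;
        }
        Path inputDir = new Path(args[0]);
        long n = countFiles(getConf(), inputDir, false);
        System.out.println(inputDir + " contains " + n + " file(s)");
        // Job setup and submission would follow here.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new FileCountDriver(), args));
    }
}
```

With this layout the directory is an ordinary command-line argument, so it could be supplied at launch time, for example as the last argument to hadoop jar (the jar name being whatever the job is packaged as).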

Source: https://habr.com/ru/post/1650259/

