Get the latest updated file in HDFS

I need the most recently updated file from one of my HDFS directories. The command should recurse through the directory and its subdirectories and return the full path of the newest file, including the file name. I was able to get the latest file on the local file system, but I am not sure how to do the same for HDFS.

find /tmp/sdsa -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

The above command works for the local file system. I can get the date, time, and file name from HDFS, but how do I find the latest file using these three fields?

This is the command I tried:

hadoop fs -ls -R /tmp/apps | awk -F" " '{print $6" "$7" "$8}'

Any help would be appreciated.

Thanks in advance.

2 answers

This worked for me:

hadoop fs -ls -R /tmp/app | awk -F" " '{print $6" "$7" "$8}' | sort -nr | head -1 | cut -d" " -f3

Output is the entire path to the file.
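To see how this pipeline isolates the path, here is a sketch that feeds it simulated `hadoop fs -ls -R` output (the file names and timestamps are made up for illustration):

```shell
# Fake `hadoop fs -ls -R` lines: permissions, replication, owner, group,
# size, date ($6), time ($7), path ($8). The pipeline keeps only date,
# time, and path, sorts newest first, and cuts out the path of the top line.
printf '%s\n' \
  '-rw-r--r--   3 hdfs hdfs   1024 2023-05-01 12:34 /tmp/app/old.log' \
  '-rw-r--r--   3 hdfs hdfs   2048 2023-06-15 09:10 /tmp/app/new.log' |
awk -F" " '{print $6" "$7" "$8}' | sort -nr | head -1 | cut -d" " -f3
# prints /tmp/app/new.log
```

Note that `cut -d" " -f3` returns only the third space-separated field, so it breaks on paths that contain spaces; `-f3-` (as in the answer below) is safer.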


Here is the command:

hadoop fs -ls -R /user | awk -F" " '{print $6" "$7" "$8}' | sort -nr | head | cut -d" " -f3-

Your script itself is good enough. Hadoop prints dates in the format YYYY-MM-DD HH:MM:SS, so sorting them as plain text already orders them chronologically.
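A quick demonstration of why text sorting is enough here, using made-up timestamps in Hadoop's date format:

```shell
# ISO-8601-style timestamps (YYYY-MM-DD HH:MM) compare correctly byte by
# byte, so a reverse text sort puts the most recent one first.
printf '%s\n' '2023-06-15 09:10' '2022-12-31 23:59' '2023-05-01 12:34' | sort -r | head -1
# prints 2023-06-15 09:10
```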


Source: https://habr.com/ru/post/1240212/

