Hadoop: removing zero-byte files

I am looking for a command in Hadoop 2.x to delete files in HDFS that are zero bytes in size. Can someone please tell me the appropriate command? I want to find the zero-byte files in HDFS and remove them from the directory.

2 answers
for f in $(hdfs dfs -ls -R / | awk '$1 !~ /^d/ && $5 == "0" { print $8 }'); do hdfs dfs -rm "$f"; done

Step by step:

hdfs dfs -ls -R / - list all files and directories in HDFS recursively

awk '$1 !~ /^d/ && $5 == "0" { print $8 }' - print the full path of every entry that is not a directory and has size 0

for f in $(...); do hdfs dfs -rm "$f"; done - delete the matching files one at a time
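The pipeline above uses print $8, which keeps only the first word of a path that contains spaces, and the unquoted $(...) loop splits on whitespace too. Below is a minimal sketch of a more tolerant filter, assuming the usual eight-column layout of hdfs dfs -ls output (permissions, replication, owner, group, size, date, time, path). zero_byte_files is a hypothetical helper name, not part of Hadoop. It reads the listing on stdin, so it can be tried on saved output without touching the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the paths of zero-byte regular files from an
# `hdfs dfs -ls -R` listing read on stdin.
# Assumed column layout: permissions replication owner group size date time path
zero_byte_files() {
  awk '
    # Regular files have permissions starting with "-"; this skips
    # directories ("d...") and any "Found N items" header lines.
    $1 ~ /^-/ && $5 == "0" {
      line = $0
      # Drop the first seven columns; what remains is the full path,
      # including any spaces inside it.
      for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "", line)
      print line
    }'
}

# Usage (still one hdfs call per file, but each path is read as a whole line):
# hdfs dfs -ls -R / | zero_byte_files | while IFS= read -r f; do hdfs dfs -rm "$f"; done
```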


Building on Kombine's answer: if you have many files to delete, it is easier and faster to use xargs. This lets you delete many files with a single hdfs command, which matters because each hdfs invocation is quite expensive (every call starts a new JVM).

hdfs dfs -ls -R / | awk '$1 !~ /^d/ && $5 == "0" { print $8 }' | xargs -n100 hdfs dfs -rm
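By default xargs splits its input on blanks and treats quotes specially, so a path containing spaces would be broken apart. As a hedged alternative, here is a minimal bash sketch of the same batching idea without xargs: it treats each input line as one path, passes up to N paths to a single hdfs dfs -rm call, and adds a dry-run mode for previewing. delete_in_batches, _run_rm and the DRY_RUN variable are hypothetical names, not part of Hadoop; the upstream filter still has to emit full paths, one per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HDFS paths (one per line) on stdin and remove them
# with one `hdfs dfs -rm` call per batch of up to N paths (default 100).
# Set DRY_RUN=1 to print the commands instead of running them.
delete_in_batches() {
  local batch_size="${1:-100}"
  local -a batch=()
  local path
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue          # ignore blank lines
    batch+=("$path")
    if [ "${#batch[@]}" -ge "$batch_size" ]; then
      _run_rm "${batch[@]}"
      batch=()
    fi
  done
  # Flush the final, possibly smaller, batch.
  if [ "${#batch[@]}" -gt 0 ]; then
    _run_rm "${batch[@]}"
  fi
}

# Runs (or, with DRY_RUN=1, prints) a single hdfs dfs -rm for all arguments.
_run_rm() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'hdfs dfs -rm'
    printf ' %q' "$@"      # shell-quoted, so paths with spaces stay unambiguous
    printf '\n'
  else
    hdfs dfs -rm "$@"
  fi
}

# Usage, previewing first:
# hdfs dfs -ls -R / | awk '$1 !~ /^d/ && $5 == "0" { print $8 }' | DRY_RUN=1 delete_in_batches 100
```

Running it once with DRY_RUN=1 and then again without it gives a chance to check the list before anything is removed.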

Source: https://habr.com/ru/post/1661495/

