Command to list large files, sorted, with sizes in human-readable format

Question


I wrote a simple shell script that finds large files, mainly to save myself some typing. The work is done with:

 find "$dir" -type f -size +"$size"M -printf '%s %p\n' | sort -rn
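For reference, a minimal sketch of that script as a bash function. The name bigfiles and its two arguments (a directory and a threshold in MiB) are hypothetical; it assumes GNU find (for -printf) and simply wraps the pipeline above with the variables quoted.

```shell
# Hypothetical helper: list regular files under DIR larger than SIZE MiB
# as "<bytes> <path>" lines, largest first. Assumes GNU find (-printf).
bigfiles() {
    local dir=$1 size=$2
    # Quoting "$dir" keeps directory names containing spaces intact.
    find "$dir" -type f -size +"$size"M -printf '%s %p\n' | sort -rn
}
```

Usage: bigfiles /var/log 100 lists files above 100 MiB under /var/log.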

I would like to convert the byte counts into a human-readable format. I found ways online to do this manually, for example:

 find "$dir" -type f -size +"$size"M -printf '%s %p\n' | sort -rn | awk '{ hum[1024^4]="TB"; hum[1024^3]="GB"; hum[1024^2]="MB"; hum[1024]="KB"; for (x=1024^4; x>=1024; x/=1024) { if ($1>=x) { b=$1; sub(/^[0-9]+ /, ""); printf "%7.2f %s\t%s\n", b/x, hum[x], $0; break } } }'
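The same awk conversion spread over several lines, as a sketch of how the loop works: it divides the byte count by 1024 until the value drops below 1024 (or the unit reaches TB). The filter name humanize is hypothetical; it reads the "<bytes> <path>" lines produced by -printf '%s %p\n' and uses only POSIX awk features (^ and split() rather than gawk's **).

```shell
# Hypothetical filter: "<bytes> <path>" lines in, "<value> <unit>\t<path>" out.
humanize() {
    awk '{
        value = $1
        # Remove the leading byte count so paths containing spaces survive.
        path = $0
        sub(/^[0-9]+ /, "", path)
        unit = "B"
        # Step through the units, dividing by 1024 each time.
        n = split("KB MB GB TB", units, " ")
        for (i = 1; i <= n && value >= 1024; i++) {
            value /= 1024
            unit = units[i]
        }
        printf "%7.2f %s\t%s\n", value, unit, path
    }'
}
```

Usage: find "$dir" -type f -size +"$size"M -printf '%s %p\n' | sort -rn | humanize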

But it seems hacky. Is there a standard way to convert bytes into a human-readable form?

Of course, any alternative way to get output like the following, given the directory and the minimum file size, is welcome:

   1.25 GB /foo/barf
 598.80 MB /foo/bar/bazf
 500.58 MB /bar/bazf
 421.70 MB /bar/baz/bamf
 ...

Note: this should work on both 2.4 and 2.6 kernels, and the output must be sorted.


linux bash

Christopher neylan Jan 20 '12 at 14:45


2 answers

Use du -h and sort -h

 find /your/dir -type f -size +5M -exec du -h '{}' + | sort -hr

Explanations:

du -h file1 file2 ... prints the disk usage of these files in human-readable (-h) format.
sort -hr sorts human-readable numbers (-h) in reverse order (-r), so the largest files come first.
The + terminator of find -exec passes many file names to each du call, which reduces the number of du invocations and therefore speeds up execution. Here + can be replaced by ';' (one du call per file), which also works but is slower.
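As a sketch, the pipeline above can be wrapped in a small bash function; the name big_du and its arguments (directory, threshold in MiB) are hypothetical. It assumes GNU sort (for -h) and a find that supports -exec ... {} +. Because sort reads the whole stream, the result stays correctly ordered even if find runs du several times. Keep in mind that du reports allocated disk space, which can differ from the apparent file size (for example, for sparse files).

```shell
# Hypothetical helper around the du/sort pipeline.
big_du() {
    local dir=$1 size=$2
    # du -h prints "<size>\t<path>" per file; sort -hr orders the
    # human-readable sizes, largest first, across all du batches.
    find "$dir" -type f -size +"$size"M -exec du -h '{}' + | sort -hr
}
```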

Remove the -r option of sort if you want the largest files printed last. You can even use the following simpler command, which has no size threshold, but its output may fill up your terminal's scrollback buffer!

 find /your/dir -type f -exec du -h '{}' + | sort -h

Or, if you only want the ten largest files (head prints 10 lines by default):

 find /your/dir -type f -exec du -h '{}' + | sort -hr | head

Note: sort's -h option was introduced around 2009, so it may not be available on older distributions (such as Red Hat 5). In addition, the + form of find -exec is not available on even older distributions (such as Red Hat 4).

On those older distributions, you can use xargs instead of the + form of find -exec. ls can also print the files sorted by size. However, to guarantee that the whole list is sorted, xargs must call ls only once, and it does so only if the file list is short enough: the limit depends on the total length of the arguments passed to ls (the sum of the lengths of all file names).

 find /your/dir -type f -size +5M -print0 | xargs -0 ls -1Ssh

(with a little inspiration borrowed from Michael Krelin - hacker).

Explanations:

ls -1 prints one file per line
ls -S sorts by file size, largest first
ls -s prints the allocated size of each file
ls -h prints sizes in human-readable format

The fastest command combines the above ls -1Ssh with the + form of find -exec. As mentioned above, though, the file list must be short enough for ls to be called only once; otherwise the output is sorted only within each batch (the + form of find -exec batches arguments much the same way xargs does).

 find /your/dir -type f -size +5M -exec ls -1Ssh '{}' +
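One way to check the single-call condition is to wrap ls in sh -c and report each batch on stderr. This is a sketch with a hypothetical function name, ls_big_checked; if it reports more than one batch, the listing is sorted only within each batch.

```shell
# Hypothetical helper: same as find ... -exec ls -1Ssh {} +, but every
# ls invocation first writes "ls batch: N files" to stderr.
ls_big_checked() {
    local dir=$1 size=$2
    # In sh -c, $0 is the literal "sh" and "$@" holds this batch's files.
    find "$dir" -type f -size +"$size"M \
        -exec sh -c 'echo "ls batch: $# files" >&2; ls -1Ssh "$@"' sh '{}' +
}
```

Usage: ls_big_checked /your/dir 5 2>&1 >/dev/null | grep -c '^ls batch:' prints the number of ls calls.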

To reduce the number of files found, you can raise the size threshold: replace +5M with +100M, for example.
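On systems whose sort lacks -h, a sketch that sidesteps the single-call requirement is to let xargs batch freely and sort afterwards: du -k prints sizes in KiB, sort -rn orders them numerically, and a short awk script converts the numbers to human units. The function name big_old is hypothetical, and -print0/xargs -0 assume GNU findutils.

```shell
# Hypothetical helper for older systems: no sort -h, no find -exec {} +.
big_old() {
    local dir=$1 size=$2
    # The final sort sees every du line, so xargs batching is harmless.
    find "$dir" -type f -size +"$size"M -print0 \
        | xargs -0 du -k \
        | sort -rn \
        | awk -F '\t' '{
            # du -k prints "<KiB>\t<path>"; scale to M, G or T as needed.
            value = $1; unit = "K"
            if (value >= 1024) { value /= 1024; unit = "M" }
            if (value >= 1024) { value /= 1024; unit = "G" }
            if (value >= 1024) { value /= 1024; unit = "T" }
            printf "%7.2f %s\t%s\n", value, unit, $2
        }'
}
```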


oliber Jan 20 '12 at 14:57


Michael Krelin - hacker · Accepted Answer · 2012-01-20T14:56:03+0000

 find ... | sort -rn | cut -d' ' -f2- | xargs du -h

for example :) or

 find "$dir" -type f -size +"$size"M -print0 | xargs -0 ls -1hsS

(with a little inspiration borrowed from oliber).
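A runnable sketch of the first pipeline, assuming GNU find (-printf) and GNU xargs (-d): sort by exact byte count, drop the count with cut, and let du -h print the sizes. du reports its operands in the order given, so the byte-sorted order survives even if xargs splits the list. The function name big_sorted_du is hypothetical; also note that the order follows apparent size while du shows disk usage, so the two can disagree for sparse files.

```shell
# Hypothetical helper: sort on bytes, display with du -h.
big_sorted_du() {
    local dir=$1 size=$2
    find "$dir" -type f -size +"$size"M -printf '%s %p\n' \
        | sort -rn \
        | cut -d ' ' -f 2- \
        | xargs -d '\n' du -h
    # cut -f 2- keeps everything after the first space, so paths with
    # spaces stay whole; xargs -d '\n' splits only on newlines.
}
```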

