Convert human-readable sizes to bytes in bash

So, I am trying to analyze very large log files in Linux, and I have seen many solutions for this, but the program that writes the data does not allow formatting the output, so it only outputs sizes in human-readable format (I know, what a pain). So the question is: how can I convert human-readable sizes to bytes using something like awk?

So, convert this:

    937 1.43K 120.3M

into

    937 1464 126143693

I can live with rounding errors; I expect them.

Thanks in advance.

P.S. It doesn't have to be awk, as long as it can do the conversion inline.

I found this one, but the awk command in it does not work correctly: it outputs "0" for values such as 534K.

I also found a solution using sed and bc, but because it relies on bc it is limited: it can only handle one column at a time, and all of the data must be valid bc input, otherwise it does not work:

    sed -e 's/K/\*1024/g' -e 's/M/\*1048576/g' -e 's/G/\*1073741824/g' | bc
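A minimal sketch of a single-pass alternative that avoids both bc limitations: it converts every whitespace-separated field of every line and leaves fields it does not recognise untouched. It assumes binary (1024-based) K/M/G/T suffixes in either case and truncates fractional bytes; dehumanise_fields is a hypothetical name, not something from the thread.

```shell
# Sketch: convert every human-readable field on every line in one awk pass.
# Assumptions: binary (1024-based) K/M/G/T suffixes in either case;
# fractional bytes are truncated; fields without a known suffix are left as-is.
dehumanise_fields() {
  awk '
    BEGIN { power["K"] = 1; power["M"] = 2; power["G"] = 3; power["T"] = 4 }
    {
      for (i = 1; i <= NF; i++) {
        s = toupper(substr($i, length($i), 1))   # last character of the field
        if (s in power) {
          # $i + 0 keeps only the leading numeric part, e.g. "1.43K" -> 1.43
          $i = sprintf("%.0f", int(($i + 0) * 1024 ^ power[s]))
        }
      }
      print   # a modified line is rejoined with OFS (one space by default)
    }' "$@"
}
```

For example, echo "937 1.43K 120.3M" | dehumanise_fields should print 937 1464 126143692. Because awk rebuilds a modified line from its fields, the original column alignment is not preserved.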

    $ cat dehumanise
    937
    1.43K
    120.3M
    $ awk '/[0-9]$/{print $1;next};/[mM]$/{printf "%u\n", $1*(1024*1024);next};/[kK]$/{printf "%u\n", $1*1024;next}' dehumanise
    937
    1464
    126143692

Here is a function that understands both binary and decimal prefixes and is easy to extend to larger units if necessary:

    dehumanise() {
      for v in "${@:-$(</dev/stdin)}"
      do
        echo $v | awk \
          'BEGIN{IGNORECASE = 1}
           function printpower(n,b,p) {printf "%u\n", n*b^p; next}
           /[0-9]$/{print $1;next};
           /K(iB)?$/{printpower($1,  2, 10)};
           /M(iB)?$/{printpower($1,  2, 20)};
           /G(iB)?$/{printpower($1,  2, 30)};
           /T(iB)?$/{printpower($1,  2, 40)};
           /KB$/{    printpower($1, 10,  3)};
           /MB$/{    printpower($1, 10,  6)};
           /GB$/{    printpower($1, 10,  9)};
           /TB$/{    printpower($1, 10, 12)}'
      done
    }

Example:

    $ dehumanise 2K 2k 2KiB 2KB
    2048
    2048
    2048
    2000
    $ dehumanise 2G 2g 2GiB 2GB
    2147483648
    2147483648
    2147483648
    2000000000

Suffixes are not case sensitive.
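IGNORECASE only has an effect in GNU awk, and the function above starts a new awk process for every value. A sketch of a variant that assumes neither: it upper-cases each value with toupper (available in POSIX awk) and pipes all values into a single awk. dehumanise_posix is a hypothetical name; as in the original, values with an unknown suffix produce no output.

```shell
# Sketch of a dehumanise() variant for any POSIX awk (no gawk-only IGNORECASE).
# Values come from the arguments, or one per line from stdin if there are none.
# Bare K/M/G/T and KiB..TiB are binary (1024-based); KB..TB are decimal.
dehumanise_posix() {
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$@"
  else
    cat
  fi | awk '
    {
      v = toupper($0)                  # makes the suffix test case-insensitive
      if (v ~ /[0-9]$/) { print $1; next }    # no suffix: already bytes
      n = v + 0                        # leading numeric part, e.g. "2KIB" -> 2
      unit = v
      sub(/^[0-9.]+ */, "", unit)      # drop the number, keep the suffix
      if (unit ~ /^[KMGT]B$/) base = 1000
      else if (unit ~ /^[KMGT](IB)?$/) base = 1024
      else next                        # unknown suffix: print nothing
      p = index("KMGT", substr(unit, 1, 1))   # K=1, M=2, G=3, T=4
      printf "%.0f\n", int(n * base ^ p)
    }'
}
```

Running dehumanise_posix 2K 2k 2KiB 2KB should print the same four values as the example above.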


There is a Python tool for this:

    $ pip install humanfriendly  # Also available as a --user install in ~/.local/bin
    $ humanfriendly --parse-size="2 KB"
    2000
    $ humanfriendly --parse-size="2 KiB"
    2048

    awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}'

This is a modification of @starfry's answer.


Let's break it down:

    function pp(p){printf "%u\n",$0*1024^p}

Define a function called pp that takes one parameter, p, and prints $0 multiplied by 1024 raised to the power p. The %u format prints the result as an unsigned decimal integer, which drops any fractional part.
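To try pp() on its own, a small wrapper can fix the power from the shell with awk's -v option and apply it to every input line. scale_by_1024 and the power variable are hypothetical, illustrative names, not part of the answer.

```shell
# Hypothetical wrapper: apply pp() with a fixed power to every input line.
# $1 is the power of 1024 to use (1 = K, 2 = M, 3 = G, 4 = T).
scale_by_1024() {
  awk -v power="$1" '
    function pp(p) { printf "%u\n", $0 * 1024 ^ p }
    { pp(power) }'
}
```

For example, echo 120.3M | scale_by_1024 2 should print 126143692.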

    /[0-9]$/{print $0}

Match lines that end with a digit ($ anchors the match to the end of the line) and run the code between { and }: print the entire line ($0).

    /K$/{pp(1)}

Match lines that end with a capital K, call the pp() function and pass it 1 (p == 1). Note: when $0 is used in an arithmetic expression, awk uses only its leading numeric part, so "1.43K" is treated as 1.43. Example with $0 = "1.43K":

 $0 * 1024^p == 1.43K * 1024^1 == 1.43K * 1024 = 1.43 * 1024 = 1464.32 
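The string-to-number rule from the note can be checked directly: adding 0 to a line forces awk to convert it to a number, and only the leading numeric part survives. A minimal sketch; numeric_prefix is a hypothetical helper name.

```shell
# Hypothetical helper: print each input line next to the number awk reads from it.
numeric_prefix() {
  awk '{ printf "%s -> %s\n", $0, $0 + 0 }'
}
```

For example, printf '%s\n' 1.43K 120.3M | numeric_prefix should print "1.43K -> 1.43" and "120.3M -> 120.3".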

    /M$/{pp(2)}

Match lines that end with a capital M, call the pp() function and pass it 2 (p == 2). Example with $0 == "120.3M":

    $0 * 1024^p == 120.3M * 1024^2 == 120.3 * 1024*1024 = 120.3 * 1048576 = 126143692.8

And so on for G and T.

    /[^0-9KMGT]$/{print 0}

For lines that do not end with a digit or with one of the capital letters K, M, G or T, print "0".


Example:

    $ cat dehumanise
    937
    1.43K
    120.3M
    5G
    933G
    12.2T
    bad
    <>

Results:

    $ awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}' dehumanise
    937
    1464
    126143692
    5368709120
    1001801121792
    13414041858867
    0
    0
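The catch-all rule turns unparseable lines such as "bad" into 0, which is easy to mistake for a real size. A sketch of a variant that keeps the same K/M/G/T rules but reports such lines on stderr instead; dehumanise_strict is a hypothetical name, and next is added to each rule so the final catch-all only sees lines that no other rule handled.

```shell
# Same rules as the one-liner above; unparseable lines go to stderr, not stdout.
dehumanise_strict() {
  awk '
    function pp(p) { printf "%u\n", $0 * 1024 ^ p }
    /[0-9]$/ { print $0; next }
    /K$/     { pp(1); next }
    /M$/     { pp(2); next }
    /G$/     { pp(3); next }
    /T$/     { pp(4); next }
    # Anything else: report it with its line number instead of printing 0.
    { printf "line %d: cannot parse \"%s\"\n", NR, $0 | "cat 1>&2" }
  ' "$@"
}
```

Writing through "cat 1>&2" keeps the sketch within POSIX awk, which does not guarantee a /dev/stderr special file.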

Use numfmt --from=iec from GNU coreutils.
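A short sketch of what that looks like, assuming GNU coreutils numfmt is on the PATH. By default numfmt rounds away from zero, so 1.43K becomes 1465 and 120.3M becomes 126143693; --round=down truncates instead, like the awk answers (1464 and 126143692).

```shell
# Each argument is converted from IEC (1024-based) suffixes to plain bytes.
numfmt --from=iec 937 1.43K 120.3M

# Truncate fractional bytes instead of rounding them away from zero.
numfmt --from=iec --round=down 937 1.43K 120.3M
```

For whole rows, the --field option selects which whitespace-separated columns to convert, and --from=si handles 1000-based units.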


Source: https://habr.com/ru/post/1205743/

