Unix - max (length) of each column in the file

Given a file with data like this (e.g. stores.dat):

    sid|storeNo|latitude|longitude
    2tt|1|-28.0372000t0|153.42921670
    9|2t|-33tt.85t09t0000|15t1.03274200

Required output:

    sid : 3
    storeNo : 2
    latitude : 16
    longitude : 13

What is the syntax for returning the maximum length of the values under each column?

I tried this, but it does not work:

    nawk 'BEGIN { FS = "|" }
    {
        for(n = 1; n <= NF; n++) {
            if (length($n) > max)
                max = length($n)
            maxlen[$n] = max
        }
    }
    END { for (i in maxlen) print "col " i ": " maxlen[i] }
    ' stores.dat

UPDATE (thanks to Mat's answer, I settled on this):

    awk -F"|" '
    NR==1{
        for(n = 1; n <= NF; n++) {
            colname[n]=$n
        }
    }
    NR>1{
        for(n = 1; n <= NF; n++) {
            if (length($n)>maxlen[n])
                maxlen[n]=length($n)
        }
    }
    END {
        for (i in colname) {
            print colname[i], ":", maxlen[i]+0;
        }
    }
    ' filename
2 answers

A few problems with your script: max is shared across all columns instead of being tracked per column, and you are not handling the header line at all. Try the following:

    $ cat t.awk
    #!/bin/awk -f

    NR==1{
        for(n = 1; n <= NF; n++) {
            colname[n]=$n
        }
    }
    NR>1{
        for(n = 1; n <= NF; n++) {
            if (length($n)>maxlen[n])
                maxlen[n]=length($n)
        }
    }
    END {
        for (i in maxlen) {
            print colname[i], ":", maxlen[i];
        }
    }

    $ awk -F'|' -f t.awk stores.dat

$n refers to the contents of the nth column; n is the column number (in the first and second loops). The last loop just shows how to iterate over an array in awk.
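One caveat with the `for (i in maxlen)` loop: awk does not guarantee the order in which it visits array indices, so the columns may print out of file order. The sketch below is one way to avoid that, assuming the header row defines the column count. It remembers that count in a variable and walks the indices numerically. The temporary file and the names `ncols` and `result` are illustrative, not part of the answer above.

```shell
# Scratch copy of the sample data from the question (hypothetical temp file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF

result=$(awk -F'|' '
    # Header row: remember the column names and how many there are.
    NR == 1 { ncols = NF; for (n = 1; n <= NF; n++) colname[n] = $n; next }
    # Data rows: keep a separate running maximum for each column.
    {
        for (n = 1; n <= NF; n++)
            if (length($n) > maxlen[n]) maxlen[n] = length($n)
    }
    END {
        # A numeric loop prints the columns in file order;
        # "+ 0" turns a never-set maximum into 0 instead of an empty string.
        for (n = 1; n <= ncols; n++) print colname[n], ":", maxlen[n] + 0
    }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Fields beyond the header's column count are tracked but never printed, which seems a reasonable choice when the header is treated as authoritative.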


Here is a pure bash approach:

    #!/usr/bin/env bash

    dat=./stores.dat
    del='|'

    # Column names come from the header line; they double as variable names for read.
    TOKENS=$(head -1 "${dat}" | tr $del ' ')
    declare -a col=( $TOKENS )
    declare -a max

    skip=1
    while IFS=$del read $TOKENS; do
        # Skip the header row.
        if [ $skip -eq 1 ]; then
            skip=0
            continue
        fi
        idx=0
        for tok in ${TOKENS}; do
            tokref=${!tok}    # indirect expansion: value of the variable named by $tok
            printf "%-10s = %-16s[%2d] " "$tok" "${tokref}" "${#tokref}"
            # :-0 shows 0 for a maximum that has not been set yet
            echo "--> max=${max[$idx]:-0} tokref=${#tokref}"
            #This works  : c=$a>$b?$a:$b
            #This doesn't: max[$idx]=${max[$idx]}>${#tokref}?${max[$idx]}:${#tokref}
            max[$idx]=$((${max[$idx]:=0}>${#tokref}?${max[$idx]}:${#tokref}))
            let idx++
        done
        printf "\n"
    done < ${dat}

    for ((idx=0; idx<${#col[@]}; idx++)); do
        printf "%-10s : %d\n" "${col[$idx]}" "${max[$idx]}"
    done

The output is as follows:

    sid        = 2tt             [ 3] --> max=0 tokref=3
    storeNo    = 1               [ 1] --> max=0 tokref=1
    latitude   = -28.0372000t0   [13] --> max=0 tokref=13
    longitude  = 153.42921670    [12] --> max=0 tokref=12

    sid        = 9               [ 1] --> max=3 tokref=1
    storeNo    = 2t              [ 2] --> max=1 tokref=2
    latitude   = -33tt.85t09t0000[16] --> max=13 tokref=16
    longitude  = 15t1.03274200   [13] --> max=12 tokref=13

    sid        : 3
    storeNo    : 2
    latitude   : 16
    longitude  : 13

I added this solution because I liked this task and had a few minutes to spare.
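For comparison, the same idea can be sketched more compactly in bash by reading each row into an array with `read -a` rather than into variables named after the columns. This is a hedged alternative, not the approach above. It assumes bash (arrays are not available in plain POSIX sh), and the temporary file and the names `names`, `fields` and `report` are illustrative.

```shell
#!/usr/bin/env bash
# Scratch copy of the sample data from the question (hypothetical temp file).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
EOF

declare -a names max
{
    # The first line holds the column names.
    IFS='|' read -r -a names
    # Every remaining line is split into the fields array.
    while IFS='|' read -r -a fields; do
        for i in "${!fields[@]}"; do
            len=${#fields[i]}
            # Treat a maximum that has not been set yet as 0.
            (( len > ${max[i]:-0} )) && max[i]=$len
        done
    done
} < "$tmp"

report=$(for i in "${!names[@]}"; do
    printf '%s : %d\n' "${names[i]}" "${max[i]:-0}"
done)

printf '%s\n' "$report"
rm -f "$tmp"
```

Keeping the values in an array avoids the indirect expansion `${!tok}`, so header names that are not valid shell identifiers would not break the loop.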


Source: https://habr.com/ru/post/1387938/

