Sum rows based on unique awk columns

Question

I am looking for a more elegant way to do this (over 100 columns):

awk '{a[$1]+=$4}{b[$1]+=$5}{c[$1]+=$6}{d[$1]+=$7}{e[$1]+=$8}{f[$1]+=$9}{g[$1]+=$10}END{for(i in a) print i,a[i],b[i],c[i],d[i],e[i],f[i],g[i]}'

Here is the input:

 a1 1   1   2   2
 a2 2   5   3   7
 a2 2   3   3   8
 a3 1   4   6   1
 a3 1   7   9   4
 a3 1   2   4   2

and the desired output:

 a1 1 1 2 2
 a2 4 8 6 15
 a3 3 13 19 7

Thanks:)

awk count unique

user2904120 Feb 14 '14 at 17:02

3 answers

If you need the output to keep the order in which the keys first appear in the input, try this:

$ cat file
a1 1   1   2   2
a2 2   5   3   7
a2 2   3   3   8
a3 1   4   6   1
a3 1   7   9   4
a3 1   2   4   2

Awk Code:

$ cat tester
awk 'FNR==NR{                                   # first pass over the file
              U[$1]                             # array U is indexed by field 1 (the key)
              for(i=2;i<=NF;i++)                # loop over the columns from 2 to NF
              A[$1,i]+=$i                       # array A holds the per-key sum of each column
              next                              # skip the rest and go on to the next record
            }
   ($1 in U){                                   # second pass over the same file: runs only at the first occurrence of each key
              for(i=2;i<=NF;i++)                # the sums start at column 2
              s = (i==2 ? "" : s OFS) A[$1,i]   # build the line in s so that a single print suffices; printf would also work
              print $1,s                        # print the key and the sums
              delete U[$1]                      # done with this key, so delete it
              s = ""                            # reset s
            }' OFS='\t' file{,}                 # tab as output field separator (a comma works too); file{,} is bash brace expansion for "file file"

Resulting

$ bash tester
a1  1   1   2   2
a2  4   8   6   15
a3  3   13  19  7

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk or nawk.

Edit: the same command as a one-liner:

$ awk 'FNR==NR{U[$1];for(i=2;i<=NF;i++)A[$1,i]+=$i;next}($1 in U){for(i=2;i<=NF;i++)s = (i==2 ? "" : s OFS) A[$1,i];print $1,s;delete U[$1];s = ""}' OFS='\t' file{,}
a1  1   1   2   2
a2  4   8   6   15
a3  3   13  19  7
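
The two-pass command above needs a real file, because it reads the input twice. When the data arrives on a pipe, a single-pass variation can record the order in which keys first appear. The following is only a sketch under that assumption, not the answer's own method; the names sum_by_key, seen, order, sum and maxnf are made up for illustration.

```shell
#!/bin/sh
# sum_by_key (hypothetical name): sum columns 2..NF per key in one pass,
# printing keys in the order they first appear in the input.
sum_by_key() {
    awk -v OFS='\t' '
    !($1 in seen) { seen[$1]; order[++n] = $1 }      # remember first appearance of each key
    {
        for (i = 2; i <= NF; i++) sum[$1, i] += $i   # accumulate per key and column
        if (NF > maxnf) maxnf = NF                   # widest record seen so far
    }
    END {
        for (k = 1; k <= n; k++) {
            line = order[k]
            # "+ 0" makes a column that never occurred for this key print as 0
            for (i = 2; i <= maxnf; i++) line = line OFS (sum[order[k], i] + 0)
            print line
        }
    }' "$@"
}
```

It can be called as `sum_by_key file` or as `some_command | sum_by_key`. The trade-off is that every key is held in a second array, which is rarely a concern at this scale.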

Akshay Hegde Feb 14 '14 at 20:30

Using arrays of arrays in GNU awk version 4:

awk '{for (i=2;i<=NF;i++) a[$1][i]+=$i}
END{for (i in a)
      { printf "%s", i                                        # print the key
        for (j=2;(j in a[i]);j++) printf "%s%s", FS, a[i][j]  # sums in column order
        print ""}
    }' file

a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
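
Note that the order in which `for (i in a)` visits the keys is not specified by awk. GNU awk 4.0 and later can fix the traversal order through `PROCINFO["sorted_in"]`. A minimal sketch, assuming GNU awk is installed under the name gawk; the function name sum_sorted is hypothetical.

```shell
#!/bin/sh
# sum_sorted (hypothetical name): requires GNU awk 4.0 or later,
# for both arrays of arrays and PROCINFO["sorted_in"].
sum_sorted() {
    gawk '
    { for (i = 2; i <= NF; i++) a[$1][i] += $i }     # a[key][column] holds the running sum
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"       # visit keys in ascending string order
        for (k in a) {
            printf "%s", k
            # walk the columns numerically so that column 10 comes after column 9
            for (j = 2; (j in a[k]); j++) printf "%s%s", OFS, a[k][j]
            print ""
        }
    }' "$@"
}
```

Called as `sum_sorted file`, this prints the keys sorted as strings regardless of their order in the input.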

BMW Feb 14 '14 at 10:48

Kent · Accepted Answer · 2014-02-14T17:16:10+0000

I have broken the one-liner into several lines to make it easier to read.

awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}
    END{for(x in n){
        printf "%s ", x
        for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)
        }
    }' file

This awk command should work with your 100-column files.

Test with your sample file:

kent$  cat f
a1 1   1   2   2
a2 2   5   3   7
a2 2   3   3   8
a3 1   4   6   1
a3 1   7   9   4
a3 1   2   4   2

kent$  awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}END{for(x in n){printf "%s ", x;for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)}}' f
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
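
The claim that the command copes with 100 columns can be checked on generated data. The sketch below is a lightly adapted copy of the command above, not the answer's exact text: make_wide and sum_cols are hypothetical helper names, and the variable nf is added so that the END block does not rely on NF still being set. Because `for (x in n)` visits keys in an unspecified order, the result is piped through sort.

```shell
#!/bin/sh
# make_wide (hypothetical helper): print 3 records with 100 numeric columns each.
# Keys are k1, k1, k2 and every value is 1, so the expected sums are 2 and 1.
make_wide() {
    awk 'BEGIN {
        split("k1 k1 k2", key, " ")
        for (r = 1; r <= 3; r++) {
            printf "%s", key[r]
            for (c = 1; c <= 100; c++) printf " %d", 1
            print ""
        }
    }'
}

# sum_cols (hypothetical helper): the summing logic from the command above,
# reading standard input; nf remembers the field count for the END block.
sum_cols() {
    awk '{ n[$1]; nf = NF; for (i = 2; i <= NF; i++) a[$1, i] += $i }
         END { for (x in n) {
                   printf "%s", x
                   for (y = 2; y <= nf; y++) printf "%s%s", OFS, a[x, y]
                   print ""
               } }'
}

# Summarise each output line as: key, number of summed columns, first sum, last sum.
make_wide | sum_cols | sort | awk '{ print $1, NF - 1, $2, $NF }'
```

Run as is, this prints `k1 100 2 2` followed by `k2 100 1 1`, which confirms that all 100 columns were summed for both keys.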
