Calculation of percentages in an arbitrary number of columns

Given this input example:

    ID      Sample1  Sample2  Sample3
    One     10       0        5
    Two     3        6        8
    Three   3        4        7

I needed to create this output using AWK:

    ID      Sample1  Sample2  Sample3
    One     62.50    0.00     25.00
    Two     18.75    60.00    40.00
    Three   18.75    40.00    35.00

Here's how I solved it:

    function percent(value, total) {
        return sprintf("%.2f", 100 * value / total)
    }
    {
        label[NR] = $1
        for (i = 2; i <= NF; ++i) {
            sum[i] += col[i][NR] = $i
        }
    }
    END {
        title = label[1]
        for (i = 2; i <= length(col) + 1; ++i) {
            title = title "\t" col[i][1]
        }
        print title
        for (j = 2; j <= NR; ++j) {
            line = label[j]
            for (i = 2; i <= length(col) + 1; ++i) {
                line = line "\t" percent(col[i][j], sum[i])
            }
            print line
        }
    }

This works fine in GNU AWK (awk on Linux, gawk on BSD), but not in BSD AWK, where I get this error:

    $ awk -f script.awk sample.txt
    awk: syntax error at source line 7 source file script.awk
     context is
            sum[i] += >>> col[i][ <<<
    awk: illegal statement at source line 7 source file script.awk
    awk: illegal statement at source line 7 source file script.awk

The problem seems to be related to multidimensional arrays. I would like to make this script work in BSD AWK too, so that it is more portable.

Is there a way to change this to make it work in BSD AWK?

2 answers

Try using a pseudo-two-dimensional array. Instead of

    col[i][NR]

use

    col[i,NR]

This is a one-dimensional array whose key is the concatenated string i SUBSEP NR.
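To illustrate the composite key described above, here is a minimal sketch that runs a small awk program from the shell. The array contents and the variable names `key` and `idx` are made up for the demonstration; only the subscript syntax, `SUBSEP`, the `in` operator, and `split` are standard awk.

```shell
# Sketch: how awk stores col[i, j] in a flat associative array.
out=$(awk 'BEGIN {
    col[2, 3] = "x"     # stored under the single key "2" SUBSEP "3"

    # Membership test for a composite key (does not create the element).
    if ((2, 3) in col) print "found (2,3)"

    # The same element is reachable through the manually concatenated key.
    key = 2 SUBSEP 3
    if (key in col) print "same key as 2 SUBSEP 3"

    # Recover the individual indices by splitting a stored key on SUBSEP.
    for (key in col) {
        split(key, idx, SUBSEP)
        print "i=" idx[1], "j=" idx[2]
    }
}')
printf '%s\n' "$out"
```

Because the two indices are merged into one string, there is no per-column sub-array, so something like length(col[i]) has no equivalent in this scheme.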


@glenn's answer got me on the right track. However, it took a bit more work:

  • Using col[i, NR] made it awkward to handle the column headers. It turned out to be much simpler to stop buffering the column names and print the header line as soon as it is read.
  • length(col) + 1 could no longer be used as the end condition of the loops, because with col[i, j] it made the loops infinite. As a workaround, I replaced length(col) + 1 with just NF.
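A likely explanation for the infinite loops mentioned above: with a flat array, the element count covers every stored cell rather than the number of columns, and merely reading a missing element such as col[i, j] creates it, so a bound based on the array size keeps growing as the loop runs. The sketch below shows that behavior; `count` is a hypothetical helper (used instead of length(col) only to keep the demo portable), and the data values are invented.

```shell
# Sketch: reading a missing awk array element creates it.
out=$(awk '
# Hypothetical helper: count the elements by iterating over the keys.
function count(arr,    k, n) {
    n = 0
    for (k in arr) n++
    return n
}
BEGIN {
    # Two data columns (fields 2 and 3) in one data row (record 2).
    col[2, 2] = 10
    col[3, 2] = 5
    print "before: " count(col)

    # Reading a missing element silently creates it with an empty value...
    x = col[4, 2]
    print "after reading col[4, 2]: " count(col)

    # ...whereas a membership test with "in" leaves the array unchanged.
    if (!((5, 2) in col)) print "after testing (5, 2) in col: " count(col)
}')
printf '%s\n' "$out"
```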

Here's the final implementation, which now works in both GNU and BSD AWK versions:

    function percent(value, total) {
        return sprintf("%.2f", 100 * value / total)
    }
    BEGIN { OFS = "\t" }
    # Header line: convert the separators to tabs and print it right away.
    NR == 1 { gsub(/ +/, OFS); print }
    # Data lines: remember the row label, store each cell, and total each column.
    NR != 1 {
        label[NR] = $1
        for (i = 2; i <= NF; ++i) {
            sum[i] += col[i, NR] = $i
        }
    }
    END {
        for (j = 2; j <= NR; ++j) {
            line = label[j]
            for (i = 2; i <= NF; ++i) {
                line = line OFS percent(col[i, j], sum[i])
            }
            print line
        }
    }
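As a cross-check, here is a sketch of a slight variant run against the sample input from the question. It is not the author's script: the column count is saved in a hypothetical variable `ncols` when the header is read, instead of reading NF inside END. POSIX specifies that NF keeps its last value in END, but some older awk implementations are reported not to preserve it, and a short final line would also change it. The header is rebuilt with `$1 = $1` rather than gsub, which is another substitution of mine.

```shell
# Sketch: percentage per column, with the column count saved up front.
input='ID Sample1 Sample2 Sample3
One 10 0 5
Two 3 6 8
Three 3 4 7'

out=$(printf '%s\n' "$input" | awk '
function percent(value, total) {
    return sprintf("%.2f", 100 * value / total)
}
BEGIN { OFS = "\t" }
# Header: remember the column count, rebuild the line with OFS, print it.
NR == 1 { ncols = NF; $1 = $1; print; next }
{
    label[NR] = $1
    for (i = 2; i <= ncols; ++i) {
        col[i, NR] = $i     # flat array keyed by i SUBSEP NR
        sum[i] += $i        # running total of column i
    }
}
END {
    for (j = 2; j <= NR; ++j) {
        line = label[j]
        for (i = 2; i <= ncols; ++i) {
            line = line OFS percent(col[i, j], sum[i])
        }
        print line
    }
}')
printf '%s\n' "$out"
```

With the sample input this should produce the tab-separated table shown in the question. A column whose values sum to zero would still cause a division by zero in percent, as in the original.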

Source: https://habr.com/ru/post/980086/

