Gsub many columns at the same time based on different gsub conditions?

I have a file with the following data -

Input -

A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B

If any line, starting from line 2, has the same letter as line 1 in the same column, that letter should be changed to 1. Basically, I am trying to find out how similar each line is to the first line.

Desired Result -

1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B

The first line becomes all 1s, since it is identical to itself. In the second row, the first and second columns are identical to the first row (A B) and therefore become 1 1, and so on for the other lines.

I wrote the following code that does this conversion -

for seq in {1..1} ; #Iterate over the rows (in this case just row 1)
do 
    for position in {1..6} ; #Iterate over the columns
    do 
        #Define the letter in the first row with which I'm comparing the rest of the rows
        aa=$(awk -v pos=$position -v line=$seq 'NR == line {print $pos}' f) 
        #If it matches, gsub it to 1 
        awk -v var=$aa -v pos=$position '{gsub (var, "1", $pos)} 1' f > temp
        #Save this intermediate file and now act on this
        mv temp f 
    done 
done

As you can imagine, this is very slow because this nested loop is expensive. My real data is a 60x10000 matrix, and it takes about 2 hours for this program.

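Ideally, instead of looping, I would like to do all 6 gsubs in a single pass. Is there a way to do that? An awk solution is preferred.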


$ cat f
A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B

Output:

$ awk 'FNR==1{split($0,a)}{for(i=1;i<=NF;i++)if (a[i]==$i) $i=1}1' f
1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B

  • FNR==1{ .. }

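When awk reads the first record of the file, it runs the block in braces, which calls split:

split(string, array [, fieldsep [, seps]])

It divides string into pieces separated by fieldsep and stores the pieces in array (and, in gawk, the separator strings in seps).

  • split($0,a)

This splits the current record ($0) on fieldsep (default: whitespace, since none is given) and stores the pieces in array a, so a holds: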

       a[1] = A 
       a[2] = B
       a[3] = C 
       a[4] = D  
       a[5] = E  
       a[6] = F
  • for(i=1;i<=NF;i++)

This loops through all the fields of each record in the file.

  • if (a[i]==$i) $i=1

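If the first row's value in column i (a[i]) equals the current row's field i, set that field to 1 (assigning to a field modifies the current record).

Finally,

  • }1

    1 always evaluates to true, so awk performs its default action, {print $0}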
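
The steps above can also be written out as a commented, multi-line program. This is only a sketch of the same one-liner; the function name mark_matches and the array name ref are made up for illustration, and the input is a small sample taken from the question's data.

```shell
# mark_matches [FILE...]: hypothetical helper wrapping the one-liner above.
# Reads the named files, or standard input when no file is given.
mark_matches() {
    awk '
        FNR == 1 { split($0, ref) }       # remember the first row, one element per column
        {
            for (i = 1; i <= NF; i++)     # walk the columns of the current row
                if (ref[i] == $i)         # same value as the first row in this column?
                    $i = 1                # then mark the match
        }
        1                                 # always true: print the (possibly modified) row
    ' "$@"
}

# Usage: first three rows of the sample data
printf 'A B C D E F\nA B B B B B\nC A C D E F\n' | mark_matches
```

Because the first row is compared with itself, it always comes out as all 1s.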

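As a follow-up, suppose the number of matches per row is also needed, i.e. 6, 2, 4, 2, 2, 3 for the data above. How can that be done?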

$ awk 'FNR==1{split($0,a)}{s=0;for(i=1;i<=NF;i++)if(a[i]==$i)s+=$i=1;print $0,s}' f
1 1 1 1 1 1 6
1 1 B B B B 2
C A 1 1 1 1 4
1 1 D E F A 2
1 A A A A 1 2
1 1 1 B B B 3

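You can do this in a single awk loop, without a separate split, by storing the first row in an array as it is read: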

awk '{for (i=1; i<=NF; i++) {if (NR==1) a[i]=$i; if (a[i]==$i) $i=1} } 1' file

1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B

EDIT:

To also print the number of matches at the end of each row:

awk '{sum=0; for (i=1; i<=NF; i++) { if (NR==1) a[i]=$i; if (a[i]==$i) { $i=1; sum++ } }
      print $0, sum}' file

1 1 1 1 1 1 6
1 1 B B B B 2
C A 1 1 1 1 4
1 1 D E F A 2
1 A A A A 1 2
1 1 1 B B B 3
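
The loop in the question iterates over reference rows ("in this case just row 1"), so here is a hedged sketch of how the same idea might extend to any reference row. It assumes the input is a regular file that can be read twice; the function name compare_to_row and the variable refline are hypothetical.

```shell
# compare_to_row N FILE: hypothetical helper; N is the 1-based reference row.
compare_to_row() {
    awk -v refline="$1" '
        NR == FNR {                              # first pass over the file
            if (FNR == refline) split($0, ref)   # store the reference row
            next
        }
        {                                        # second pass: compare and count
            n = 0
            for (i = 1; i <= NF; i++)
                if (ref[i] == $i) { $i = 1; n++ }
            print $0, n                          # row followed by its match count
        }
    ' "$2" "$2"
}

# Usage: write the sample data to a temporary file and compare against row 3
tmp=$(mktemp)
printf 'A B C D E F\nA B B B B B\nC A C D E F\nA B D E F A\nA A A A A F\nA B C B B B\n' > "$tmp"
compare_to_row 3 "$tmp"
rm -f "$tmp"
```

Called with 1 as the reference row, this should reproduce the output shown above. The file is passed twice so that the reference row is known before any row is compared, at the cost of reading the input two times.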

Source: https://habr.com/ru/post/1665648/

