Replace lines with lines from another text file by matching patterns

I have a file that maps each key to a value:

 sort keyFile.txt | head
 ENSMUSG00000000001 ENSMUSG00000000001_Gnai3
 ENSMUSG00000000003 ENSMUSG00000000003_Pbsn
 ENSMUSG00000000003 ENSMUSG00000000003_Pbsn
 ENSMUSG00000000028 ENSMUSG00000000028_Cdc45
 ENSMUSG00000000028 ENSMUSG00000000028_Cdc45
 ENSMUSG00000000028 ENSMUSG00000000028_Cdc45
 ENSMUSG00000000031 ENSMUSG00000000031_H19
 ENSMUSG00000000031 ENSMUSG00000000031_H19
 ENSMUSG00000000031 ENSMUSG00000000031_H19
 ENSMUSG00000000031 ENSMUSG00000000031_H19

I would like to replace each matching key in temp.txt with its value:

 head temp.txt
 ENSMUSG00000000001:001 515
 ENSMUSG00000000001:002 108
 ENSMUSG00000000001:003 64
 ENSMUSG00000000001:004 45
 ENSMUSG00000000001:005 58
 ENSMUSG00000000001:006 63
 ENSMUSG00000000001:007 46
 ENSMUSG00000000001:008 11
 ENSMUSG00000000001:009 13
 ENSMUSG00000000003:001 0

The result should be:

 out.txt
 ENSMUSG00000000001_Gnai3:001 515
 ENSMUSG00000000001_Gnai3:002 108
 ENSMUSG00000000001_Gnai3:003 64
 ENSMUSG00000000001_Gnai3:004 45
 ENSMUSG00000000001_Gnai3:005 58
 ENSMUSG00000000001_Gnai3:006 63
 ENSMUSG00000000001_Gnai3:007 46
 ENSMUSG00000000001_Gnai3:008 11
 ENSMUSG00000000001_Gnai3:009 13
 ENSMUSG00000000003_Pbsn:001 0

I tried several variations of an awk example, but as you can see, the result is not what I expected:

 awk 'NR==FNR{a[$1]=$1;next}{$1=a[$1];}1' keyFile.txt temp.txt | head
 515
 108
 64
 45
 58
 63
 46
 11
 13
 0

I assume that column 1 of temp.txt does not exactly match column 1 of keyFile.txt. Can someone please help me with this?
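The mismatch suspected above can be checked with a short Python sketch. The two sample rows are copied from the files shown in the question; the variable names are hypothetical. It suggests two separate problems in the attempted command: the field in temp.txt carries a `:001` suffix that the key does not, and the array stores the key itself rather than column 2.

```python
# One sample row from each file in the question (variable names are hypothetical).
key_rows = ["ENSMUSG00000000001 ENSMUSG00000000001_Gnai3"]
temp_rows = ["ENSMUSG00000000001:001 515"]

# Mirrors a[$1]=$1 from the attempted command: the key maps to itself,
# so the value in column 2 is never stored.
lookup = {row.split()[0]: row.split()[0] for row in key_rows}

first_field = temp_rows[0].split()[0]    # "ENSMUSG00000000001:001"
gene_id = first_field.split(":", 1)[0]   # "ENSMUSG00000000001"

print(first_field in lookup)  # False: the ":001" suffix prevents an exact match
print(gene_id in lookup)      # True: the part before the colon does match
```

Because the lookup on the full first field fails, awk assigns an empty string to `$1`, which is why only the counts remain in the output.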

R, Python, or sed solutions are also welcome.

4 answers

Use the awk command as follows:

 awk 'NR==FNR {a[$1]=$2;next} { split($1, b, ":"); if (b[1] in a) print a[b[1]] ":" b[2], $2; else print $0; }' keyFile.txt temp.txt 
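Since Python solutions were also requested, here is a minimal sketch of the same idea as the awk command above: build a key-to-value map, split the first field on the colon, and leave unmatched lines unchanged. The function names `load_key_map` and `relabel` are hypothetical, and the sample rows are taken from the question rather than read from disk.

```python
def load_key_map(lines):
    """Build {key: value} from whitespace-separated 'key value' lines."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]  # duplicate keys simply overwrite
    return mapping


def relabel(lines, mapping):
    """Replace the part before ':' in the first field when it is a known key."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        first, _, rest = line.partition(" ")
        gene_id, colon, suffix = first.partition(":")
        if colon and gene_id in mapping:
            first = mapping[gene_id] + ":" + suffix
        out.append(first + (" " + rest if rest else ""))
    return out


# Sample rows from the question; with real files, pass open("keyFile.txt")
# and open("temp.txt") instead of these lists.
key_lines = [
    "ENSMUSG00000000001 ENSMUSG00000000001_Gnai3",
    "ENSMUSG00000000003 ENSMUSG00000000003_Pbsn",
]
temp_lines = [
    "ENSMUSG00000000001:001 515",
    "ENSMUSG00000000003:001 0",
]

for row in relabel(temp_lines, load_key_map(key_lines)):
    print(row)
```

Lines whose key is missing from the map pass through untouched, which matches the `else print $0` branch of the awk version.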

Another awk option

 awk -F: 'NR == FNR{split($0, a, " "); x[a[1]]=a[2]; next}{print x[$1]":"$2}' keyFile.txt temp.txt 

Code for GNU sed:

 $ sed -nr '$!N;/^(.*)\n\1$/!bk;D;:k;s#\S+\s+(\w+)_(\w+)#/^\1/s/(\\w+)(:\\w+)\\s+(\\w+)/\\1_\2\\2 \\3/p#;P;s/^(.*)\n//' keyFile.txt | sed -nrf - temp.txt
 ENSMUSG00000000001_Gnai3:001 515
 ENSMUSG00000000001_Gnai3:002 108
 ENSMUSG00000000001_Gnai3:003 64
 ENSMUSG00000000001_Gnai3:004 45
 ENSMUSG00000000001_Gnai3:005 58
 ENSMUSG00000000001_Gnai3:006 63
 ENSMUSG00000000001_Gnai3:007 46
 ENSMUSG00000000001_Gnai3:008 11
 ENSMUSG00000000001_Gnai3:009 13
 ENSMUSG00000000003_Pbsn:001 0

Another awk version:

 awk 'NR==FNR{a[$1]=$2;next} {sub(/[^:]+/,a[substr($1,1,index($1,":")-1)])}1' keyFile.txt temp.txt 

Source: https://habr.com/ru/post/1488308/

