Replace lines with lines from another text file by matching patterns

Question

I have a file that maps each key to a value:

 $ sort keyFile.txt | head
 ENSMUSG00000000001 ENSMUSG00000000001_Gnai3
 ENSMUSG00000000003 ENSMUSG00000000003_Pbsn
 ENSMUSG00000000003 ENSMUSG00000000003_Pbsn
 ENSMUSG00000000028 ENSMUSG00000000028_Cdc45
 ENSMUSG00000000028 ENSMUSG00000000028_Cdc45
 ENSMUSG00000000028 ENSMUSG00000000028_Cdc45
 ENSMUSG00000000031 ENSMUSG00000000031_H19
 ENSMUSG00000000031 ENSMUSG00000000031_H19
 ENSMUSG00000000031 ENSMUSG00000000031_H19
 ENSMUSG00000000031 ENSMUSG00000000031_H19

I would like to replace each matching key in temp.txt with its value:

 $ head temp.txt
 ENSMUSG00000000001:001 515
 ENSMUSG00000000001:002 108
 ENSMUSG00000000001:003 64
 ENSMUSG00000000001:004 45
 ENSMUSG00000000001:005 58
 ENSMUSG00000000001:006 63
 ENSMUSG00000000001:007 46
 ENSMUSG00000000001:008 11
 ENSMUSG00000000001:009 13
 ENSMUSG00000000003:001 0

The result, out.txt, should be:

 ENSMUSG00000000001_Gnai3:001 515
 ENSMUSG00000000001_Gnai3:002 108
 ENSMUSG00000000001_Gnai3:003 64
 ENSMUSG00000000001_Gnai3:004 45
 ENSMUSG00000000001_Gnai3:005 58
 ENSMUSG00000000001_Gnai3:006 63
 ENSMUSG00000000001_Gnai3:007 46
 ENSMUSG00000000001_Gnai3:008 11
 ENSMUSG00000000001_Gnai3:009 13
 ENSMUSG00000000003_Pbsn:001 0

I tried several options following this AWK example, but as you can see, the result is not as expected:

 $ awk 'NR==FNR{a[$1]=$1;next}{$1=a[$1];}1' keyFile.txt temp.txt | head
 515
 108
 64
 45
 58
 63
 46
 11
 13
 0

I assume the problem is that column 1 of temp.txt does not exactly match column 1 of keyFile.txt. Can someone please help me with this?
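
The suspected mismatch can be checked with a small sketch. It is written in Python with a few sample rows inlined instead of read from the files, and the variable names are illustrative only. The first column of temp.txt carries a `:NNN` suffix, so it never equals a bare key from keyFile.txt and the lookup comes back empty. The attempt above also stores `$1` rather than `$2` as the array value, so even a successful match would not have inserted the gene name.

```python
# Sample rows copied from the question; the real data lives in keyFile.txt and temp.txt.
key_rows = [
    "ENSMUSG00000000001 ENSMUSG00000000001_Gnai3",
    "ENSMUSG00000000003 ENSMUSG00000000003_Pbsn",
]
temp_rows = ["ENSMUSG00000000001:001 515", "ENSMUSG00000000003:001 0"]

# Build the key -> value table (the awk attempt stored column 1 here instead of column 2).
lookup = dict(row.split() for row in key_rows)

for row in temp_rows:
    first_col = row.split()[0]          # e.g. "ENSMUSG00000000001:001"
    bare_key = first_col.split(":")[0]  # e.g. "ENSMUSG00000000001"
    # The whole first column is never a key; only the part before ":" is.
    print(first_col in lookup, bare_key in lookup)
```

Each row prints `False True`: the full first column is not a key, but the part before the colon is.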

R, Python, or sed solutions are also welcome.

+4

regex awk perl sed

fridaymeetssunday Jun 26 '13 at 13:48

4 answers

Another awk option

 awk -F: 'NR == FNR{split($0, a, " "); x[a[1]]=a[2]; next}{print x[$1]":"$2}' keyFile.txt temp.txt

+2

iruvar Jun 26 '13 at 14:04

Code for GNU sed:

 $ sed -nr '$!N;/^(.*)\n\1$/!bk;D;:k;s#\S+\s+(\w+)_(\w+)#/^\1/s/(\\w+)(:\\w+)\\s+(\\w+)/\\1_\2\\2 \\3/p#;P;s/^(.*)\n//' keyfile.txt | sed -nrf - temp.txt
 ENSMUSG00000000001_Gnai3:001 515
 ENSMUSG00000000001_Gnai3:002 108
 ENSMUSG00000000001_Gnai3:003 64
 ENSMUSG00000000001_Gnai3:004 45
 ENSMUSG00000000001_Gnai3:005 58
 ENSMUSG00000000001_Gnai3:006 63
 ENSMUSG00000000001_Gnai3:007 46
 ENSMUSG00000000001_Gnai3:008 11
 ENSMUSG00000000001_Gnai3:009 13
 ENSMUSG00000000003_Pbsn:001 0

+2

captcha Jun 26 '13 at 15:23

Another awk version:

 awk 'NR==FNR{a[$1]=$2;next} {sub(/[^:]+/,a[substr($1,1,index($1,":")-1)])}1' keyFile.txt temp.txt
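
For readers who prefer Python, the same substitute-the-prefix idea might be sketched as below. The names `ID_PREFIX` and `relabel_prefix` are made up for this illustration. Unlike the awk `sub` call, this sketch leaves the ID unchanged when it has no entry in the table instead of replacing it with an empty string.

```python
import re

# Leading run of characters that is immediately followed by a colon.
ID_PREFIX = re.compile(r"^[^:\s]+(?=:)")


def relabel_prefix(line, table):
    """Replace the ID before the first ':' with table[ID]; keep it if unknown."""
    return ID_PREFIX.sub(lambda m: table.get(m.group(0), m.group(0)), line, count=1)


table = {"ENSMUSG00000000001": "ENSMUSG00000000001_Gnai3"}
print(relabel_prefix("ENSMUSG00000000001:001 515", table))
```

Using a function as the replacement also avoids any special handling of characters such as `&` or backslashes in the mapped value.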

+1

jaypal singh Jun 26 '13 at 14:36

anubhava · Accepted Answer · 2013-06-26T13:58:25+0000

Use the awk command as follows:

 awk 'NR==FNR {a[$1]=$2;next} { split($1, b, ":"); if (b[1] in a) print a[b[1]] ":" b[2], $2; else print $0; }' keyFile.txt temp.txt
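
Since the question also asks for Python, here is a rough equivalent of the same two-pass idea: load the key file into a table, then split each data line at the first colon and look up the part before it. The function names are hypothetical, and the sketch assumes whitespace-separated columns in the key file, as in the sample.

```python
def load_keys(lines):
    """Map column 1 to column 2; duplicate keys simply overwrite earlier ones."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def relabel(line, table):
    """Swap the ID before the first ':' for its mapped value, if one exists."""
    key, sep, rest = line.partition(":")
    if sep and key in table:
        return table[key] + sep + rest
    return line  # no colon or unknown key: leave the line untouched


def relabel_file(key_path, src_path, dst_path):
    """Apply relabel() to every line of src_path and write the result to dst_path."""
    with open(key_path) as keys:
        table = load_keys(keys)
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(relabel(line, table))
```

With the file names from the question, this would be called as `relabel_file("keyFile.txt", "temp.txt", "out.txt")`. One difference from the awk command: the rest of each line is copied through unchanged, whereas awk rejoins the fields with a single space.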
