Replace a column in a file while keeping the original spacing

I have the code below, which replaces the 4th column in file A based on the data in file B, but the output does not preserve the spacing of the source file. Is there any way to do this?

 tr , " " <fileB | awk 'NR==FNR{a[$2]=$1;next} {$4=a[$4];print}' - fileA

FILEA

 xxx    xxx   xxx Z0002

FILEB

 3100,3000
 W0002,Z0002

Output of the above code:

 xxx xxx xxx W0002

Expected output:

 xxx    xxx   xxx W0002
3 answers

This should do:

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next} {n=split($0,d,/[^[:space:]]+/);if($4 in b)$4=b[$4];for(i=1;i<=n;i++) printf("%s%s",d[i],$i);print ""}' fileB fileA

It stores the whitespace runs between the fields in an array, so they can be reused when the line is printed.
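To see the two halves of this idea in isolation, here is a minimal sketch using the sample line from the question. Assigning to a field makes awk rebuild the record with the output field separator (a single space by default), which is where the spacing is lost; splitting on runs of non-whitespace keeps the separators instead. The variable names `line`, `collapsed` and `seps` are only illustrative.

```shell
line='xxx    xxx   xxx Z0002'

# Assigning to $4 rebuilds $0 with OFS (one space), so the spacing collapses.
collapsed=$(printf '%s\n' "$line" | awk '{ $4 = "W0002"; print }')
printf '%s\n' "$collapsed"

# Splitting on runs of non-whitespace leaves the whitespace runs in d[].
# Each element is printed in brackets to make its width visible.
seps=$(printf '%s\n' "$line" | awk '{
    n = split($0, d, /[^[:space:]]+/)
    for (i = 1; i <= n; i++) printf "[%s]", d[i]
    print ""
}')
printf '%s\n' "$seps"
```

The first and last elements of `d` are empty here because the line neither starts nor ends with whitespace, which is why `d[i]` can be printed in front of field `i` for every `i`.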

Example:

cat fileA
xxx    xxx   xxx Z0002   not change this
xxx   xxx  Z0002 zzz
xxx Z000223213 xxx Z0002 xxx xxx xxx Z0002

cat fileB
3100,3000
W0002,Z0002

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next} {n=split($0,d,/[^[:space:]]+/);if($4 in b)$4=b[$4];for(i=1;i<=n;i++) printf("%s%s",d[i],$i);print ""}' fileB fileA
xxx    xxx   xxx W0002   not change this
xxx   xxx  Z0002 zzz
xxx Z000223213 xxx W0002 xxx xxx xxx Z0002

A more readable version, with comments on how it works:

awk '
FNR==NR {                           # For the first file, "fileB"
    split($0,a,",")                 # Split the line into array "a" using "," as separator
    b[a[2]]=a[1]                    # Store the data in array "b" using the second column as index
    next                            # Skip to the next record
    }
    {                               # Then for the file "fileA"
    n=split($0,d,/[^[:space:]]+/)   # Split on the fields, so the whitespace runs end up in array "d"
    if($4 in b)                     # If array "b" has data for field 4
        $4=b[$4]                    # Change field 4 to the data found in array "b"
    for(i=1;i<=n;i++)               # Loop through the separators and fields of the line
        printf("%s%s",d[i],$i)      # Print the original separator followed by the field
    print ""                        # Add a newline at the end
    }
' fileB fileA                       # Read the files.

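Using gsub (regular-expression substitution) with a space before the value of the 4th column and the end-of-line anchor $ after it solves the problem. Note that this only matches when the 4th column is the last thing on the line.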

Test file:

$ cat fileA
xxx    xxx   xxx Z0002
xxx    xxx   Z0002 xxx
xxx    xxx   xxx Z0002YY

Command execution and results:

$ tr , " " <fileB | awk 'NR==FNR{a[$2]=$1;next}  a[$4]=="" {print} a[$4]!=""{gsub(" "$4"$", " "a[$4], $0);print}' - fileA
xxx    xxx   xxx W0002
xxx    xxx   Z0002 xxx
xxx    xxx   xxx Z0002YY
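Because `$4` is pasted into a dynamic regular expression here, a value containing characters such as `.` or `*` would be interpreted as a pattern. One possible variant for the same situation cuts the line by length instead of by pattern. This is only a sketch under two assumptions: the 4th column is the last field, and the line has no trailing whitespace. The temporary directory and the `result` variable are illustrative.

```shell
tmp=$(mktemp -d)
printf '3100,3000\nW0002,Z0002\n' > "$tmp/fileB"
printf 'xxx    xxx   xxx Z0002\nxxx    xxx   Z0002 xxx\nxxx    xxx   xxx Z0002YY\n' > "$tmp/fileA"

result=$(awk '
    # fileB: map the old value (2nd column) to the new value (1st column)
    NR == FNR { split($0, p, ","); a[p[2]] = p[1]; next }

    # fileA: only touch lines where field 4 is the last field and has a mapping
    NF == 4 && ($4 in a) {
        # keep everything before the last field verbatim, then append the new value
        $0 = substr($0, 1, length($0) - length($4)) a[$4]
    }
    { print }
' "$tmp/fileB" "$tmp/fileA")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Since no regular expression is built from the data, the field value is never interpreted as a pattern.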

Another awk solution, using match() and substr().

This will avoid problems with metacharacters and patterns occurring elsewhere on the line.

awk 'FNR==NR {split($0,a,",");b[a[2]]=a[1];next}
     {
         x=1;D=0                                      # scan position and field counter for this line
         while(match(substr($0,x),/[^[:space:]]+/)){
             E[++D]=x+RSTART-1                        # position where field D starts
             F[D]=E[D]+RLENGTH                        # position just after field D
             x=F[D]                                   # continue scanning after this field
         }
     }

     NF>=4 && ($4 in b){$0=substr($0,1,E[4]-1) b[$4] substr($0,F[4])}
     1' FILEB FILEA

Example

Input

FILEA

xxx    Z0002   xxx Z0002 xxx    xxx   xxx Z0002
xxx    Z0002   xxx dsasa xxx    xxx   xxx Z0002

FILEB

3100,3000
W0002,Z0002

Output

xxx    Z0002   xxx W0002 xxx    xxx   xxx Z0002
xxx    Z0002   xxx dsasa xxx    xxx   xxx Z0002
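The core of this approach is finding where each field starts and ends in the unmodified line. A minimal sketch of that step alone, assuming fields are separated by runs of whitespace (the variable names `pos`, `start` and `offsets` are illustrative and not taken from the answer):

```shell
offsets=$(printf 'xxx    Z0002   xxx Z0002\n' | awk '{
    pos = 1; n = 0
    # match() reports positions relative to the substring, so add pos - 1
    while (match(substr($0, pos), /[^[:space:]]+/)) {
        start = pos + RSTART - 1
        printf "field %d: start=%d length=%d\n", ++n, start, RLENGTH
        pos = start + RLENGTH          # continue right after this field
    }
}')
printf '%s\n' "$offsets"
```

With the start and length of field 4 known, the line can be rebuilt as the text before the field, the replacement, and the text after it, without touching any of the whitespace.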

Explanation

Will be added later


Source: https://habr.com/ru/post/1585083/

