How to split a line depending on a pattern in another column (UNIX environment)

Question

I have a tab-separated file like this:

 V I 280 6 - VRSSAI
 N V 2739 7 - SAVNATA
 A R 203 5 - AEERR
 Q A 2517 7 - AQSTPSP
 S S 1012 5 - GGGSS
 L A 281 11 - AAEPALSAGSL

I would like to check the last column against the letters in the 1st and 2nd columns. If the first and last letters of the last column match the 1st and 2nd columns respectively, the line should stay as it is. Otherwise, I would like to find the reversed pair (the 2nd-column letter followed by the 1st-column letter) in the last column, print the string from the 1st-column letter to the end, and then append the part from the first letter up to the 2nd-column letter. Desired result:

 V I 280 6 - VRSSAI
 N V 2739 7 - NATASAV
 A R 203 5 - AEERR
 Q A 2517 7 - QSTPSPA
 S S 1012 5 - SGGGS
 L A 281 11 - LSAGSLAAEPA

I have tried several scripts, but none of them works correctly, and I don't know exactly why.

 awk 'BEGIN {FS=OFS="\t"}{gsub(/$2$1/,"\t",$6); print $1$7$6$2}' "input" > "output";

Another way:

 awk 'BEGIN {FS=OFS="\t"} {len=split($11,arrseq,"$7$6"); for(i=0;i<len;i++){printf "%s ",arrseq[i],arrseq[i+1]}' "input" > "output";

I also tried the substr function, but none of these attempts works correctly. Is it possible to do this in bash? Thanks in advance.

Here is an example to make the issue clearer.

 $1 $2 $6
 L A AAEPALSAGSL (reverse pattern 'AL', i.e. $2$1)

Desired result in $6: the part from the $1 letter of the reverse pattern to the end, followed by the part from the first letter up to the $2 letter of the reverse pattern.

 $1 $2 $6
 L A LSAGSLAAEPA

+5

split unix bash awk substr

Perceval Vellosillo Gonzalez Dec 27 '17 at 17:11

3 answers

You can try this awk command. It is not perfect, but it gives you a starting point.

 awk '{i=(match($6,$1));if(i==1)print;else{a=$6;b=substr(a,i);c=substr(a,1,(i-1));$6=b c;print}}' OFS='\t' infile

+2

ctac_ Dec 27 '17 at 20:06

 gawk ' BEGIN{ OFS="\t" } $6 !~ "^"$1".*"$2"$" { $6 = gensub("(.*"$2")("$1".*)", "\\2\\1", 1, $6) } {print} ' input.txt

Output

 V I 280 6 - VRSSAI
 N V 2739 7 - NATASAV
 A R 203 5 - AEERR
 Q A 2517 7 - QSTPSPA
 S S 1012 5 - SGGGS
 L A 281 11 - LSAGSLAAEPA
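
To see how the two regular expressions in the gawk command are assembled from the first two columns, a small sketch like the one below can print them for each row. The helper name `show_patterns` is made up for illustration and is not part of the answer. The pattern strings are copied from the command above, and the sketch assumes six whitespace-separated columns with a single letter in each of columns 1 and 2.

```shell
# show_patterns is a hypothetical helper: it only prints the dynamic regexes,
# it does not modify the data.
show_patterns() {
  awk -v OFS='\t' '
    {
      keep   = "^" $1 ".*" $2 "$"       # matches when $6 already starts with $1 and ends with $2
      rotate = "(.*" $2 ")(" $1 ".*)"   # group 1 ends at the $2 letter, group 2 starts at the $1 letter
      print $6, keep, rotate
    }
  ' "$@"
}

# Example row from the question (L in column 1, A in column 2).
printf 'L A 281 11 - AAEPALSAGSL\n' | show_patterns
```

For this row the second pattern is `(.*A)(L.*)`, and swapping the two groups with `\\2\\1` is what moves the tail starting at `L` in front of the head ending at `A`.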

+1

Minimax Dec 27 '17 at 23:48

Pesathe · Accepted Answer · 2017-12-27T22:23:58+0000

If I understood the question correctly, this awk command should do it:

 awk '( substr($6, 1, 1) != $1 || substr($6, length($6), 1) != $2 ) && (i = index($6, $2 $1)) { $6 = substr($6, i+1) substr($6, 1, i) }1' OFS=$'\t' data

Basically, you want to rotate the string in $6 so that it begins with the character in $1 and ends with the character in $2. Lines that cannot be rotated to satisfy this condition remain unchanged, for example:

 A B 3 3 - BCAAB
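
The one-liner is fairly dense, so here is a more readable sketch of the same rotation, spread over several lines. It assumes six whitespace-separated columns with a single letter in each of columns 1 and 2. The function name `rotate_col6` and the variables `starts_ok` and `ends_ok` are only illustrative and do not come from the original command. Unlike the one-liner, it rebuilds every row so that the whole output is tab-separated.

```shell
# rotate_col6 is a hypothetical helper name used for this sketch.
rotate_col6() {
  awk -v OFS='\t' '
    {
      s = $6
      starts_ok = (substr(s, 1, 1) == $1)          # first letter equals column 1
      ends_ok   = (substr(s, length(s), 1) == $2)  # last letter equals column 2
      if (!(starts_ok && ends_ok)) {
        i = index(s, $2 $1)   # position of the $2 letter inside the pair $2$1, or 0
        if (i > 0) {
          # Move the head (up to and including the $2 letter) behind the tail.
          $6 = substr(s, i + 1) substr(s, 1, i)
        }
      }
      $1 = $1   # rebuild the record so every row is joined with OFS
      print
    }
  ' "$@"
}

# Two rows from the question plus the non-rotatable example.
printf '%s\n' 'L A 281 11 - AAEPALSAGSL' 'A R 203 5 - AEERR' 'A B 3 3 - BCAAB' | rotate_col6
```

With this input, the first row comes out with LSAGSLAAEPA in the last column, while the AEERR and BCAAB rows keep their original strings: the former already satisfies the condition, and the latter contains no `BA` pair to rotate at.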
