Using a multi-character field separator in AWK

I'm having trouble with the AWK field separator. The input file is shown below:

1 | all | | synonym |
1 | root | | scientific name |
2 | Bacteria Bacteria scientific name |
2 | Monera | Monera | in part |
2 | Prokaryota | Prokaryota | in part |
2 | Prokaryota | Prokaryota | in part |
2 | Prokaryota | Prokaryota | in part |
2 | bacteria | bacteria | blast name |

The field separator here is tab, pipe, tab (\t|\t), so this was my attempt to print only the 1st and 2nd columns:

 awk -F'\t|\t' '{print $1 "\t" $2}' nodes.dmp | less 

Instead of the desired output, I get the first column followed by the pipe symbol. I tried escaping the pipe (\t\|\t), but the result stayed the same.

1 |
1 |
2 |
2 |
2 |
2 |

Printing the 1st and 3rd columns gave me the output I originally intended:

 awk -F'\t|\t' '{print $1 "\t" $3}' nodes.dmp | less 

But I am puzzled as to why the first command does not work as intended.

I understand that the following Perl one-liner works, but I really want to use awk.

 perl -aln -F"\t\|\t" -e 'print $F[0],"\t",$F[1]' nodes.dmp | less 
3 answers

The pipe symbol | confuses awk: a multi-character field separator is treated as a regular expression, in which | means alternation, so \t|\t says that the field separator is either \t or \t. Tell awk to interpret the | literally.

 $ awk -F'\t[|]\t' '{print $1 "\t" $2}' nodes.dmp
 1	all
 1	root
 2	Bacteria
 2	Monera
 2	Procaryotae
 2	Prokaryota
 2	Prokaryotae
 2	bacteria
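
To see the difference between the two separators without the real file, here is a minimal sketch. The two sample records are an assumption modelled on the layout described in the question (tab, pipe, tab between fields), and the variable names are only illustrative.

```shell
# Two made-up records in the tab-pipe-tab layout (assumed, not taken from nodes.dmp).
sample=$(printf '1\t|\tall\t|\t\t|\tsynonym\t|\n1\t|\troot\t|\t\t|\tscientific name\t|\n')

# Unbracketed pipe: the regex reads "tab OR tab", so every single tab
# is a separator and the pipe itself lands in $2.
alternation=$(printf '%s\n' "$sample" | awk -F'\t|\t' '{print $1 "\t" $2}')

# Bracketed pipe: [|] is a literal pipe, so only the full
# tab-pipe-tab sequence separates fields.
literal=$(printf '%s\n' "$sample" | awk -F'\t[|]\t' '{print $1 "\t" $2}')

printf '%s\n' "$alternation"
printf '%s\n' "$literal"
```

The first printf shows each record's first column followed by a pipe, which is the symptom from the question; the second shows the first two columns. It also explains why $3 appeared to work with the unbracketed separator: with every tab acting as a separator, the pipes occupy the even-numbered fields.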

From the input you posted:

  • your lines can end with | , not |\t , and
  • you have cases (the first 2 lines) where the input contains |\t| , and
  • your lines start with a tab

So an FS of tab-pipe-tab is wrong, because it does not handle any of the cases above. In the first case the line ends with just tab-pipe, so the separator never matches there. In the second case the tab in the middle of |\t| is consumed by the tab-pipe-tab that closes the preceding field, which leaves only pipe-tab in front of the next field. In the third case you are left with an unwanted leading tab.

What you actually need is to set FS to tab and then strip the leading tab from each field:

 awk -F'\t|' -v OFS='\t' '{gsub(/(^|[|])\t/,""); print $1, $2}' file 

That way you can handle every field from 1 to NF-1 just like any other.
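
The leftovers described above can be made visible with a small sketch. The record below is made up to combine a leading tab, an empty third field, and a trailing tab-pipe; it is an assumption, not a line from the real file. The second command is a variant of the strip-then-split idea that spells the separator as a plain tab, shown only for illustration.

```shell
# One made-up record: leading tab, empty third field, trailing tab-pipe (assumed layout).
record=$(printf '\t1\t|\tall\t|\t\t|\tsynonym\t|')

# Split on the literal tab-pipe-tab and mark every leftover tab as <TAB>,
# so the residue in the first and last fields is visible.
residue=$(printf '%s\n' "$record" | awk -F'\t[|]\t' '{
    first = $1; last = $NF
    gsub(/\t/, "<TAB>", first)
    gsub(/\t/, "<TAB>", last)
    print "[" first "] [" last "]"
}')

# Variant: remove the leading tab and every pipe-tab pair first,
# then let awk re-split the cleaned record on plain tabs.
cleaned=$(printf '%s\n' "$record" | awk -F'\t' -v OFS='\t' '{gsub(/(^|[|])\t/, ""); print $1, $2}')

printf '%s\n' "$residue"
printf '%s\n' "$cleaned"
```

With the tab-pipe-tab separator the first field keeps its leading tab and the last field keeps its trailing tab-pipe. After the gsub the record is plain tab-separated text with a lone pipe as the final field, which is why fields 1 to NF-1 are the useful ones.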


Using cut:

  cut -f1,2 -d'|' file.txt 

Without the pipe in the output:

  cut -f1,2 -d'|' file.txt | tr -d '|' 

Source: https://habr.com/ru/post/1496717/