Splitting columns separated by tabs and spaces

I have a really weird file format here that uses any number of tabs and spaces to separate fields (including leading and trailing ones). Another peculiarity is that fields can contain spaces, and those fields are then escaped the CSV way.

One example:

   0    "some string" 234      23947     123 ""some escaped"string""

I am trying to parse such lines with awk, and I need every field as an element of an array, for example:

foo[0] -> 0
foo[1] -> "some string"
foo[2] -> 234
foo[3] -> 23947
foo[4] -> 123
foo[5] -> ""some escaped"string""

Is it possible? I read http://web.archive.org/web/20120531065332/http://backreference.org/2010/04/17/csv-parsing-with-awk/, which states that parsing CSV with awk is already very difficult. (For a start, it would be enough to parse the ordinary quoted strings with spaces; the escaped-quote version is very rare.)

With GNU awk, using FPAT:

$ cat tst.awk
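# A field is a run of non-blanks, a "quoted string", or a ,comma-delimited, string
BEGIN { FPAT="\\S+|\"[^\"]+\"|,[^,]+," }
{
    # Hide literal "@" and "," so that "," is free to use as a marker
    gsub(/@/,"@A")
    gsub(/,/,"@B")
    # Replace every doubled quote with the now-unused ","
    gsub(/""/,",")
    for (i=1; i<=NF; i++) {
        # Undo the substitutions in reverse order
        gsub(/,/,"\"\"",$i)
        gsub(/@B/,",",$i)
        gsub(/@A/,"@",$i)
        print i, $i
    }
}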

$ awk -f tst.awk file
1 0
2 "some string"
3 234
4 23947
5 123
6 ""some escaped"string""
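The question asks for the fields in a 0-based array rather than printed with awk's 1-based field numbers. Below is a minimal sketch of that, not a definitive implementation. It reuses the same substitution trick but swaps the FPAT splitting for a match() loop, so it does not depend on GNU awk. The array name foo comes from the question; the names parse_fields and split_fields are made up for illustration.

```shell
# parse_fields: hypothetical shell wrapper; reads lines from stdin or from files
parse_fields() {
    awk '
    # Split line into arr[0..n-1] and return n (split_fields is a made-up name)
    function split_fields(line, arr,    n, rest) {
        split("", arr)                    # portable way to empty the array
        gsub(/@/, "@A", line)             # hide literal "@"
        gsub(/,/, "@B", line)             # hide literal ","
        gsub(/""/, ",", line)             # "," now stands for a doubled quote
        n = 0
        rest = line
        while (1) {
            sub(/^[ \t]+/, "", rest)      # skip separators (also leading ones)
            if (rest == "") break
            # Try a quoted field, then a ,delimited, field, else a non-blank run
            if (!match(rest, /^"[^"]+"/) && !match(rest, /^,[^,]+,/))
                match(rest, /^[^ \t]+/)
            arr[n] = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
            gsub(/,/, "\"\"", arr[n])     # undo the substitutions in reverse order
            gsub(/@B/, ",", arr[n])
            gsub(/@A/, "@", arr[n])
            n++
        }
        return n
    }
    {
        n = split_fields($0, foo)
        for (i = 0; i < n; i++)
            printf "foo[%d] -> %s\n", i, foo[i]
    }
    ' "$@"
}
```

Called as parse_fields file, this prints one foo[i] -> value line per field, in the format used in the question. Inside a larger awk program the main rule would use foo directly instead of printing it.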

Source: https://habr.com/ru/post/1660332/

