How to parse the contents of a specific column in a CSV file in bash

I am trying to parse a csv file line by line and its format looks something like this:

"name","content1,with commas as you see", "content2, also may contain commas", "..." ... ... 

I want to get the content of a specific column without the quotes, for example the 1st and the 3rd column. The expected output would be:

 name (if getting column 1)
 content2, also may contain commas (if getting column 3)

I tried to use awk, but that didn't work. I also tried:

 while IFS=, read col1 col2 col3 col4; do echo "got ${col1}|${col3}"; done < file 

But the output still contains the quotation marks, and col3 is wrong because the commas inside the fields are treated as separators too. How do I split a format like this, where the fields themselves contain commas?

2 answers

Because of complexities like these, it is probably a lot easier to use an actual CSV parser such as csvtool:

 $ csvtool col 3 - <<< '"name","content1,with commas as you see", "content2, also may contain commas", "..."'
 "content2, also may contain commas"

If you have GNU awk, then FPAT will come to your aid.

 gawk '{print $1,$3}' FPAT="([^,]+)|(\"[^\"]+\")" my.csv 

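In awk we usually set FS, which describes what separates the fields, i.e. what a field is not. In this particular case we would rather describe the fields by what they are, and FPAT lets us do just that. Note that FPAT is a gawk extension, and that the fields it returns still include their surrounding quotes.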
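
Since the question asks for the values without quotes, and the sample line has a space after some of the commas, here is a minimal sketch that builds on the same FPAT idea. It assumes gawk is installed; `get_cols` and `unquote` are hypothetical helper names chosen for this example, and the FPAT pattern is a variation on the one above that also accepts blanks before an opening quote.

```shell
#!/usr/bin/env bash
# Sketch: print columns 1 and 3 of a quoted CSV line, without the quotes.
# Assumes GNU awk (gawk); FPAT is not available in other awk implementations.

# get_cols is a hypothetical helper: it reads CSV from the given files,
# or from stdin when no file is given.
get_cols() {
    gawk '
        BEGIN {
            # A field is either a quoted string (optionally preceded by
            # blanks) or a run of characters that are not commas.
            FPAT = "[[:space:]]*\"[^\"]*\"|[^,]+"
        }
        # unquote is a hypothetical helper defined here, not a gawk builtin.
        function unquote(s) {
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", s)  # trim surrounding blanks
            gsub(/^"|"$/, "", s)                        # drop the enclosing quotes
            return s
        }
        { print unquote($1) "|" unquote($3) }
    ' "$@"
}

get_cols <<< '"name","content1,with commas as you see", "content2, also may contain commas", "..."'
```

For the sample line this prints `name|content2, also may contain commas`. The sketch does not handle doubled quotes inside a field or fields that span several lines; for data like that, a real CSV parser is the safer choice.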


Source: https://habr.com/ru/post/1485621/

