Importing a CSV file with values enclosed in " when some of them contain " as well as commas

I think I searched everywhere, but if I missed something, please let me know.

I am trying to import a CSV file where all non-numeric values are wrapped in ". I ran into a problem with:

import pandas as pd
df = pd.read_csv('file.csv')

CSV example:

"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company "MoscowMining" Owner1, Owner2, Owner3"
"Agriculture","Poland","Warsaw","Company" Jankowski,A,B""

Because of the extra quotes and the commas inside the values, pandas sees more than 4 columns in this case (e.g. 5 or 6).

I already tried playing with

df = pd.read_csv('file.csv', quotechar='"', quoting=2)  # 2 == csv.QUOTE_NONNUMERIC

but got

ParserError: Error tokenizing data (...)

What does work is skipping the bad lines with

error_bad_lines=False

but I would rather keep all the data somehow than just drop those rows.
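For reference, a minimal sketch that reproduces the problem on the sample above, with the file inlined as a string (an assumption so the snippet runs without `file.csv`). Note that `error_bad_lines` was deprecated in pandas 1.3 in favour of `on_bad_lines` and removed in pandas 2.0, so the sketch uses `on_bad_lines="skip"`:

```python
import io

import pandas as pd

# The sample file from the question, inlined so the sketch runs on its own.
RAW = '''"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company "MoscowMining" Owner1, Owner2, Owner3"
"Agriculture","Poland","Warsaw","Company" Jankowski,A,B""
'''

# The default C parser splits the two broken rows into 6 fields and refuses them.
error = None
try:
    pd.read_csv(io.StringIO(RAW))
except pd.errors.ParserError as exc:
    error = exc
    print("ParserError:", exc)

# pandas >= 1.3 spelling of error_bad_lines=False: drop rows with too many fields.
df = pd.read_csv(io.StringIO(RAW), on_bad_lines="skip")
print(df)  # only the "IT" row survives
```

This confirms the trade-off described in the question: the file loads, but two of the three data rows are lost.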

Thanks so much for any help!

2 answers

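In CSV, a double quote inside a quoted value must be escaped by doubling it (""). Some implementations escape it with a backslash (\") instead. See https://en.wikipedia.org/wiki/Comma-separated_values#cite_ref-13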
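To illustrate the doubling rule: if you control how the file is written, Python's standard `csv` module applies it for you. A minimal sketch, assuming the file can be regenerated from Python (the original producer of the file is not known):

```python
import csv
import io

# One row whose last value contains both quotes and commas.
row = ["Mining", "Russia", "Moscow", 'Company "MoscowMining" Owner1, Owner2, Owner3']

buf = io.StringIO()
# QUOTE_ALL mimics the question's file (every value quoted); doublequote=True is
# the default, so each embedded " is written as "".
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(row)
line = buf.getvalue()
print(line, end="")
# "Mining","Russia","Moscow","Company ""MoscowMining"" Owner1, Owner2, Owner3"

# Reading the line back recovers the original value unchanged.
assert next(csv.reader(io.StringIO(line))) == row
```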


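If you cannot change how the file is produced, you can repair it by escaping every quote that is not a field delimiter, i.e. every quote with something other than a comma or a line break on both sides. For example, with a regular expression (not 100% reliable: a stray quote that sits right next to a comma will be missed):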

s/([^,\n])"([^,\n])/$1""$2/g

After the replacement the file looks like this:

"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company ""MoscowMining"" Owner1, Owner2, Owner3"
"Agriculture","Poland","Warsaw","Company"" Jankowski,A,B"""
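The same substitution can be done in Python before handing the text to pandas. A minimal sketch, assuming the whole file fits in memory; `RAW` and `STRAY_QUOTE` are illustrative names, not part of any API:

```python
import io
import re

import pandas as pd

RAW = '''"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company "MoscowMining" Owner1, Owner2, Owner3"
"Agriculture","Poland","Warsaw","Company" Jankowski,A,B""
'''

# Python port of s/([^,\n])"([^,\n])/$1""$2/g: a quote with a non-comma,
# non-newline character on both sides cannot be a delimiter, so double it.
STRAY_QUOTE = re.compile(r'([^,\n])"([^,\n])')
fixed = STRAY_QUOTE.sub(r'\1""\2', RAW)
print(fixed)

# The default dialect (doublequote=True) turns each "" back into a single ".
df = pd.read_csv(io.StringIO(fixed))
print(df["Company Name"].tolist())
```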

Or, to escape with a backslash instead:

s/([^,\n])"([^,\n])/$1\"$2/g

This gives:

"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company \"MoscowMining\" Owner1, Owner2, Owner3"
"Agriculture","Poland","Warsaw","Company\" Jankowski,A,B\""

Either version can now be parsed as CSV.
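A sketch of the backslash variant under the same assumptions (file in memory, illustrative names); here pandas has to be told about the escape character through `escapechar`:

```python
import io
import re

import pandas as pd

RAW = '''"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company "MoscowMining" Owner1, Owner2, Owner3"
"Agriculture","Poland","Warsaw","Company" Jankowski,A,B""
'''

# Python port of s/([^,\n])"([^,\n])/$1\"$2/g; in the replacement template
# \\ stands for one literal backslash.
STRAY_QUOTE = re.compile(r'([^,\n])"([^,\n])')
fixed = STRAY_QUOTE.sub(r'\1\\"\2', RAW)
print(fixed)

# A backslash now means "take the next character literally".
df = pd.read_csv(io.StringIO(fixed), escapechar="\\")
print(df["Company Name"].tolist())
```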

, @exe, CSV , , .


You need to escape the quotes and the commas inside the values so that pandas can read the csv.

For example:

"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company \"MoscowMining\" Owner1\, Owner2\, Owner3"
"Agriculture","Poland","Warsaw","Company\" Jankowski\,A\,B\""
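A minimal sketch of reading the escaped sample above, assuming the data is inlined as a string. Since these values are already quoted, the escaped commas are not strictly required, but with `escapechar` set pandas drops the backslash in front of any character:

```python
import io

import pandas as pd

# Raw string so the backslashes reach pandas unchanged.
ESCAPED = r'''"Business focus","Country","City","Company Name"
"IT","France","Lyon","Societe General"
"Mining","Russia","Moscow","Company \"MoscowMining\" Owner1\, Owner2\, Owner3"
"Agriculture","Poland","Warsaw","Company\" Jankowski\,A\,B\""
'''

df = pd.read_csv(io.StringIO(ESCAPED), escapechar="\\")
print(df["Company Name"].tolist())
```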

Source: https://habr.com/ru/post/1693252/

