Numpy genfromtxt / pandas read_csv; ignore commas within quotation marks

Question


Consider the file a.dat with the following contents:

address 1, address 2, address 3, num1, num2, num3
address 1, address 2, address 3, 1.0, 2.0, 3
address 1, address 2, "address 3, address4", 1.0, 2.0, 3

I am trying to import it with numpy.genfromtxt. However, the function sees an extra column in row 3. I get a similar error with pandas.read_csv:

import numpy as np
np.genfromtxt('a.dat', delimiter=',', dtype=None, skip_header=1)

ValueError: Some errors were detected !
    Line #3 (got 7 columns instead of 6)

and

pandas.read_csv sort of works, but it gives me an unaligned data structure:

import pandas as pd
pd.read_csv('a.dat')

pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 7

I am looking for an input parameter that compensates for this. I don't mind whether I end up with a numeric ndarray or a pandas DataFrame.

Is there a parameter that I can set in genfromtxt and/or read_csv that will let me ignore the comma inside the quotation marks?

I note that read_csv has a parameter quotechar='"', defined as follows:

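quotechar : string (length 1). The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.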

This suggests that read_csv should handle this case by default, but it does not.



python numpy pandas file-io genfromtxt

atomh33ls, 06 Jun '14 10:09
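A minimal sketch of why the default parse sees 7 fields on line 3, using only Python's standard csv module (not pandas itself). In a.dat every comma is followed by a space, so the third field starts with a space rather than with the quote character, and the quote is then read as an ordinary character. Judging by the error message above, pandas' C parser appears to behave the same way on this line, but this sketch does not exercise pandas.

```python
import csv

# Line 3 of a.dat, copied from the question (note the space after each comma)
line = 'address 1, address 2, "address 3, address4", 1.0, 2.0, 3'

# Default reader: the third field begins with ' ', not '"', so the quote is
# taken literally and the comma inside it splits the field in two.
default_fields = next(csv.reader([line]))

# skipinitialspace=True discards the space after each delimiter, so the field
# now starts with '"' and is parsed as one quoted item.
fixed_fields = next(csv.reader([line], skipinitialspace=True))

print(len(default_fields), default_fields)
print(len(fixed_fields), fixed_fields)
```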


You could use Python's csv module instead:

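import csv
import numpy as np

with open("a.dat", newline="") as f:
    # skipinitialspace=True drops the space after each comma, so "..." is parsed as one field
    reader = csv.reader(f, skipinitialspace=True)
    header = next(reader)
    # three 20-character text columns followed by three float columns
    dtype = np.dtype(list(zip(header, ['U20', 'U20', 'U20', 'f8', 'f8', 'f8'])))
    data = np.fromiter(map(tuple, reader), dtype=dtype)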


Sven Marnach, 06 Jun '14 10:21
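To try the csv-module approach without creating a.dat, here is a minimal Python 3 sketch that feeds the sample contents through io.StringIO, an in-memory stand-in assumed purely for illustration. It uses map instead of Python 2's itertools.imap and 'U20' (text) fields rather than 'S20' (bytes), since csv.reader yields str in Python 3; the numeric strings are converted to float by NumPy when each tuple is stored.

```python
import csv
import io

import numpy as np

# Contents of a.dat from the question, held in memory for illustration
SAMPLE = '''address 1, address 2, address 3, num1, num2, num3
address 1, address 2, address 3, 1.0, 2.0, 3
address 1, address 2, "address 3, address4", 1.0, 2.0, 3
'''

reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)
header = next(reader)  # column names, with the leading spaces already removed
# Three text columns followed by three float columns
dtype = np.dtype(list(zip(header, ['U20'] * 3 + ['f8'] * 3)))
data = np.fromiter(map(tuple, reader), dtype=dtype)

print(data['address 3'])
print(data['num1'].sum())
```

The result is a structured array, so each column is accessed by its header name, e.g. data['num2'].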

atomh33ls · Accepted Answer · 2014-06-06T10:24:20+0000

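This worked:

The key was skipinitialspace=True, which skips the whitespace after each comma delimiter. Without it, the third field on line 3 starts with a space rather than with the quote character, so the quote is not treated as the start of a quoted item.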

import pandas as pd
a = pd.read_csv('a.dat', quotechar='"', skipinitialspace=True)

   address 1  address 2            address 3  num1  num2  num3
0  address 1  address 2            address 3     1     2     3
1  address 1  address 2  address 3, address4     1     2     3

:-)
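A minimal sketch that reproduces both the failure and the fix with pandas, again reading the sample through io.StringIO (assumed for illustration instead of the file). Note that quotechar='"' is already read_csv's default, so skipinitialspace=True is the only argument that changes the result here. pd.errors.ParserError is the exception class raised by current pandas versions; older versions reported it as pandas.parser.CParserError, as in the question.

```python
import io

import pandas as pd

# Contents of a.dat from the question, held in memory for illustration
SAMPLE = '''address 1, address 2, address 3, num1, num2, num3
address 1, address 2, address 3, 1.0, 2.0, 3
address 1, address 2, "address 3, address4", 1.0, 2.0, 3
'''

# Default settings: the quoted field is split, so line 3 has 7 fields
try:
    pd.read_csv(io.StringIO(SAMPLE))
except pd.errors.ParserError as exc:
    print("without skipinitialspace:", exc)

# With the space after each comma skipped, the quotes are honoured
df = pd.read_csv(io.StringIO(SAMPLE), quotechar='"', skipinitialspace=True)
print(df)
print(df.dtypes)
```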

