Pandas, read CSV ignoring extra commas

Question

Pandas, read CSV ignoring extra commas

I am reading a 8-column CSV file in the Pandas framework. The last column contains an error message, some of which contain commas. This causes the error reading the file.ParserError: Error tokenizing data. C error: Expected 8 fields in line 21922, saw 9

Is there a way to ignore all commas after the 8th field, instead of going through the file and removing the extra commas?

Code for reading a file:

import pandas as pd
df = pd.read_csv('C:\\somepath\\output.csv')

Line that works:

061AE,Active,001,2017_02_24 15_18_01,00006,1,00013,some message

Line Error:

061AE,Active,001,2017_02_24 15_18_01,00006,1,00013,longer message, with commas

+4

python pandas

MikeS159 Feb 07 '18 at 15:59

source share

3 answers

, Pandas :

import csv
import pandas as pd
import numpy as np

df = pd.read_csv('filename.csv', parse_dates=True, dtype=Object, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8')

, 7 ( 0- 6-, ) *, - :

to_write = []
counter = 0
with open("sampleCSV.csv", "r") as f:
    for line in f:
        while counter < 7:
            line = list(line)
            line[line.index(",")] = ";"
            counter += 1
        counter = 0
        to_write.append("".join(line))

to_write Pandas,

data = pd.DataFrame(to_write)
data = pd.DataFrame(data[0].str.split(";").values.tolist()),

csv Pandas , read_csv(csv_path, sep=';').

, , . , , .

* - 7- . , 7 .

+2

FatihAkici 07 . '18 20:09

You can use the usecols parameter in the read_csv function to limit which columns you read. For instance:

import pandas as pd
pd.read_csv(path, usecols=range(8))

if you want to read only the first 8 columns.

+1

Tblaz Feb 07 '18 at 16:08

source share

Bill bell · Accepted Answer · 2018-02-07T20:06:27+0000

You can use re.subto replace the first few commas, for example, '|', store the intermediate results in StringIO, then process this.

import pandas as pd
from io import StringIO
import re

for_pd = StringIO()
with open('MikeS159.csv') as mike:
    for line in mike:
        new_line = re.sub(r',', '|', line.rstrip(), count=7)
        print (new_line, file=for_pd)

for_pd.seek(0)

df = pd.read_csv(for_pd, sep='|', header=None)
print (df)

I put two lines from your question into a file to get this output.

       0       1  2                    3  4  5   6  \
0  061AE  Active  1  2017_02_24 15_18_01  6  1  13   
1  061AE  Active  1  2017_02_24 15_18_01  6  1  13   

                             7  
0                 some message  
1  longer message, with commas

Pandas, read CSV ignoring extra commas

More articles: