Pandas read_csv always crashes when working with a small file

I am trying to import a rather small CSV (217 lines, 87 columns, around 15K) for analysis in Python using pandas. The file is rather poorly structured, but I would like to import it as-is, since this is raw data that I do not want to manipulate by hand outside of Python (for example, in Excel). Unfortunately, this always leads to a crash: "The kernel appears to have died. It will restart automatically."

https://www.wakari.io/sharing/bundle/uniquely/ReadCSV

I did some research and found reports of read_csv crashes, but always with really large files, so I don't understand the problem here. The failure occurs both with my local installation (64-bit Anaconda, IPython (Py 2.7) Notebook) and on Wakari.

Can someone help me? It would be really appreciated. Many thanks!

the code:

    # I have a somewhat ugly, illustrative csv file, but it is not too big:
    # 217 rows, 87 columns.
    # File can be downloaded at http://www.win2day.at/download/lo_1986.csv

    # In[1]:
    file_csv = 'lo_1986.csv'
    f = open(file_csv, mode="r")
    x = 0
    for line in f:
        print x, ": ", line
        x = x + 1
    f.close()

    # Now I'd like to import this csv into Python using pandas -
    # but this always leads to a crash:
    # "The kernel appears to have died. It will restart automatically."

    # In[ ]:
    import pandas as pd
    pd.read_csv(file_csv, delimiter=';')

    # What am I doing wrong?
2 answers

This is caused by an invalid character (e.g. the byte 0xe0) in the file.

If you add an encoding parameter to the read_csv() call, you will see this stack trace instead of a segfault:

    >>> df = pandas.read_csv("/tmp/lo_1986.csv", delimiter=";", encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 205, in _read
        return parser.read()
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 608, in read
        ret = self._engine.read(nrows)
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1028, in read
        data = self._reader.read(nrows)
      File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
      File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
      File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
      File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8793)
      File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:9484)
      File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:10642)
      File "parser.pyx", line 1051, in pandas.parser.TextReader._string_convert (pandas/parser.c:10905)
      File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas/parser.c:15657)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: unexpected end of data

You can preprocess the file to remove these characters before asking pandas to read it.
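One way to do that, as a minimal sketch (the `clean_csv_bytes` helper is my own illustration, not a pandas function): decode the raw bytes with `errors="replace"` so undecodable bytes become the Unicode replacement character instead of crashing the parser, then hand the cleaned text to `read_csv`:

```python
def clean_csv_bytes(raw, encoding='utf-8'):
    # Any byte that is invalid in the target encoding is replaced
    # with U+FFFD instead of raising a UnicodeDecodeError.
    return raw.decode(encoding, errors='replace')

# Example: a lone 0xe0 is not a valid UTF-8 sequence.
raw = b'Zahl;Quote\n5;\xe0 12,30\n'
text = clean_csv_bytes(raw)
print(text)

# The cleaned text can then be fed to pandas, e.g.:
#   import io, pandas as pd
#   df = pd.read_csv(io.StringIO(text), delimiter=';')
```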

I've attached an image highlighting the invalid characters in the file.

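If you want to locate the offending bytes programmatically rather than in a hex editor, here is a small stdlib sketch (the `find_invalid_utf8` helper is hypothetical, not part of pandas):

```python
def find_invalid_utf8(data):
    """Return the absolute byte offsets where UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode('utf-8')
            break  # the rest decodes cleanly
        except UnicodeDecodeError as e:
            offsets.append(pos + e.start)  # e.start is relative to the slice
            pos += e.start + 1             # skip past the offending byte
    return offsets

# Example with two stray 0xe0 bytes:
print(find_invalid_utf8(b'ab\xe0cd\xe0'))  # → [2, 5]
```

You could run this over the raw file contents (`open(file_csv, 'rb').read()`) to see exactly which positions to clean up.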


Thanks so much for your comments. I couldn't agree more with the comment that this is indeed a very messy CSV. But unfortunately, this is how the Austrian state lottery publishes its drawn numbers and payout quotas.

I kept experimenting, also looking at the special characters. In the end, at least for me, the solution was surprisingly simple:

 pd.read_csv(file_csv, delimiter=';', encoding='latin-1', engine='python') 

Adding the encoding makes the special characters display correctly, but the game changer was the engine parameter. Honestly, I don't understand why, but now it works.
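For what it's worth, a likely reason the encoding part helps: latin-1 assigns a character to every possible byte value 0x00-0xFF, so decoding with it can never fail, whereas a lone 0xe0 (the byte from the traceback above) is an invalid UTF-8 sequence. A quick demonstration:

```python
raw = b'\xe0'  # the byte reported in the UnicodeDecodeError

# latin-1 maps all 256 byte values to code points, so this always succeeds:
print(raw.decode('latin-1'))  # 'à'

# ...while the same byte on its own is not valid UTF-8:
try:
    raw.decode('utf-8')
except UnicodeDecodeError as e:
    print('invalid utf-8:', e.reason)
```

Why `engine='python'` also matters here I can't say for certain either; the source thread doesn't explain it, only reports that it made the difference.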

Thanks again!


