This is caused by an invalid character (e.g. the byte 0xe0) in the file.
If you add the encoding parameter to the read_csv() call, you will see this stack trace instead of a segfault:
    >>> df = pandas.read_csv("/tmp/lo_1986.csv", delimiter=";", encoding="utf-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 205, in _read
        return parser.read()
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 608, in read
        ret = self._engine.read(nrows)
      File "/Users/antkong/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1028, in read
        data = self._reader.read(nrows)
      File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
      File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
      File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
      File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8793)
      File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:9484)
      File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:10642)
      File "parser.pyx", line 1051, in pandas.parser.TextReader._string_convert (pandas/parser.c:10905)
      File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas/parser.c:15657)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: unexpected end of data
You can preprocess the file to remove these characters before asking pandas to read it.
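One way to do that preprocessing is sketched below. This is a minimal sketch, not the only approach: it assumes Python 3 (the traceback above is from Python 2.7), and the function name `strip_invalid_utf8` and the output path are hypothetical. It decodes the raw bytes as UTF-8 with `errors="ignore"`, which silently drops every byte that cannot be decoded (such as a stray 0xe0), and writes the cleaned text back out as UTF-8.

```python
# Sketch (assumes Python 3): copy a file, dropping bytes that are not valid UTF-8.
# strip_invalid_utf8 is a hypothetical helper name, not a pandas API.
def strip_invalid_utf8(src_path, dst_path):
    # Read raw bytes so no decoding happens yet.
    with open(src_path, "rb") as src:
        raw = src.read()
    # errors="ignore" discards any byte that breaks UTF-8 decoding.
    text = raw.decode("utf-8", errors="ignore")
    # newline="" keeps the original line endings unchanged on write.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    # Number of bytes that were removed.
    return len(raw) - len(text.encode("utf-8"))
```

After cleaning, the file can be read as before, e.g. `strip_invalid_utf8("/tmp/lo_1986.csv", "/tmp/lo_1986_clean.csv")` followed by `pandas.read_csv("/tmp/lo_1986_clean.csv", delimiter=";", encoding="utf-8")` (the `_clean` path is hypothetical). Note that dropping bytes loses information; if the file is really in another encoding, the removed characters were meaningful text.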
The attached image highlights the invalid characters in the file.
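To locate the offending bytes in your own file, a minimal sketch (assuming Python 3; `find_invalid_utf8` is a hypothetical helper) is to decode repeatedly and record the byte range reported by each `UnicodeDecodeError`, resuming just after it. This is quadratic in the worst case, which is acceptable for a one-off inspection.

```python
# Sketch (assumes Python 3): report every byte offset that breaks UTF-8 decoding.
# find_invalid_utf8 is a hypothetical helper name.
def find_invalid_utf8(data):
    """Return a list of (offset, byte_value) pairs for undecodable bytes."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice starting at pos.
            start, end = pos + exc.start, pos + exc.end
            bad.extend((i, data[i]) for i in range(start, end))
            pos = end  # resume decoding after the invalid bytes
    return bad
```

For example, `find_invalid_utf8(open("/tmp/lo_1986.csv", "rb").read())` would list the offsets of bytes such as 0xe0 (224) that pandas cannot decode.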
