Encoding with pandas.read_csv when the file name has accents

Question

I am trying to load a CSV file with pandas, but I run into a problem when the file name has accents. This is clearly an encoding problem, but although read_csv lets you set the encoding of the text inside the file, I cannot figure out how to encode the file name correctly.

 input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
 self.locs = pandas.read_csv(input_file, sep=',', skipinitialspace=True)

The CSV file is Anzoátegui.csv. When the error occurs, input_file is:

 input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv'

Error message:

 OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

So maybe it converts my string to bytes? I also tried io.StringIO(input_file), which gives an empty DataFrame with the correct file name as the column header:

 Empty DataFrame Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv] Index: []
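Both observations can be reproduced without pandas. The sketch below only illustrates the likely mechanism; the cp1252 code page is an assumption (a typical Western European Windows setting), not something confirmed for this machine. It shows that the bytes in the error are the UTF-8 encoding of the name, and that io.StringIO treats the string as file content rather than as a path.

```python
import io

name = "Anzoátegui.csv"

# The bytes shown in the OSError are exactly the UTF-8 encoding of the name.
utf8_name = name.encode("utf-8")
print(utf8_name)  # b'Anzo\xc3\xa1tegui.csv'

# Assumption: the Windows "ANSI" code page is cp1252, where "á" is the
# single byte 0xE1. A byte-oriented file API would then read the UTF-8
# bytes above as a different, non-existent file name.
ansi_name = name.encode("cp1252")
print(ansi_name)  # b'Anzo\xe1tegui.csv'

# io.StringIO wraps the string itself as the file *content*. Nothing is
# opened on disk, so a CSV parser sees a single line and uses it as the header.
buffer = io.StringIO(name)
first_line = buffer.readline()
print(first_line)  # Anzoátegui.csv
```

So the StringIO attempt never touches the file at all, which is why the DataFrame comes back empty.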

Any ideas on how to load this file? Unfortunately, I cannot simply strip the accents, since I need to interact with software that requires the correct name, and I have a ton of files to process (not just one). Thanks!

Edit: full error

 Traceback (most recent call last):
   File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
     result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
   File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
     result = eval(compiled, updated_globals, frame.f_locals)
   File "<string>", line 1, in <module>
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
     return _read(filepath_or_buffer, kwds)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
     parser = TextFileReader(filepath_or_buffer, **kwds)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
     self._make_engine(self.engine)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
     self._engine = CParserWrapper(self.f, **self.options)
   File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
     self._reader = _parser.TextReader(src, **kwds)
   File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
   File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
 OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

python python-3.x pandas encoding csv

khe Jun 04 '14 at 17:32

1 answer

khe · Accepted Answer · 2014-06-05T15:00:08+0000

Well, I wasted a hellish amount of time on this, but it turns out that this problem was fixed in pandas 0.14.0. Install the updated version and files with accented names are read correctly.

See the discussion on GitHub.
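If upgrading is not an option, one workaround that should sidestep the file-name handling is to open the file yourself and pass the open handle to read_csv, which accepts file-like objects. This is a minimal sketch, not the fix that went into pandas: the helper name read_csv_by_handle is hypothetical, and the utf-8 default for the file contents is an assumption you may need to change.

```python
import pandas


def read_csv_by_handle(path, encoding="utf-8", **kwargs):
    """Read a CSV through a handle opened by Python itself.

    Hypothetical helper: open() receives the str path directly, so the
    accented name is never handed to the parser as encoded bytes.
    The utf-8 default for the file *contents* is an assumption.
    """
    with open(path, newline="", encoding=encoding) as handle:
        return pandas.read_csv(handle, **kwargs)
```

It would be called the same way as in the question, for example read_csv_by_handle(input_file, sep=',', skipinitialspace=True).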

Thanks for the input!
