Reading unicodecsv from unicode string not working?

Question

I am having trouble reading a CSV from a unicode string with python-unicodecsv:

>>> import unicodecsv, StringIO
>>> f = StringIO.StringIO(u'é,é')
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I assume the problem is in how I turn my unicode string into a StringIO file? The example on the python-unicodecsv GitHub page works fine:

>>> import unicodecsv
>>> from cStringIO import StringIO
>>> f = StringIO()
>>> w = unicodecsv.writer(f, encoding='utf-8')
>>> w.writerow((u'é', u'ñ'))
>>> f.seek(0)
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
>>> print row[0], row[1]
é ñ

Trying my code with cStringIO fails because cStringIO cannot accept unicode (so why the example works, I don't know!):

>>> from cStringIO import StringIO
>>> f = StringIO(u'é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I need to accept UTF-8 CSV input from a form text field, so I cannot simply read it from a file.

Any ideas?

python unicode csv

Guy bowden Jan 31 '14 at 11:59

1 answer

Martijn Pieters · Accepted Answer · 2014-01-31T12:03:39+0000

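unicodecsv reads and decodes byte strings for you. You are passing it unicode strings instead. On output, your unicode values are encoded to byte strings for you, using the configured codec.

In addition, cStringIO.StringIO can only handle byte strings, while the pure-python StringIO.StringIO class accepts unicode values, treating them as if they were byte strings.

The solution is to encode your unicode values before putting them into the StringIO object: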

>>> import unicodecsv, StringIO, cStringIO
>>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
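
For comparison, here is a minimal sketch of the same encode-first idea on Python 3, using only the standard library rather than unicodecsv. This is an illustration and not part of the original answer: `rows_from_form_text` is a hypothetical helper name, and the sample input is made up. The text from the form is encoded to UTF-8 bytes, wrapped in `io.BytesIO`, and decoded again with the same codec before `csv.reader` sees it:

```python
import csv
import io


def rows_from_form_text(text, encoding="utf-8"):
    """Hypothetical helper: parse CSV text received from a form field."""
    # Encode the text to bytes first, mirroring the fix shown above.
    raw = io.BytesIO(text.encode(encoding))
    # Decode the byte stream with the same codec. newline="" is what the
    # csv module documentation recommends for the files it reads.
    decoded = io.TextIOWrapper(raw, encoding=encoding, newline="")
    return list(csv.reader(decoded))


rows = rows_from_form_text(u"é,é\nñ,ü")
print(rows)
```

On Python 3 the round trip through bytes is optional, since `csv.reader` also accepts an `io.StringIO` holding the text directly. The sketch keeps the bytes step only to show that the byte stream and the declared encoding have to agree.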
