Does cStringIO in Python (2.6) support Unicode?

Question

I am using the Python pycurl module to download content from various web pages. Since I also want to support potential Unicode text, I have been avoiding the cStringIO.StringIO function which, according to the Python docs (cStringIO: faster version of StringIO):

Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings.

... does not support Unicode strings. Actually, it states that it does not support Unicode strings that cannot be converted to ASCII strings. Can someone please clarify this for me? Which strings can and which cannot be converted?

I tested the following code and it seems to work perfectly well with Unicode:

    import pycurl
    import cStringIO

    downloadedContent = cStringIO.StringIO()
    curlHandle = pycurl.Curl()
    curlHandle.setopt(pycurl.WRITEFUNCTION, downloadedContent.write)
    curlHandle.setopt(pycurl.URL, 'http://www.ltg.ed.ac.uk/~richard/unicode-sample.html')
    curlHandle.perform()
    content = downloadedContent.getvalue()

    fileHandle = open('unicode-test.txt','w')
    for char in content:
        fileHandle.write(char)

And the file is written correctly. I can even print the whole content to the console and all the characters are displayed fine ... So what puzzles me is: where is cStringIO going to fail? Is there any reason why I should not use it?

[Note: I am using Python 2.6 and need to stick with this version]

python stringio pycurl

Ivan Kovacevic Oct 9 '12 at 13:28

1 answer

Martijn Pieters · Accepted Answer · 2012-10-09T13:32:46+0000

Any text that uses only ASCII code points (byte values 00-7F, hexadecimal) can be converted to ASCII. Basically, any text that uses characters not commonly used in American English is not ASCII.
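As a rough illustration of which strings can and cannot be converted, the sketch below tries to encode a Unicode string as ASCII and reports whether that succeeds. The helper name is_ascii_encodable and the sample strings are made up for this example; they are not part of the question's code.

```python
def is_ascii_encodable(text):
    """Return True if every code point in `text` lies in the range 0x00-0x7F."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # At least one character is outside the ASCII range.
        return False
    return True


plain = u'plain text'    # only ASCII code points
accented = u'caf\u00e9'  # ends in U+00E9, which is not an ASCII code point

print(is_ascii_encodable(plain))
print(is_ascii_encodable(accented))
```

The first string is the kind the documentation says cStringIO can accept; the second is the kind it cannot.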

In your code example, you are not converting the input to Unicode text; you are treating it as undecoded bytes. The test page in question is encoded in UTF-8, and you never decode it to Unicode.

If you were to decode the value to a Unicode string, you would not be able to store that string in a cStringIO object.
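The difference between collecting bytes and holding decoded text can be sketched as follows. This is an assumption-laden illustration, not the original code: it uses io.BytesIO (available since Python 2.6) as a stand-in for the byte buffer, and a hard-coded byte string in place of what pycurl would pass to the write callback.

```python
import io

# Hypothetical stand-in for a chunk received from the network:
# the UTF-8 encoding of u'caf\u00e9' (the last character takes two bytes).
raw_chunk = b'caf\xc3\xa9'

# Collecting undecoded bytes works, just as it does in the question's code.
downloaded = io.BytesIO()
downloaded.write(raw_chunk)
raw = downloaded.getvalue()

# Decoding turns the bytes into a Unicode string.
text = raw.decode('utf-8')

print(len(raw))   # number of bytes
print(len(text))  # number of characters

# A Unicode string like this one cannot be converted to ASCII, which is
# the conversion the documentation says cStringIO requires.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)
```

As long as the buffer only ever sees bytes, the encoding of the page never matters to it; the restriction only comes into play once the data has been decoded.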

You may want to read up on the difference between Unicode and text encodings such as ASCII and UTF-8. I can recommend:

Joel Spolsky's minimal Unicode article
The Python Unicode HOWTO
