I am using the Python pycurl module to download content from various web pages. Since I also want to support text that may be Unicode, I have been avoiding cStringIO.StringIO, which, according to the Python docs:
cStringIO: Faster version of StringIO
Unlike the StringIO module, this module is not able to accept Unicode strings that cannot be encoded as plain ASCII strings.
... does not support Unicode strings. More precisely, the docs say that it does not support Unicode strings that cannot be converted to ASCII strings. Can someone clarify this for me? Which strings can be converted and which cannot?
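To make the wording concrete, here is a minimal sketch of how I understand "can be encoded as plain ASCII". The helper name is_ascii_encodable is my own and is not part of cStringIO; it only tests the encoding step, and it is an assumption on my part that cStringIO rejects exactly the strings for which this check fails.

```python
def is_ascii_encodable(text):
    """Return True if the unicode string `text` can be encoded as plain ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character lies outside the 7-bit ASCII range.
        return False
    return True


# Escapes are used so the source file itself stays pure ASCII.
samples = [u"hello", u"caf\u00e9", u"\u65e5\u672c\u8a9e"]
results = [is_ascii_encodable(s) for s in samples]
```

Under this reading, only the first sample would be accepted as a unicode object; the other two contain characters that have no ASCII encoding.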
I tested the following code and it seems to handle Unicode just fine:
    import pycurl
    import cStringIO

    downloadedContent = cStringIO.StringIO()
    curlHandle = pycurl.Curl()
    curlHandle.setopt(pycurl.WRITEFUNCTION, downloadedContent.write)
    curlHandle.setopt(pycurl.URL, 'http://www.ltg.ed.ac.uk/~richard/unicode-sample.html')
    curlHandle.perform()
    content = downloadedContent.getvalue()

    fileHandle = open('unicode-test.txt','w')
    for char in content:
        fileHandle.write(char)
And the file is written correctly. I can even print the whole content to the console, and all the characters are displayed properly ... So I am puzzled: where does cStringIO break down? Is there a reason why I should not use it?
[Note: I am using Python 2.6 and have to stick with this version]