Does Python (2.6) cStringIO support Unicode?

I am using the Python pycurl module to download content from various web pages. Since I also want to support text in Unicode, I have been avoiding the cStringIO.StringIO function, which, according to the Python docs: cStringIO is a faster version of StringIO

Unlike the StringIO module, this module cannot accept Unicode strings that cannot be encoded as simple ASCII strings.

... does not support Unicode strings. More precisely, the documentation says it does not support Unicode strings that cannot be converted to ASCII strings. Can someone clarify this for me? Which strings can be converted and which cannot?

I tested the following code and it seems to work with unicode very well:

    import pycurl
    import cStringIO

    downloadedContent = cStringIO.StringIO()
    curlHandle = pycurl.Curl()
    curlHandle.setopt(pycurl.WRITEFUNCTION, downloadedContent.write)
    curlHandle.setopt(pycurl.URL, 'http://www.ltg.ed.ac.uk/~richard/unicode-sample.html')
    curlHandle.perform()
    content = downloadedContent.getvalue()

    fileHandle = open('unicode-test.txt','w')
    for char in content:
        fileHandle.write(char)

And the file is written correctly. I can even print all the content in the console, and all the characters are displayed properly. So I am puzzled: where would cStringIO go wrong? Is there a reason why I should not use it?

[Note: I am using Python 2.6 and have to stick with this version]

1 answer

Any text that uses only ASCII code points (byte values 00-7F, hexadecimal) can be converted to ASCII. Basically, any text that uses characters not commonly used in American English is not ASCII.
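A minimal sketch of that rule, assuming the text is already a Unicode string. The helper name `is_ascii` is hypothetical and only serves the illustration:

```python
def is_ascii(text):
    """Return True if every code point in the Unicode string is in 00-7F."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        # At least one character has no ASCII representation.
        return False
    return True

plain = is_ascii(u'hello')        # only code points below 0x80
accented = is_ascii(u'caf\xe9')   # U+00E9 lies outside the ASCII range
```

Here `plain` ends up true and `accented` false, since the single non-ASCII character is enough to make the whole string unconvertible.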

In your code example, you are not converting the input to Unicode text; you are treating it as raw, undecoded bytes. The test page you requested is encoded in UTF-8, and you never decode it to Unicode.
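To illustrate the difference between raw bytes and decoded text, here is a small sketch that uses a hard-coded UTF-8 byte string in place of a real download (the sample value is an assumption for illustration):

```python
raw = b'caf\xc3\xa9'          # five bytes, as a UTF-8 page would deliver them
text = raw.decode('utf-8')    # four Unicode code points

byte_count = len(raw)         # counts bytes
char_count = len(text)        # counts characters

# Encoding the text again gives back exactly the original bytes.
round_trip = text.encode('utf-8')
```

Writing the undecoded bytes straight to a file, as the question's code does, reproduces the page byte for byte, which is why the output looks right even though no Unicode string was ever involved.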

If you did decode the value into a Unicode string, you would not be able to store that string in a cStringIO object.
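As a rough sketch of the alternative, the pure-Python StringIO module mentioned in the question does accept decoded text. The import fallback to `io.StringIO` is an assumption added only so the snippet also runs on Python 3:

```python
try:
    from StringIO import StringIO   # Python 2: pure-Python StringIO module
except ImportError:
    from io import StringIO         # assumption: fallback for Python 3

raw = b'caf\xc3\xa9'                # stand-in for downloaded UTF-8 bytes
text = raw.decode('utf-8')          # decode to a Unicode string first

buf = StringIO()
buf.write(text)                     # non-ASCII Unicode text is accepted here
stored = buf.getvalue()
```

The trade-off is speed: cStringIO is the faster module, so it remains a reasonable choice as long as only undecoded bytes are written to it.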

You might want to read up on the difference between Unicode and text encodings such as ASCII and UTF-8. I can recommend:


Source: https://habr.com/ru/post/1438662/

