Unicode character output in Python IDLE

Question


I have the following code:

    # -*- coding: utf-8 -*-
    print "╔╤╤╦╤╤╦╤╤╗"
    print "╠╪╪╬╪╪╬╪╪╣"
    print "╟┼┼╫┼┼╫┼┼╢"
    print "╚╧╧╩╧╧╩╧╧╝"
    print "║"
    print "│"

For some reason, only the fourth line (╚╧╧╩╧╧╩╧╧╝) is displayed correctly; the rest comes out as an odd combination of characters. I assume this is an encoding issue. The full output in IDLE is as follows:

    â•"â•¤â•¤â•¦â•¤â•¤â•¦â•¤â•¤â•—
    â• â•ªâ•ªâ•¬â•ªâ•ªâ•¬â•ªâ•ªâ•£
    â•Ÿâ"¼â"¼â•«â"¼â"¼â•«â"¼â"¼â•¢
    ╚╧╧╩╧╧╩╧╧╝
    â•'
    â"‚

What causes this and how can I fix it? I only use a tablet (Surface Pro 3 with Win10) with a touch keyboard, so any solution with the least amount of input (especially typing strange characters) would be ideal, but obviously all help is appreciated.


python unicode character

Charliedebeadle Aug 11 '15 at 15:16


2 answers

jfs · Answer 1 · 2015-08-11T18:22:31+0000

Mojibake indicates that text encoded in one encoding is shown in another incompatible encoding:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    print(u"╔╤╤╦╤╤╦╤╤╗".encode('utf-8').decode('cp1252'))  #XXX: DON'T DO IT
    # -> â•"â•¤â•¤â•¦â•¤â•¤â•¦â•¤â•¤â•—

There are several places where incorrect encoding may be used.

The # coding: utf-8 declaration tells Python how to interpret non-ASCII characters in the source code (for example, inside string literals). If print u"╔╤╤╦╤╤╦╤╤╗" works in your case, the source code itself is being decoded to Unicode correctly. For debugging, you can write the string using only ASCII characters: u'\u2554\u2557' == u'╔╗' .
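As a rough sketch of the ASCII-only debugging idea above, the snippet below turns text into escape sequences and looks up the Unicode name of each character. The helper name to_ascii_literal is made up for this illustration; the unicode_escape codec and unicodedata.name are standard library features.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import unicodedata


def to_ascii_literal(text):
    """Return text written with ASCII-only escapes (hypothetical helper)."""
    return text.encode('unicode_escape').decode('ascii')


corners = '\u2554\u2557'  # the two top corner characters, written without non-ASCII
escaped = to_ascii_literal(corners)
names = [unicodedata.name(ch) for ch in corners]

print(escaped)
for name in names:
    print(name)
```

Because everything printed here is plain ASCII, the output should look the same whatever encoding the console or IDLE happens to use, which makes it a convenient first check.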

print "╔╤╤╦╤╤╦╤╤╗" (DON'T DO IT) prints bytes (here, the text encoded as UTF-8). IDLE itself works with Unicode (BMP only), so bytes must be decoded into Unicode text before they are shown in IDLE. On Windows, IDLE seems to use an ANSI code page such as cp1252 ( locale.getpreferredencoding(False) ) to decode the output bytes. Do not print text as bytes: it will fail in any environment whose character encoding differs from that of your source code. For example, you will get ΓòöΓòù... mojibake if you run the code from the question in a Windows console that uses the OEM cp437 code page.
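To make the byte-level picture concrete, here is a small sketch that decodes the same UTF-8 bytes with three different codecs. The show helper is invented for this example, and it prints escape sequences rather than the characters themselves so that the demo cannot itself hit an encoding problem.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import locale


def show(label, text):
    # Print an ASCII-only view of the text (hypothetical helper).
    print(label, text.encode('unicode_escape').decode('ascii'))


text = '\u2554\u2557'        # the two corner characters
raw = text.encode('utf-8')   # the bytes a Python 2 byte string holds in a UTF-8 source

as_cp1252 = raw.decode('cp1252')  # ANSI code page: the IDLE-style mojibake
as_cp437 = raw.decode('cp437')    # OEM code page: the console-style mojibake
as_utf8 = raw.decode('utf-8')     # the matching codec recovers the original text

show('cp1252:', as_cp1252)
show('cp437: ', as_cp437)
show('utf-8: ', as_utf8)
print('preferred encoding here:', locale.getpreferredencoding(False))
```

Only the decode that matches the encoding used to produce the bytes gives the original text back; the last line reports what the current machine prefers, which will differ between systems.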

You must use Unicode for all the text in your program. Python 3 even prohibits non-ASCII characters inside a bytes literal: you would get a SyntaxError there.
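The Python 3 behaviour can be checked without writing a broken file by compiling small source strings. This is a sketch that assumes Python 3; the function name bytes_literal_compiles is made up for the illustration.

```python
# Python 3 only: in Python 2 the string literals below mean something different.


def bytes_literal_compiles(source):
    """Return True if the given source text compiles (hypothetical helper)."""
    try:
        compile(source, '<demo>', 'exec')
    except SyntaxError:
        return False
    return True


ascii_ok = bytes_literal_compiles('b"abc"')               # plain ASCII bytes literal
non_ascii_ok = bytes_literal_compiles('b"\u2554"')        # bytes literal containing U+2554
escaped_ok = bytes_literal_compiles('b"\\xe2\\x95\\x94"')  # same bytes, written as escapes

print(ascii_ok, non_ascii_ok, escaped_ok)
```

Explicit byte escapes are still accepted, so the restriction only forces you to state which bytes you mean instead of relying on the source file's encoding.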

print(u'\u2554\u2557') may fail with a UnicodeEncodeError if you run the code in the Windows console and its OEM code page (such as cp437) cannot represent the characters. To print arbitrary Unicode characters in the Windows console, use the win-unicode-console package. You do not need this if you are using IDLE.
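Whether a UnicodeEncodeError is likely can be checked ahead of time by trying to encode the text with a given code page. The following is a minimal sketch; can_encode is a hypothetical helper, and the codec names are simply the ones mentioned in this answer.

```python
from __future__ import print_function, unicode_literals


def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


corners = '\u2554\u2557'  # the two corner characters
euro = '\u20ac'           # EURO SIGN, used here as a second test character

for codec in ('cp437', 'cp1252', 'utf-8'):
    print(codec, can_encode(corners, codec), can_encode(euro, codec))
```

These two corner characters happen to exist in cp437 but not in cp1252, while the euro sign is the other way around, so the result depends on both the text and the code page. In a real script you could pass sys.stdout.encoding instead of a fixed name, when it is set.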

Charliedebeadle · Answer 2 · 2015-08-11T15:53:02+0000

Putting u before the string literals fixed the problem, as suggested by @FredLarson:

    print u"╔╤╤╦╤╤╦╤╤╗"
    print u"╠╪╪╬╪╪╬╪╪╣"
    print u"╟┼┼╫┼┼╫┼┼╢"
    print u"╚╧╧╩╧╧╩╧╧╝"
    print u"║"
    print u"│"

The exact reason is still unknown to me: the original code seems to work on other systems, and it is strange that the fourth line displayed correctly.
