Python string formatting + weird UTF-8 behavior

Question

When I print a formatted string with a fixed field width (for example, %20s), a UTF-8 string gets less padding than a plain ASCII string:

>>> str1="Adam Matan"
>>> str2="אדם מתן"
>>> print "X %20s X" % str1
X           Adam Matan X
>>> print "X %20s X" % str2
X        אדם מתן X

Note the difference:

X           Adam Matan X
X        אדם מתן X

Any ideas?

+3

python string utf-8

Adam matan 20 sept '10 at 13:35

3 answers

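In Python 2, an unprefixed string literal is of type str, which is a byte string: it stores bytes, not characters. UTF-8 encodes some characters with more than one byte. str2 therefore contains more bytes than characters, which produces the unexpected, but perfectly valid, formatting result. If you look at the actual bytes (use repr instead of print), you will see that in both strings the field is exactly 20 bytes (not characters!) wide.

As already mentioned, the solution is to use unicode strings. See the Python Unicode documentation for details.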
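
The gap between bytes and characters can be checked directly. The following is a minimal sketch, not part of the original answer, and it assumes Python 3 syntax, where bytes plays the role of Python 2's str. The variable names are illustrative only.

```python
# Sketch in Python 3 syntax (an assumption; the thread itself uses Python 2).
# Python 3's bytes type plays the role of Python 2's str.

# "אדם מתן" written with escapes so the code points are unambiguous.
text = "\u05d0\u05d3\u05dd \u05de\u05ea\u05df"
raw = text.encode("utf-8")       # what a Python 2 str literal would hold

char_count = len(text)           # code points
byte_count = len(raw)            # UTF-8 bytes (each Hebrew letter takes 2)

# Padding a byte string to 20 counts bytes, as Python 2's "%20s" does for str.
padded = raw.rjust(20)
visible_padding = len(padded.decode("utf-8")) - char_count

print(char_count, byte_count, visible_padding)
```

This prints 7 13 7: the string has 7 characters but 13 bytes, so only 7 padding spaces remain, which matches the short line in the question (7 padding spaces plus the literal space after X).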

+3

lunaryorn 20 sept '10 at 13:51

Try the following, which decodes the UTF-8 bytes into a unicode object before formatting:

>>> str1="Adam Matan"
>>> str2=unicode("אדם מתן", "utf8")
>>> print "X %20s X" % str2
X              אדם מתן X
>>> print "X %20s X" % str1
X           Adam Matan X

+1

Michał Kwiatkowski 20 sept '10 at 13:45

tghw · Accepted Answer · 2010-09-20T13:46:34+0000

You need to mark the second string as Unicode by placing u before the string literal:

>>> str1="Adam Matan"
>>> str2=u"אדם מתן"
>>> print "X %20s X" % str1
X           Adam Matan X
>>> print "X %20s X" % str2
X              אדם מתן X

This tells Python to count Unicode characters, not bytes, when it pads the field.
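
When the input may arrive as raw UTF-8 bytes, one option is to decode it before formatting so that the width is always counted in characters. The sketch below is an illustration under assumptions, not the answerer's code: format_field is a hypothetical helper, and the example is written in Python 3 syntax.

```python
def format_field(value, width=20, encoding="utf-8"):
    """Hypothetical helper: right-align value in a field of `width` characters."""
    # Decode byte strings first so the padding is computed on characters.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    # "%*s" takes the field width from the argument tuple.
    return "X %*s X" % (width, value)


ascii_line = format_field("Adam Matan")
hebrew_line = format_field("אדם מתן".encode("utf-8"))

# Both lines now have the same length in characters.
print(len(ascii_line), len(hebrew_line))
```

Both lengths come out as 24 (the 20-character field plus "X " and " X"). Note that this counts code points, which lines up with display columns for this example but not necessarily for combining marks or wide characters.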
