Python string formatting + weird UTF-8 behavior

When printing a formatted string with a fixed width (for example, %20s), the resulting width differs between a UTF-8 string and a plain ASCII string:

>>> str1="Adam Matan"
>>> str2="אדם מתן"
>>> print "X %20s X" % str1
X           Adam Matan X
>>> print "X %20s X" % str2
X        אדם מתן X

Note the difference:

X           Adam Matan X
X        אדם מתן X

Any ideas?

3 answers

You need to mark the second string as Unicode by putting a u prefix before the string literal:

>>> str1="Adam Matan"
>>> str2=u"אדם מתן"
>>> print "X %20s X" % str1
X           Adam Matan X
>>> print "X %20s X" % str2
X              אדם מתן X

This tells Python to count Unicode characters rather than bytes when it pads the field.
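The difference between the two counts can be checked directly. This is a minimal sketch written for Python 3, where str corresponds to Python 2's unicode and bytes to Python 2's str; the variable names are only illustrative.

```python
text = u"אדם מתן"              # text string: 6 Hebrew letters plus one space
raw = text.encode("utf-8")     # the same name as UTF-8 encoded bytes

char_count = len(text)         # number of characters
byte_count = len(raw)          # number of bytes (each Hebrew letter takes 2)

# %20s applied to a text string pads by characters,
# so the field between "X " and " X" is 20 characters wide.
padded = u"X %20s X" % text
field = padded[2:-2]

print(char_count, byte_count, len(field))
```

The padding comes out right only because the width is measured on the character count, not on the byte count.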

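In Python 2, an unprefixed string literal is a str, which is a byte string: it stores bytes, not characters. Your terminal most likely encodes input as UTF-8, so str2 holds more bytes than characters, because each Hebrew letter takes two bytes. You can see this by printing repr(str2) instead of using print. As a result, the string is padded to 20 bytes (not characters!).

To fix this, use Python Unicode strings.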
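The byte-level padding can be reproduced as a rough sketch. It assumes Python 3.5 or later, where the % operator is available on bytes, and uses bytes to stand in for a Python 2 str read from a UTF-8 terminal.

```python
text = u"אדם מתן"
raw = text.encode("utf-8")      # stands in for the Python 2 str from the question

# repr shows the individual bytes rather than the rendered letters.
print(repr(raw))

# Formatting a byte string pads the field to 20 bytes, not 20 characters.
padded_bytes = b"X %20s X" % raw
field_bytes = padded_bytes[2:-2]

# Decoding the field shows how many characters the terminal actually displays.
visible = field_bytes.decode("utf-8")
print(len(field_bytes), len(visible))
```

The field is 20 bytes long but only 14 characters are displayed, which accounts for the six missing columns in the question's output.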

Try the following:

>>> str1="Adam Matan"
>>> str2=unicode("אדם מתן", "utf8")
>>> print "X %20s X" % str2
X              אדם מתן X
>>> print "X %20s X" % str1
X           Adam Matan X
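
The same decode-then-format idea can be wrapped in a small function. This is only a sketch: format_field is a hypothetical helper name, not part of Python, and UTF-8 is assumed as the input encoding. It runs on Python 3, where bytes.decode does the job of unicode(..., "utf8").

```python
def format_field(value, width=20, encoding="utf-8"):
    """Right-align value in a field of `width` characters (hypothetical helper)."""
    if isinstance(value, bytes):
        # Decode first so the padding is computed on characters, not bytes.
        value = value.decode(encoding)
    return u"X %*s X" % (width, value)


line_ascii = format_field(b"Adam Matan")
line_hebrew = format_field(u"אדם מתן".encode("utf-8"))

print(line_ascii)
print(line_hebrew)
```

Both lines come out with the same number of characters, so the X markers line up.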

Source: https://habr.com/ru/post/1765625/