What is the difference between these two snippets, and what does repr do?

1.

>>> s = u"4-12\u4e2a\u82f1\u6587\u5b57\u6bcd\u3001\u6570\u5b57\u548c\u4e0b\u5212\u7ebf"
>>> print s
4-12个英文字母、数字和下划线
>>> print repr(s)
u'4-12\u4e2a\u82f1\u6587\u5b57\u6bcd\u3001\u6570\u5b57\u548c\u4e0b\u5212\u7ebf'

2.

>>> print repr("4-12个英文字母、数字和下划线")
'4-12\xb8\xf6\xd3\xa2\xce\xc4\xd7\xd6\xc4\xb8\xa1\xa2\xca\xfd\xd7\xd6\xba\xcd\xcf\xc2\xbb\xae\xcf\xdf'

The output of 1 and 2 is different, but the source string is the same: both are "4-12个英文字母、数字和下划线".

What exactly does repr do?

Both hold the same value:

>>> print '4-12个英文字母、数字和下划线'.decode('gb2312').encode('unicode-escape')
4-12\u4e2a\u82f1\u6587\u5b57\u6bcd\u3001\u6570\u5b57\u548c\u4e0b\u5212\u7ebf

I'll take a shot at this: repr gives the machine-oriented representation of an object, while print shows its human-readable representation. The special methods __repr__, __str__, and __unicode__ let programmers implement the different printable representations of an object. Here is a simple example:

class PrintObject(object):
    def __repr__(self):
        return 'repr'

    def __str__(self):
        return 'str'

    def __unicode__(self):
        return 'unicode'

Loading this class into a Python shell and passing the object to these functions gives the following:

Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from printobject import PrintObject
>>> printObj = PrintObject()
>>> printObj
repr
>>> repr(printObj)
'repr'
>>> str(printObj)
'str'
>>> unicode(printObj)
u'unicode'

__repr__ is called when you simply evaluate the object at the prompt:

>>> printObj
repr

__str__ is called when you print the object:

>>> print(printObj)
str

__unicode__ is called when the object is converted to a unicode string, for example by formatting it into one:

>>> print(u'%s' % printObj)
unicode
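
As a rough Python 3 counterpart of the session above (a sketch only: the class name `Label` and its `text` attribute are made up for illustration, and `__unicode__` is left out because Python 3 does not use it), the same split between `__repr__` and `__str__` also shows up in `%r` formatting and inside containers:

```python
class Label(object):
    """Hypothetical class used only to show which method gets called."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Unambiguous, developer-facing form.
        return "Label(%r)" % (self.text,)

    def __str__(self):
        # Human-readable form, used by print() and "%s".
        return self.text


item = Label("ok")
shown_by_repr = repr(item)            # uses __repr__
shown_by_str = str(item)              # uses __str__
formatted = "%s / %r" % (item, item)  # %s -> __str__, %r -> __repr__
in_list = str([item])                 # containers show elements via __repr__

print(shown_by_repr)
print(shown_by_str)
print(formatted)
print(in_list)
```

Note that printing a list shows the repr of each element, which is why byte strings inside containers appear with their escapes even when printed.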



>>> help(repr)
Help on built-in function repr in module __builtin__:

repr(...)
    repr(object) -> string

    Return the canonical string representation of the object.
    For most object types, eval(repr(object)) == object.
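
The "eval(repr(object)) == object" claim in that help text can be checked directly. The following is a small sketch in Python 3 syntax; the `samples` list is an arbitrary choice of built-in values, and the final check uses a bare `object()` as one example of a type whose repr is not valid Python source:

```python
# Round-trip check: for many built-in types, eval(repr(x)) == x.
samples = [42, 3.5, "text", u"\u4e2a", [1, 2, 3], {"a": 1}, (1, "b"), None]
round_trips = [eval(repr(x)) == x for x in samples]
print(round_trips)

# Counter-example: the default repr of a plain object looks like
# "<object object at 0x...>", which cannot be evaluated.
try:
    eval(repr(object()))
    evaluable = True
except SyntaxError:
    evaluable = False
print(evaluable)
```

So the round trip is a convention for "most" types, as the docstring says, not a guarantee.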

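In the first case, Python knows that the string is Unicode, so repr() shows its non-ASCII characters as Unicode escape sequences.

In the second case, the string is a plain byte string, so repr() shows escape sequences for the individual bytes (here, the text encoded in GB2312).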


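The first is a unicode string. Its non-ASCII characters are shown as Unicode escape sequences, e.g. '\u4e2a' is code point 20010 (0x4e2a in hexadecimal), which is the character "个".

The second is a byte string: a sequence of 8-bit values with no encoding attached, so Python does not know which characters they stand for. Any byte outside the printable ASCII range is shown as a hex escape (e.g. \xb8 is byte 184, 0xB8 in hexadecimal). In your encoding (gb2312), the byte pair [184, 246] ('\xb8\xf6') corresponds to the Unicode code point 0x4e2a. To get a unicode string, you have to decode the bytes and say which encoding they are in: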

>>> s=s.decode('gb2312')
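
The byte-to-code-point mapping described above can be verified with a short sketch. It is written in Python 3 syntax (so the byte string needs a `b` prefix), and the variable names are chosen only for illustration:

```python
raw = b"\xb8\xf6"             # the two GB2312 bytes from the question
text = raw.decode("gb2312")   # decoding yields a single character
code_point = ord(text)

print(list(raw))              # the byte values as integers
print(hex(code_point))        # the Unicode code point in hexadecimal
print(text.encode("unicode_escape"))  # escape form, as shown by the first snippet

# Encoding with the same codec gives back the original bytes.
assert text.encode("gb2312") == raw
```

Decoding with the wrong codec would either raise an error or silently produce different characters, which is why the encoding has to be stated explicitly.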

In Python 3 this distinction between "characters" and "data" is made clearer: the old str type is renamed to bytes, and unicode strings become the ordinary str type.
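
A minimal Python 3 sketch of that split, using only the first five characters of the string from the question (`ascii()` is used instead of `repr()` for the text so that the output shows escapes regardless of the terminal encoding):

```python
text = "4-12\u4e2a"            # str: a sequence of characters
data = text.encode("gb2312")   # bytes: the same text as GB2312 data

print(type(text).__name__, len(text), ascii(text))
print(type(data).__name__, len(data), repr(data))

# The two types are never mixed implicitly; conversion is always explicit.
assert data.decode("gb2312") == text
```

The character count and the byte count differ because "个" takes two bytes in GB2312.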


Source: https://habr.com/ru/post/1728886/