Python strings with and without a trailing space, and immutability

I found out that in immutable classes, __new__ can return a cached reference to an existing object with the same value; this is what the int, str, and tuple types do for small values. This is one of the reasons why their __init__ does nothing.
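A minimal sketch of the mechanism described above, using a hypothetical `CachedValue` class (an illustration, not how int or str are actually implemented): `__new__` hands back a cached instance, and Python still calls `__init__` on whatever `__new__` returns, as long as it is an instance of the class.

```python
class CachedValue:
    """Hypothetical immutable-style class that caches instances by value."""

    _cache = {}
    init_calls = 0  # counts how often __init__ runs, for illustration only

    def __new__(cls, value):
        # Return the cached instance if one already exists for this value.
        try:
            return cls._cache[value]
        except KeyError:
            obj = super().__new__(cls)
            obj._value = value
            cls._cache[value] = obj
            return obj

    def __init__(self, value):
        # Runs on every call, even when __new__ returned a cached object.
        CachedValue.init_calls += 1


first = CachedValue(42)
second = CachedValue(42)

print(first is second)         # True: both names refer to the cached object
print(CachedValue.init_calls)  # 2: __init__ ran once per call
```

Because `__init__` runs each time, an immutable type that caches instances is better off setting its state in `__new__` and leaving `__init__` empty.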

If __init__ did anything, cached objects would be reinitialized again and again. But why do the following two fragments behave differently?

With a space at the end:

    >>> a = 'string '
    >>> b = 'string '
    >>> a is b
    False

Without a space:

    >>> c = 'string'
    >>> d = 'string'
    >>> c is d
    True

Can someone please explain how the trailing space makes a difference?

+9
python string immutability python-internals
Jan 18 '14 at 11:03
2 answers

This is a quirk of how the CPython implementation chooses to cache string literals. String literals with the same contents may refer to the same string object, but they do not have to. 'string' happens to be interned automatically while 'string ' is not, because 'string' contains only characters allowed in a Python identifier. I do not know why they chose this criterion, but that is what it is. The behavior may differ between Python versions and implementations.

From the CPython 2.7 source code, stringobject.h, line 28:

Interned strings (ob_sstate) tries to ensure that only one string object with a given value exists, so equality tests can be one pointer comparison. This is generally restricted to strings that "look like" Python identifiers, although the intern() builtin can be used to force interning of any string.

You can see the code that does this in Objects/codeobject.c:

    /* Intern selected string constants */
    for (i = PyTuple_Size(consts); --i >= 0; ) {
        PyObject *v = PyTuple_GetItem(consts, i);
        if (!PyString_Check(v))
            continue;
        if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
            continue;
        PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
    }
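The selection rule in that loop can be sketched in Python. This is only a model of the CPython 2.7 check, not the real implementation: `all_name_chars` mirrors the C helper of the same name, while `constants_to_intern` is a hypothetical helper introduced here for illustration.

```python
import string

# Characters the CPython 2.7 check accepts: ASCII letters, digits, underscore.
NAME_CHARS = string.ascii_letters + string.digits + "_"


def all_name_chars(value):
    """Return True if every character of value is a name character."""
    return all(ch in NAME_CHARS for ch in value)


def constants_to_intern(consts):
    """Hypothetical helper: pick the string constants the loop would intern."""
    return [c for c in consts if isinstance(c, str) and all_name_chars(c)]


print(all_name_chars("string"))   # True
print(all_name_chars("string "))  # False: the space is not a name character
print(constants_to_intern(("string", "string ", 42, None)))  # ['string']
```

The check is made per character, so a literal such as '123' also passes even though it is not a valid identifier.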

Also note that interning is a separate process from the merging of string literals by the Python bytecode compiler. If you let the compiler compile the assignments to a and b together, e.g. by placing them in a module or inside an if True: block, you will find that a and b are the same string.
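A small sketch of that merging, assuming CPython: both assignments are passed to compile() as one unit, so the two equal literals end up as a single constant in the resulting code object. The names `source`, `code`, and `namespace` are illustrative only.

```python
# Both assignments are compiled as one unit, so they share one code object.
source = "a = 'string '\nb = 'string '\n"
code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)

# Equal constants inside one code object are stored only once in co_consts.
print(code.co_consts.count("string "))
print(namespace["a"] is namespace["b"])
```

On CPython this is expected to print 1 and True, even though 'string ' is not interned. In the interactive session from the question each line is compiled separately, which is why the literals there were not merged.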

+12
Jan 18 '14 at 11:13

This behavior is inconsistent and, as others have said, depends on the version of Python being executed. For a deeper discussion, see this question.

If you want to make sure that the same object is being used, you can force string interning with the aptly named intern:

    intern(...)
        intern(string) -> string

        ``Intern'' the given string.  This enters the string in the (global)
        table of interned strings whose purpose is to speed up dictionary lookups.
        Return the string itself or the previously interned string object with the
        same value.

    >>> a = 'string '
    >>> b = 'string '
    >>> id(a) == id(b)
    False
    >>> a = intern('string ')
    >>> b = intern('string ')
    >>> id(a) == id(b)
    True

Note that in Python 3 you need to import intern explicitly: from sys import intern.
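A Python 3 version of the same idea, as a minimal sketch. The strings are built at run time with str.join so that the compiler cannot merge them as literals; that construction is an assumption made for the example, not part of the original session.

```python
from sys import intern

# Build the strings at run time so the compiler cannot merge them as literals.
a = "".join(["string", " "])
b = "".join(["string", " "])

print(a == b)                  # True: the values are equal
print(intern(a) is intern(b))  # True: intern() returns one shared object
```

After interning, comparing the two results with is behaves like comparing them with ==, which is the guarantee the docstring describes.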

+5
Jan 18 '14 at 11:20


