How does python do string magic?

Question

String comparison confused me today: it seems that Python reuses strings (which is sensible, since they are immutable). To verify this, I did the following:

>>> a = 'xxx'
>>> b = 'xxx'
>>> a == b
True
>>> a is b
True
>>> id(a)
140141339783816
>>> id(b)
140141339783816
>>> c = 'x' * 3
>>> id(c)
140141339783816
>>> d = ''.join(['x', 'x', 'x'])
>>> id(d)
140141339704576

This is a bit surprising. Some questions:

Does Python check the entire contents of its string table when creating new strings?
Is there a limit on string size?
How does this mechanism work (by comparing string hashes?)
It is not used for all kinds of generated strings, however. What is the rule here?

python string python-internals

dangonfast Sep 05 '14 at 4:39

1 answer

dangonfast · Answer 1 · 2014-09-05T08:54:30+0000

As the question has some upvotes (although it is somewhat of a duplicate), I will answer my original questions here (thanks to the comments above):

Yes, Python looks new strings up in an internal table, but only for some strings, mainly those that could also be used as identifiers. The idea is that the speed-up trick the Python interpreter (compiler?) uses to handle identifiers is also useful for general string processing. This process is called interning.
As far as I know, there is no limit on string size, but there are other rules for reusing strings (basically: they should look like Python identifiers).
Yes, the table is just a Python dict, and strings are looked up in it by their hash.
It is used only for string literals and constant expressions. Basically, for everything the Python interpreter can work out at compile time.

To clarify the last point: the following snippets all evaluate to the string 'xxx', but they are treated differently with respect to interning.

This is a constant expression:

 'x' * 3

But this is not so:

 a = 'x'
 a * 3  # this is not a constant expression, so no interning can be applied
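
One way to see the difference between these two snippets, assuming a CPython interpreter that folds constants, is to compile them without running them and look at the constants stored in the code object. The filename `'<demo>'` is just a placeholder.

```python
# Compile both snippets (without executing them) and inspect their constants.
folded = compile("'x' * 3", '<demo>', 'eval')
unfolded = compile("a = 'x'\na * 3", '<demo>', 'exec')

# On CPython the constant expression is typically folded at compile time,
# so the finished string shows up among the constants.
print(folded.co_consts)

# With a variable involved, the multiplication is left for run time,
# so there is no precomputed 'xxx' here.
print(unfolded.co_consts)
```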

And this is not a constant expression either:

 ''.join(['x', 'x', 'x'])  # not a constant expression: a method is called at run time
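
For strings built at run time, the standard library's sys.intern can be used to request the shared copy explicitly. A short sketch (the variable names are only for illustration):

```python
import sys

literal = 'xxx'                    # compile-time constant
built = ''.join(['x', 'x', 'x'])   # created at run time

print(built == literal)  # equal by value

# Whether the two are the same object is an implementation detail;
# on typical CPython builds they are not.
print(built is literal)

# sys.intern returns the canonical object for a given string value,
# so equal strings map to one and the same object.
print(sys.intern(built) is sys.intern(literal))  # True
```

Relying on `is` for string comparison is still not safe in general; `==` is the right tool unless both sides are known to be interned.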
