How does Python store strings so that the is operator works on literals?

In Python

    >>> a = 5
    >>> a is 5
    True

but

    >>> a = 500
    >>> a is 500
    False

This is because small integers are stored as a single shared object at one address. Once the numbers get larger, each int gets its own object with its own address. That makes sense to me.

The current implementation keeps an array of integer objects for all integers from -5 to 256; when you create an int in this range, you actually just get back a reference to the existing object.
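The small-integer cache described above can be observed with a short sketch. The helper name `same_object` is made up for this illustration, and the behavior is a CPython implementation detail, so other interpreters or future versions may print something different. The integers are parsed from text at runtime so that the compiler cannot merge equal literals on its own.

```python
def same_object(n):
    """Build the integer n twice at runtime and report whether both results are one object."""
    # Parsing from text avoids the compiler merging equal literals,
    # so any sharing seen here comes from the interpreter's own cache.
    first = int(str(n))
    second = int(str(n))
    return first is second


# Values inside the cached range are expected to be shared on CPython,
# while a large value such as 10**9 is expected to get a fresh object each time.
for n in (-5, 0, 256, 10**9):
    print(n, same_object(n))
```

For comparing values, `==` remains the right operator; `is` only reveals whether two names happen to point at the same object.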

So why does this not apply to strings? Aren't strings as complex as large integers, if not more so?

    >>> a = '1234567'
    >>> a is '1234567'
    True

How does Python efficiently use the same address for all string literals? It cannot keep an array of all possible strings the way it does for small numbers.

2 answers

It does not store an array of all possible strings; instead, it keeps a hash table, indexed by the hash of each string, that points to the memory addresses of the strings it has already stored.

For instance, when you write a = 'foo', Python first hashes the string foo and checks whether an entry for it exists in the hash table. If it does, the variable a now refers to that address.

If there is no entry in the table, Python allocates memory to store the string and adds an entry to the table, keyed by the hash of foo, with the address of the allocated memory.
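The lookup-or-insert steps described above can be sketched with an ordinary dict. This is a toy model only: the class name `InternTable` and its `intern` method are invented for illustration and are not CPython's actual internals.

```python
class InternTable:
    """Toy intern table: maps each distinct string value to one canonical object."""

    def __init__(self):
        # A dict is itself a hash table, so lookups hash the string key.
        self._table = {}

    def intern(self, s):
        # Return the stored object if an equal string is already present;
        # otherwise store s itself and return it.
        return self._table.setdefault(s, s)


table = InternTable()

# Build two equal strings at runtime so they start out as separate objects.
x = "".join(["fo", "o"])
y = "".join(["fo", "o"])

x = table.intern(x)
y = table.intern(y)
print(x == y, x is y)
```

After both calls, `x` and `y` refer to whichever object reached the table first, which is why they compare identical with `is`.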


This is an optimization technique called interning. CPython recognizes equal string constants and does not allocate additional memory for new instances; it simply points them at the same object (interns it), giving the same id().

You can experiment to confirm that only constants are handled this way (simple operations, such as the one that builds b, are recognized):

    # Two string constants
    a = "aaaa"
    b = "aa" + "aa"

    # Prevent the interpreter from figuring out a string constant
    c = "aaa"
    c += "a"

    print(id(a))  # 4509752320
    print(id(b))  # 4509752320
    print(id(c))  # 4509752176 !!

However, you can manually bind a string to the existing one using intern() (in Python 3 it lives in the sys module as sys.intern()):

    c = intern(c)  # in Python 3, use sys.intern(c)
    print(id(a))  # 4509752320
    print(id(b))  # 4509752320
    print(id(c))  # 4509752320 !!

Other interpreters may do this differently. Because strings are immutable, sharing one object is safe: an operation that appears to change a string actually creates a new one, so the other references are unaffected.
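A short Python 3 sketch of why sharing is safe is given here. It assumes CPython and uses `sys.intern`; the variable name `shared_before` is only for illustration.

```python
import sys

# Build equal strings at runtime, then intern both so they share one object.
a = sys.intern("".join(["aa", "aa"]))
b = sys.intern("".join(["aa", "aa"]))
shared_before = a is b

# "Changing" b really builds a new string and rebinds the name b;
# the object that a refers to is left untouched.
b += "!"
print(shared_before, a, b, a is b)
```

Because no operation can alter a string object in place, handing the same object to many names never lets one of them affect another.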


Source: https://habr.com/ru/post/1258107/

