How can an unbound string in Python have an address in memory?

Can someone explain this to me? I was playing with the id() function in Python and came across this:

    >>> id('cat')
    5181152
    >>> a = 'cat'
    >>> b = 'cat'
    >>> id(a)
    5181152
    >>> id(b)
    5181152

This makes sense to me, except for one part: the string 'cat' has an address in memory before I assign it to a variable. I probably just don't understand how memory addressing works, but can someone explain this to me, or at least tell me that I should read up on memory addressing?

That is all well and good, but this confused me:

    >>> a = a[0:2]+'t'
    >>> a
    'cat'
    >>> id(a)
    39964224
    >>> id('cat')
    5181152

It seemed strange to me because 'cat' is a string with the address 5181152, but the new a has a different address. So if there are two 'cat' strings in memory, why are two addresses not printed for id('cat')? My last thought was that the concatenation had something to do with the change in address, so I tried this:

    >>> id(b[0:2]+'t')
    39921024
    >>> b = b[0:2]+'t'
    >>> b
    'cat'
    >>> id(b)
    40000896

I would have predicted that the ids would be the same, but that was not the case. Thoughts?

+43
python memory-address
Aug 03 2018-11-11T00:
5 answers

Python reuses string literals fairly aggressively. The rules by which it does this are implementation-dependent, but CPython uses two that I know of:

  • Strings containing only characters valid in Python identifiers are interned, which means they are stored in a large table and reused wherever they occur. Thus, no matter where you use "cat", it always refers to the same string object.
  • String literals in the same code block are reused regardless of their contents and length. If you put a string literal of the entire Gettysburg Address in a function twice, it is the same string object both times. In separate functions, they are different objects:

        def foo(): return "pack my box with five dozen liquor jugs"
        def bar(): return "pack my box with five dozen liquor jugs"
        assert foo() is bar()   # AssertionError

Both optimizations are performed at compile time (that is, when bytecode is generated).
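
The same-block rule can be checked with a minimal sketch, assuming a recent CPython 3. The function name `same_block` is made up for illustration. Whether separate functions share the object has varied between CPython versions, so the sketch looks only at the same-block case, and even that is an implementation detail rather than a language guarantee.

```python
def same_block():
    # The same literal appears twice inside one code block.
    first = "pack my box with five dozen liquor jugs"
    second = "pack my box with five dozen liquor jugs"
    return first is second


# Constants are collected when the function is compiled to bytecode,
# so the literal is stored once in the code object's constant table.
literal_count = same_block.__code__.co_consts.count(
    "pack my box with five dozen liquor jugs"
)

print(same_block())
print(literal_count)
```

The sentence contains spaces, so it does not qualify under the identifier rule; any sharing seen here comes from the compiler reusing the constant within one code object.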

On the other hand, something like chr(99) + chr(97) + chr(116) is a string expression that evaluates to the string "cat". In a dynamic language like Python, its value cannot be known at compile time (chr() is a built-in function, but you might have reassigned it), so it is normally not interned. Its id() is therefore different from that of "cat". However, you can force a string to be interned using the intern() function (sys.intern() in Python 3). Thus:

 id(intern(chr(99) + chr(97) + chr(116))) == id("cat") # True 

As others have noted, interning is possible because strings are immutable. In other words, it is not possible to change "cat" into "dog" in place. You have to create a new string object, which means there is no danger that other names pointing to the same string will be affected.
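
A rough Python 3 version of the check above, assuming that the interning function is reached as `sys.intern()`; the variable names are illustrative only. The sketch relies only on what `sys.intern()` promises (one canonical object per distinct text) and leaves the plain identity comparison unasserted, since that part is up to the interpreter.

```python
import sys

literal = "cat"
# Built at run time, so the compiler never sees the text "cat" here.
built = chr(99) + chr(97) + chr(116)

print(built == literal)   # compares contents
print(built is literal)   # compares identity; implementation dependent

# sys.intern() returns the canonical object for a given text, so
# interning both strings has to yield one and the same object.
print(sys.intern(built) is sys.intern(literal))
```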

As an aside, Python also converts expressions containing only constants (for example, "c" + "a" + "t") to constants at compile time, as the following disassembly shows. These are then optimized to point to identical string objects according to the rules above.

    >>> def foo(): "c" + "a" + "t"
    ...
    >>> from dis import dis; dis(foo)
      1           0 LOAD_CONST               5 ('cat')
                  3 POP_TOP
                  4 LOAD_CONST               0 (None)
                  7 RETURN_VALUE
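
The listing above comes from an older interpreter. A hedged sketch for inspecting the same effect on Python 3 follows; it assumes a CPython version that folds constant expressions, and the function name `folded` is hypothetical. It returns the expression instead of leaving it as a bare statement, so the compiler cannot simply discard it.

```python
import dis


def folded():
    # Only constants are involved, so the compiler can perform the
    # concatenation itself instead of leaving it for run time.
    return "c" + "a" + "t"


# The constant table of the compiled function; on CPython versions that
# fold constants, the joined string is stored there directly.
print(folded.__code__.co_consts)

# Disassembly for comparison with the listing above; the exact opcodes
# differ between Python versions.
dis.dis(folded)
```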
+52
Aug 03 '11 at 19:29

'cat' has an address because you create it to pass its id() . You have not yet bound it to a name, but the object still exists.

Python caches and reuses short strings. But if you assemble strings by concatenation, the code that searches the cache and attempts reuse is bypassed.

Note that the inner workings of the string cache are a pure implementation detail and should not be relied upon.

+47
Aug 03 '11 at 19:11

All values must reside somewhere in memory. That is why id('cat') produces a value. You call it a "nonexistent" string, but it clearly exists; it just hasn't been assigned to a name yet.

Strings are immutable, so the interpreter can do clever things such as making all instances of the literal 'cat' the same object, which is why id(a) and id(b) are the same.

Operating on strings produces new strings. These may or may not be the same objects as earlier strings with the same content.

Please note that all of this is a CPython implementation detail and subject to change at any time. You do not need to be concerned with these issues in actual programs.
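
In practice this means comparing strings with == and leaving `is` alone. A small sketch, with illustrative variable names:

```python
literal = "cat"
# str.join builds its result at run time from the separate pieces.
built = "".join(["c", "a", "t"])

# == compares contents and is the right test for "same text".
print(built == literal)

# `is` compares identity; whether it holds here depends on the
# interpreter, so program logic should not depend on the answer.
print(built is literal)
```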

+17
Aug 03 '11 at 19:13

Python variables are quite different from variables in other languages (e.g. C).

In many other languages, a variable is a name for a location in memory. In these languages, different kinds of variables can refer to different kinds of locations, and the same location can be given several names. For the most part, a given memory location can have its data change from time to time. There are also ways to refer to memory locations indirectly (int *p would contain an address, and the memory location at that address holds an integer). But the actual location a variable refers to cannot change; the variable is the location. Assignment in these languages effectively means "Look up the location for this variable, and copy this data into that location."

Python does not work that way. In Python, actual objects go in some memory location, and variables are like tags for locations. Python manages stored values separately from how it manages variables. Essentially, assignment in Python means "Look up the information for this variable, forget the location it already refers to, and replace that with this new location." No data is copied.
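
A short sketch of this "names as tags" model. It uses lists instead of strings, because a list can be mutated and that makes the sharing visible; the variable names are illustrative.

```python
first = ["c", "a", "t"]
second = first            # binds a second name to the same list; no copy

second.append("s")        # mutate the one shared object
print(first)              # the change is visible through both names
print(first is second)    # both names still tag a single object

second = ["c", "a", "t", "s"]   # rebinding: "second" now tags a new list
print(first == second)    # equal contents
print(first is second)    # but two distinct objects
```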

A common feature of languages that work like Python (as opposed to the first kind discussed above) is that some kinds of objects are managed in a special way: identical values are cached so that they do not take up extra memory, and so that they can be compared very cheaply (if they have the same address, they are equal). This process is called interning. All Python string literals are interned (along with a few other types), although dynamically created strings may not be.

In your exact code, the semantics would go like this:

    # before anything, since 'cat' is a literal constant, add it to the intern cache
    >>> id('cat')  # grab the constant 'cat' from the intern cache and look up
                   # its address
    5181152
    >>> a = 'cat'  # grab the constant 'cat' from the intern cache and
                   # make the variable "a" point to its location
    >>> b = 'cat'  # do the same thing with the variable "b"
    >>> id(a)      # look up the object "a" currently points to,
                   # then look up that object's address
    5181152
    >>> id(b)      # look up the object "b" currently points to,
                   # then look up that object's address
    5181152
+8
Aug 03 '11 at 19:39

The code you posted creates new strings as intermediate objects. These created strings end up with the same content as your originals. In the interim, they do not exactly match the original and must be stored at a distinct address.

 >>> id('cat') 5181152 

As others have answered, by issuing this instruction you cause the Python VM to create a string object containing the string "cat". This string object is cached and is located at address 5181152.

    >>> a = 'cat'
    >>> id(a)
    5181152

Again, a has been assigned to refer to this cached string object at 5181152 containing "cat".

    >>> a = a[0:2]
    >>> id(a)
    27731511

At this point, in my modified version of your program, you created two small string objects: 'cat' and 'ca' . 'cat' still exists in the cache. The string to which a refers is a different and possibly new string object containing the characters 'ca' .

    >>> a = a + 't'
    >>> id(a)
    39964224

Now you have created another new string object. This object is the concatenation of the string 'ca' at address 27731511 and the string 't'. The result has the same content as the previously cached string 'cat', but Python does not automatically detect this case. As stated above, you can force the lookup with the intern() function.

I hope this explanation clarifies the steps by which the address of a changed.

Your code did not include an intermediate state in which a was assigned the string 'ca'. The answer still applies, because the Python interpreter creates a new string object to hold the intermediate result a[0:2] whether or not you assign that intermediate result to a variable.
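
The steps can be replayed as a small sketch in which each intermediate result gets a name of its own; the names `original`, `prefix` and `rebuilt` are illustrative. The printed id() values will differ from the ones quoted in this thread and from run to run.

```python
original = "cat"

prefix = original[0:2]      # a separate string object holding 'ca'
rebuilt = prefix + "t"      # another object, created by concatenation

for name, value in [("original", original),
                    ("prefix", prefix),
                    ("rebuilt", rebuilt)]:
    # Only the relationships between the ids matter, not the numbers.
    print(name, repr(value), id(value))

print(rebuilt == original)  # same contents as the literal
```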

+1
Aug 04 '11 at 18:24