Question about the uniqueness of string instances in Python

I was trying to figure out which integers Python instantiates only once (-5 to 256 in CPython), and in the process I came across some string behavior whose pattern I can't see. Sometimes equal strings created in different ways share the same id, sometimes not. This code:

A = "10000"
B = "10000"
C = "100" + "00"
D = "%i"%10000
E = str(10000)
F = str(10000)
G = str(100) + "00"
H = "0".join(("10","00"))

for obj in (A,B,C,D,E,F,G,H):
    print obj, id(obj), obj is A

prints:

10000 4959776 True
10000 4959776 True
10000 4959776 True
10000 4959776 True
10000 4959456 False
10000 4959488 False
10000 4959520 False
10000 4959680 False

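I don't see any pattern here, except that the first four don't involve an explicit function call, but surely that can't be it, since the "+" in C, for example, implies a call to an add function. I especially don't understand why C and G differ, since that implies that the identities of the operands matter more than the result.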

So what special treatment do A-D get that makes them come out as the same instance?

+3
4

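As far as the language specification goes, any conforming Python compiler and runtime is free, for any instance of an immutable type, to either create a new instance or find an existing instance of the same type that equals the required value and hand out a new reference to it. This means it is always incorrect to use is or by-id comparison on immutable objects, and any minor release may change its strategy here to improve optimization.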
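A minimal sketch of the difference between the two comparisons, written for Python 3 (the code in the question is Python 2). The variable names are only illustrative: equality is what the language promises for equal strings, while identity depends on the implementation.

```python
# Two strings with the same value, built in different ways.
literal = "10000"
computed = str(10000)

# Equality compares values and is guaranteed by the language.
values_equal = literal == computed
print(values_equal)

# Identity compares objects; the result is an implementation detail
# and may differ between interpreters and releases, so do not rely on it.
same_object = literal is computed
print(same_object)
```

The second result is deliberately left without an expected value, since no particular outcome is guaranteed.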

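In terms of implementation, the trade-off is fairly clear: trying to reuse an existing instance may cost time (perhaps wasted) searching for it, but if the attempt succeeds, some memory is saved (as well as the time to allocate and later free the memory for a new instance).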

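How to resolve that trade-off is not entirely obvious: if you can identify heuristics indicating that a suitable existing instance is likely to be found and that the search (even if it fails) will be fast, you may want to attempt search-and-reuse when the heuristics suggest it and skip it otherwise.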

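In your observations, you seem to have found a particular dot-release implementation that performs a bit of peephole optimization where that is completely safe, fast, and simple, so assignments A-D all boil down to exactly the same thing as A (whereas E-F do not, since they involve named functions or methods whose semantics the optimizer's authors may reasonably have considered not 100% safe to assume, and low-ROI to handle, so they are not peephole-optimized).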

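Thus, the fact that A through D reuse the same instance comes down to A and B doing so (since C and D are peephole-optimized into exactly the same construct).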

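That reuse, in turn, clearly points to compiler tactics and optimizer heuristics whereby identical literal constants of an immutable type in the same function's local namespace are collapsed into references to a single instance in the function's .func_code.co_consts (to use current CPython terminology for function and code-object attributes). These are reasonable tactics and heuristics, since reusing the same immutable constant literal within one function is fairly common, and the price is paid only once (at compile time) while the benefit accrues many times (every time the function runs, possibly inside loops, and so on).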
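To look at the constants stored on a function's code object directly, something like the following sketch can be used. It assumes Python 3, where the attribute is spelled `__code__` (the `.func_code` spelling is Python 2). What ends up in `co_consts` is a CPython implementation detail, so the printed contents may vary between versions.

```python
def moo():
    A = "10000"
    B = "10000"
    C = "100" + "00"  # CPython may fold this into a single constant at compile time
    return A, B, C

# Constants stored on the code object (Python 2 spelled this moo.func_code.co_consts).
consts = moo.__code__.co_consts

# Count how many entries equal the string "10000".
occurrences = sum(1 for c in consts if c == "10000")

print(consts)
print(occurrences)
```

If the compiler collapses identical literals as described, the count stays small even though the function mentions the value several times.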

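(As it happens, these particular tactics and heuristics, given their clearly positive trade-offs, have been common to all recent versions of CPython and, I believe, IronPython, Jython, and PyPy as well.)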

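This is a worthwhile and interesting area of study if you plan to write compilers, runtimes, peephole optimizers, and so on, for Python itself or similar languages. I suppose that studying the internals in depth (ideally across several different correct implementations, so as not to fixate on the quirks of one particular implementation; fortunately, Python currently has at least 4 separate production-worthy implementations, not to mention several versions of each!) can also indirectly make you a better Python programmer, but it is especially important to focus on what the language itself guarantees, which is somewhat less than what the various implementations happen to have in common, because the parts that "just happen" to coincide right now (without being required by the language specification) may well change under you in the next point release of one implementation or another, and if your production code mistakenly relied on such details, that could lead to nasty surprises ;-). Besides, it is hardly ever necessary, or even particularly useful, to rely on such variable implementation details rather than on language-mandated behavior (unless, of course, you are writing something like an optimizer, debugger, profiler, or the like).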

+10

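Python is allowed to inline string constants; A, B, C, D are effectively the same literal (if Python sees a constant expression, it treats it as a constant).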

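str is actually a class, so str(whatever) calls that class's constructor, which should yield a fresh object. That explains E, F, G (note that each of them has its own identity).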

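As for H, I'm not sure, but I would guess that this expression is too complicated for Python to recognize it as a constant, so it computes a new string.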

+4

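I believe that short strings which can be evaluated at compile time are interned automatically. In the last examples, the result cannot be computed at compile time, because str or join might be redefined.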
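A small sketch of why a call to a name such as str cannot safely be precomputed: the name is looked up when the call runs, so rebinding it changes the result. This assumes Python 3; the namespace dictionary and the replacement function are hypothetical, used only to keep the rebinding contained.

```python
source = """
def make():
    return str(10000)
"""

# Compile and run the definition in its own namespace.
namespace = {}
exec(source, namespace)
make = namespace["make"]

# With nothing rebound, the name str resolves to the builtin.
before = make()

# Rebind the global name str in the function's namespace (hypothetical stand-in).
namespace["str"] = lambda value: "shadowed"
after = make()

print(before)  # 10000
print(after)   # shadowed
```

Because the same compiled function returns different values before and after the rebinding, a compiler that replaced str(10000) with a constant would change the program's behavior.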

+1

In response to S. Lott's suggestion to explore the bytecode:

import dis
def moo():
    A = "10000"
    B = "10000"
    C = "100" + "00"
    D = "%i"%10000
    E = str(10000)
    F = str(10000)
    G = "1000"+str(0)
    H = "0".join(("10","00"))
    I = str("10000")

    for obj in (A,B,C,D,E,F,G,H, I):
        print obj, id(obj), obj is A
moo()
dis.dis(moo)  # dis.dis prints the disassembly itself and returns None

gives:

10000 4968128 True
10000 4968128 True
10000 4968128 True
10000 4968128 True
10000 2840928 False
10000 2840896 False
10000 2840864 False
10000 2840832 False
10000 4968128 True
  4           0 LOAD_CONST               1 ('10000')
              3 STORE_FAST               0 (A)

  5           6 LOAD_CONST               1 ('10000')
              9 STORE_FAST               1 (B)

  6          12 LOAD_CONST              10 ('10000')
             15 STORE_FAST               2 (C)

  7          18 LOAD_CONST              11 ('10000')
             21 STORE_FAST               3 (D)

  8          24 LOAD_GLOBAL              0 (str)
             27 LOAD_CONST               5 (10000)
             30 CALL_FUNCTION            1
             33 STORE_FAST               4 (E)

  9          36 LOAD_GLOBAL              0 (str)
             39 LOAD_CONST               5 (10000)
             42 CALL_FUNCTION            1
             45 STORE_FAST               5 (F)

 10          48 LOAD_CONST               6 ('1000')
             51 LOAD_GLOBAL              0 (str)
             54 LOAD_CONST               7 (0)
             57 CALL_FUNCTION            1
             60 BINARY_ADD          
             61 STORE_FAST               6 (G)

 11          64 LOAD_CONST               8 ('0')
             67 LOAD_ATTR                1 (join)
             70 LOAD_CONST              12 (('10', '00'))
             73 CALL_FUNCTION            1
             76 STORE_FAST               7 (H)

 12          79 LOAD_GLOBAL              0 (str)
             82 LOAD_CONST               1 ('10000')
             85 CALL_FUNCTION            1
             88 STORE_FAST               8 (I)

 14          91 SETUP_LOOP              66 (to 160)
             94 LOAD_FAST                0 (A)
             97 LOAD_FAST                1 (B)
            100 LOAD_FAST                2 (C)
            103 LOAD_FAST                3 (D)
            106 LOAD_FAST                4 (E)
            109 LOAD_FAST                5 (F)
            112 LOAD_FAST                6 (G)
            115 LOAD_FAST                7 (H)
            118 LOAD_FAST                8 (I)
            121 BUILD_TUPLE              9
            124 GET_ITER            
        >>  125 FOR_ITER                31 (to 159)
            128 STORE_FAST               9 (obj)

 15         131 LOAD_FAST                9 (obj)
            134 PRINT_ITEM          
            135 LOAD_GLOBAL              2 (id)
            138 LOAD_FAST                9 (obj)
            141 CALL_FUNCTION            1
            144 PRINT_ITEM          
            145 LOAD_FAST                9 (obj)
            148 LOAD_FAST                0 (A)
            151 COMPARE_OP               8 (is)
            154 PRINT_ITEM          
            155 PRINT_NEWLINE       
            156 JUMP_ABSOLUTE          125
        >>  159 POP_BLOCK           
        >>  160 LOAD_CONST               0 (None)
            163 RETURN_VALUE        

So it would seem that the compiler understands that A-D mean the same thing, and therefore saves memory by generating the string only once (as suggested by Alex, Mac and Greg). (The added case I seems to be just str() noticing that it is being asked to make a string from a string, and passing the original through.)
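For anyone repeating the experiment on Python 3, here is a rough sketch of the same check done programmatically rather than by reading the listing. It relies on `dis.get_instructions`, which yields the instructions of a function; whether the concatenation shows up as a single constant is a CPython implementation detail, so no particular result is promised.

```python
import dis

def moo():
    C = "100" + "00"
    return C

# Collect the argument value of every instruction in the function.
argvals = [ins.argval for ins in dis.get_instructions(moo)]

# If the compiler folded the concatenation, "10000" appears as a constant argument.
folded = any(value == "10000" for value in argvals)
print(folded)
```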

Thanks, everyone, it is much clearer now.

+1

Source: https://habr.com/ru/post/1713004/

