In Python, why do separate dictionary string values pass "is" identity checks? (Interning Experiment)

Question


I am building a Python utility that maps integers to strings stored as dictionary values, where many integers can map to the same string. As I understand it, Python interns short strings and most hardcoded strings by default, which saves memory by keeping one "canonical" copy of each string in a table. I thought I could benefit from this by interning the string values, even though string interning is built more to optimize key hashing. I wrote a quick test that checks identity for long strings, first for strings stored in lists, and then for strings stored in dictionaries as values. The result was unexpected:

    import sys

    top = 10000

    non1 = []
    non2 = []
    for i in range(top):
        s1 = '{:010d}'.format(i)
        s2 = '{:010d}'.format(i)
        non1.append(s1)
        non2.append(s2)

    same = True
    for i in range(top):
        same = same and (non1[i] is non2[i])
    print("non: ", same)  # prints False
    del non1[:]
    del non2[:]

    with1 = []
    with2 = []
    for i in range(top):
        s1 = sys.intern('{:010d}'.format(i))
        s2 = sys.intern('{:010d}'.format(i))
        with1.append(s1)
        with2.append(s2)

    same = True
    for i in range(top):
        same = same and (with1[i] is with2[i])
    print("with: ", same)  # prints True

    ###############################

    non_dict = {}
    non_dict[1] = "this is a long string"
    non_dict[2] = "this is another long string"
    non_dict[3] = "this is a long string"
    non_dict[4] = "this is another long string"

    with_dict = {}
    with_dict[1] = sys.intern("this is a long string")
    with_dict[2] = sys.intern("this is another long string")
    with_dict[3] = sys.intern("this is a long string")
    with_dict[4] = sys.intern("this is another long string")

    print("non: ", non_dict[1] is non_dict[3] and non_dict[2] is non_dict[4])  # prints True ???
    print("with: ", with_dict[1] is with_dict[3] and with_dict[2] is with_dict[4])  # prints True

I thought the dictionary check without interning would print False, but I was clearly mistaken. Does anyone know what is going on, and whether string interning would give any benefit in my case? I could have many more keys than unique values if I combine data from several input texts, so I'm looking for a way to save memory. (I may have to use a database, but that is beyond the scope of this question.) Thank you in advance!


python string dictionary python-3.x string-interning

synchronizer Jan 01 '16 at 2:15


1 answer

user2357112 · Accepted Answer · 2017-01-01 02:20

One of the optimizations performed by the bytecode compiler is similar to, but distinct from, interning: it uses the same object for equal constants in the same code block. The string literals here:

    non_dict = {}
    non_dict[1] = "this is a long string"
    non_dict[2] = "this is another long string"
    non_dict[3] = "this is a long string"
    non_dict[4] = "this is another long string"

are in the same code block, so equal strings are ultimately represented by the same string object.
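The effect can be observed directly. The following is a minimal sketch, assuming CPython (constant merging is an implementation detail, not a language guarantee). The names `source`, `namespace`, `built1`, and `built2` are illustrative and not from the question. It compiles two equal literals as one code block, inspects the resulting constants, and contrasts this with strings built at run time, which is the situation in the list-based part of the test:

```python
import sys

# Two equal string literals compiled together as a single code block.
source = (
    'a = "this is a long string"\n'
    'b = "this is a long string"\n'
)
code = compile(source, "<example>", "exec")

# The compiler keeps each distinct constant once per code object,
# so the literal appears only once among the string constants.
string_consts = [c for c in code.co_consts if isinstance(c, str)]
print(string_consts)

# Running the code binds both names to that single constant object.
namespace = {}
exec(code, namespace)
literals_shared = namespace["a"] is namespace["b"]
print("literals share one object:", literals_shared)

# Strings built at run time are not compiler constants, so equal values
# are usually separate objects unless they are interned explicitly.
built1 = "{:010d}".format(42)
built2 = "{:010d}".format(42)
runtime_shared = built1 is built2
print("runtime strings share one object:", runtime_shared)

# sys.intern returns the canonical copy, so both calls yield the same object.
interned_shared = sys.intern(built1) is sys.intern(built2)
print("interned strings share one object:", interned_shared)
```

If the dictionary values in the real utility come from input data rather than from literals in the source, the compiler has no constants to merge, so explicit `sys.intern` calls (or a hand-rolled lookup table of already-seen strings) would be what deduplicates them.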
