Python interpreter string pool optimization

After reading this question and its duplicate, one question still remained for me.

I understand what is and == do, and why, if I run

    a = "ab"
    b = "ab"
    a == b

I get True. The question here is WHY this happens:

    a = "ab"
    b = "ab"
    a is b  # Returns True

So I did my research and found this answer. It says that the Python interpreter uses string pooling: if it sees that two strings are the same, it assigns the same id to the new one as an optimization.
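To make the pooling idea concrete, here is a minimal sketch of a string pool in plain Python. It is not how CPython implements interning internally. The class name StringPool and its intern method are hypothetical, chosen only for illustration. The pool keeps one canonical object per distinct value and hands that object back for every equal string.

```python
class StringPool:
    """Hypothetical, minimal string pool: one canonical object per value."""

    def __init__(self):
        # Maps each string value to the first object seen with that value.
        self._table = {}

    def intern(self, s):
        # setdefault stores s the first time a value is seen and returns the
        # already-stored object on every later call with an equal string.
        return self._table.setdefault(s, s)

    def __len__(self):
        return len(self._table)


if __name__ == "__main__":
    pool = StringPool()
    # Build two equal strings at run time (in CPython these are two objects).
    x = "".join(["a", "b"])
    y = "".join(["a", "b"])
    print(x == y)                            # True: same value
    print(pool.intern(x) is pool.intern(y))  # True: same pooled object
    print(len(pool))                         # 1: only one entry stored
```

Because every lookup goes through a dictionary, pooling is not free, which is why a real interpreter has to be selective about which strings it pools.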

Up to this point, everything is clear and answered. My real question is why this pooling only happens for some strings. Here is an example:

    a = "ab"
    b = "ab"
    a is b  # Returns True, as expected knowing the interpreter uses string pooling

    a = "a_b"
    b = "a_b"
    a is b  # Returns True, again, as expected knowing the interpreter uses string pooling

    a = "a b"
    b = "a b"
    a is b  # Returns False, why??

    a = "a.b"
    b = "a.b"
    a is b  # Returns False, WHY??

So for some characters, string pooling does not seem to work. I used Python 2.7.6 for these examples, so I thought this would be fixed in Python 3. But when I tried the same examples in Python 3, I got the same results.

Question: Why isn't the string pooling optimization applied in these examples? Wouldn't it be better for Python to optimize these as well?


Edit: If I run "a b" is "a b", it returns True. The question is why, when using variables, it returns False for some characters but True for others.

1 answer

Your question is a duplicate of a more general question, "When does Python choose to intern a string?", whose correct answer is that string interning is implementation specific.

String interning in CPython 2.7.7 is described very well in this article: The internals of Python string interning. The information in it explains your examples.

The reason the strings "ab" and "a_b" are interned while "a b" and "a.b" are not is that the former look like Python identifiers and the latter do not.

Naturally, interning every string would incur a runtime cost, so the interpreter must decide whether a given string is worth interning. Since the identifier names used in a Python program are embedded in the program's bytecode as strings, identifier-like strings have a higher chance of benefiting from being interned.
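If you do want a string that fails this heuristic to be pooled, you can opt in explicitly with sys.intern() in Python 3 (the built-in intern() in Python 2). The sketch below assumes CPython; the variable names are only illustrative. Whether two equal strings are the same object before interning is an implementation detail, so the example relies only on the guaranteed part: sys.intern returns the same object for equal strings.

```python
import sys

# Build the strings at run time; "a b" is not identifier-like, so the
# identifier heuristic does not apply to it.
a = "".join(["a", " ", "b"])
b = "".join(["a", " ", "b"])

print(a == b)  # True: equal values
# Do not rely on `a is b` here; object identity is an implementation detail.

# sys.intern returns the single interned copy for a given value,
# so after interning both names refer to the same object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True
```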

A brief excerpt from the above article:

The all_name_chars function rules out strings that are not composed solely of ASCII letters, digits, or underscores; in other words, only identifier-like strings pass:

    #define NAME_CHARS \
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

    /* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

    static int
    all_name_chars(unsigned char *s)
    {
        static char ok_name_char[256];
        static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

        if (ok_name_char[*name_chars] == 0) {
            unsigned char *p;
            for (p = name_chars; *p; p++)
                ok_name_char[*p] = 1;
        }
        while (*s) {
            if (ok_name_char[*s++] == 0)
                return 0;
        }
        return 1;
    }
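To experiment with the rule from Python, the same check can be ported in a few lines. This is a sketch, not CPython code: the names NAME_CHARS and all_name_chars mirror the C excerpt, but the port uses a frozenset instead of the lazily filled 256-entry lookup table. Like the C loop, it returns True for an empty string.

```python
import string

# Same character set as the NAME_CHARS macro in the C excerpt:
# ASCII digits, ASCII letters and the underscore.
NAME_CHARS = frozenset(string.digits + string.ascii_letters + "_")


def all_name_chars(s):
    """Return True iff every character of s is in NAME_CHARS."""
    return all(ch in NAME_CHARS for ch in s)


if __name__ == "__main__":
    for candidate in ["ab", "a_b", "a b", "a.b", "foo", "foo!"]:
        print(repr(candidate), all_name_chars(candidate))
```

Running it marks "ab", "a_b" and "foo" as identifier-like and rejects "a b", "a.b" and "foo!", matching the split described above.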

Given all these explanations, we now understand why 'foo!' is 'foo!' evaluates to False, while 'foo' is 'foo' evaluates to True.


Source: https://habr.com/ru/post/1015138/

