Python string identity: `is` and `in` operators

I had some problems getting this to work:

    # Shortened for brevity
    def _coerce_truth(word):
        TRUE_VALUES = ('true', '1', 'yes')
        FALSE_VALUES = ('false', '0', 'no')
        _word = word.lower().strip()
        print "t" in _word
        if _word in TRUE_VALUES:
            return True
        elif _word in FALSE_VALUES:
            return False

I found:

    In [20]: "foo" is "Foo".lower()
    Out[20]: False

    In [21]: "foo" is "foo".lower()
    Out[21]: False

    In [22]: "foo" is "foo"
    Out[22]: True

    In [23]: "foo" is "foo".lower()
    Out[23]: False

Why is this? I understand that identity is different from equality, but when is identity formed? Statement 22 should be False unless, due to the static nature of strings, id == eq. But in that case, statement 23 confuses me.

Please explain and thanks in advance.

3 answers

Q. "When is identity formed?"

A. When the object is created.

What you are seeing is a CPython implementation detail: it caches small strings and reuses them for efficiency. Other interesting cases are:

    "foo" is "foo".strip()  # True
    "foo" is "foo"[:]       # True

Ultimately, what we see is that the string literal "foo" has been cached. Every time you type "foo", you are referring to the same object in memory. However, some string methods always create a new object (for example, .lower()), while others are smart enough to return the input string itself when the method makes no changes (for example, .strip()).
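The difference between the two comparisons can be sketched in a few lines of Python 3. The helper name `describe` is made up for this illustration. Only the equality results are guaranteed by the language; the identity results depend on the interpreter, so no identity output is claimed for the runtime-built strings.

```python
def describe(a, b):
    """Return a pair (equal, identical) for two objects."""
    return (a == b, a is b)

literal = "foo"
lowered = "FOO".lower()     # built at runtime, so usually a separate object
stripped = literal.strip()  # nothing to strip; an implementation may return the same object

# Equality is always True here; identity is implementation-dependent.
print(describe(literal, lowered))
print(describe(literal, stripped))
# An object is always identical (and, for strings, equal) to itself.
print(describe(literal, literal))
```

For comparing string values, `==` is the comparison that behaves the same on every implementation.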


One benefit of this is that string equality can be implemented as a pointer comparison (very fast), falling back to a character-by-character comparison only if the pointers differ. If the pointer comparison is true, the character-by-character comparison can be skipped entirely.
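The shortcut described above can be sketched in pure Python. This is only an illustration of the idea, not CPython's actual C code; the function name `fast_equal` is hypothetical.

```python
def fast_equal(a: str, b: str) -> bool:
    """Equality test with an identity shortcut (illustrative sketch only)."""
    # Same object in memory: equal by definition, no need to look at characters.
    if a is b:
        return True
    # Strings of different lengths can never be equal.
    if len(a) != len(b):
        return False
    # Otherwise fall back to comparing character by character.
    return all(x == y for x, y in zip(a, b))

print(fast_equal("foo", "FOO".lower()))  # True
print(fast_equal("foo", "bar"))          # False
```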


Regarding the relationship between `is` and `in`:

The __contains__ method (which is behind the `in` operator) for tuple and list first checks identity when looking for a match and, if that fails, checks equality. This gives sensible results even for objects that do not compare equal to themselves:

    >>> x = float("NaN")
    >>> t = (1, 2, x)
    >>> x in t
    True
    >>> any(x == e for e in t)  # this might be surprising
    False
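The membership test for tuples and lists can be approximated in pure Python as an identity check followed by an equality check. The function name `contains` is made up for this sketch, and the real implementation is written in C.

```python
def contains(container, x):
    """Rough pure-Python equivalent of `x in container` for list and tuple."""
    # Identity is tried first, then equality.
    return any(e is x or e == x for e in container)

nan = float("nan")
items = (1, 2, nan)

print(contains(items, nan))           # True: found by the identity check
print(contains(items, float("nan")))  # False: a different NaN object, and NaN != NaN
print(contains(items, 2.0))           # True: found by the equality check (2 == 2.0)
```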

Suppose you have several ways to create objects with the string value "foo":

    def f(s):
        return s.lower()

    li = ['foo',                    # 0
          'foo'.lower(),            # 1
          'foo'.strip(),            # 2
          'foo',                    # 3
          'f'+'o'*2,                # 4
          '{}'.format('foo'),       # 5
          'f'+'o'+'o',              # 6
          intern('foo'.lower()),    # 7
          'foo'.upper().lower(),    # 8
          f('foo'),                 # 9
          'FOO'.lower(),            # 10
          'foo',                    # 11
          format('foo'),            # 12
          '%s'%'foo',               # 13
          '%s%s'%('fo','o'),        # 14
          'f' 'oo'                  # 15
          ]

`is` is only True if the objects have the same `id`. You can test this and see which string objects were interned to the same immutable string at runtime:

    def cat(l):
        d = {}
        for i, e in enumerate(l):
            k = id(e)
            d.setdefault(k, []).append(i)
        return '\n'.join(str((k, d[k])) for k in sorted(d))

Printing the result of cat(li) gives:

    (4299781024, [0, 2, 3, 6, 7, 11, 12, 13, 15])
    (4299781184, [5])
    (4299781784, [1])
    (4299781864, [4])
    (4299781904, [9])
    (4299782944, [8])
    (4299783064, [10])
    (4299783144, [14])

You can see that most of these expressions resolve (or are interned) to the same string object, but some do not. This is implementation dependent.

You can make them the same string objects using the intern function:

    print cat([intern(s) for s in li])

Prints:

    (4299781024, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
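The snippets above are Python 2. In Python 3 the built-in `intern` moved to `sys.intern`, so a comparable experiment might look like the sketch below. The helper `group_by_identity` and the sample list are made up for this illustration. How many distinct objects exist before interning depends on the interpreter, so only the count after interning is stated.

```python
import sys

def group_by_identity(strings):
    """Map id(obj) -> list of positions that hold that exact object."""
    groups = {}
    for index, s in enumerate(strings):
        groups.setdefault(id(s), []).append(index)
    return groups

# Several ways to build the value "foo"; some are computed at runtime.
samples = ["foo", "FOO".lower(), "".join(["f", "o", "o"]), "%s%s" % ("fo", "o")]

# Number of distinct objects before interning is implementation-dependent.
print(len(group_by_identity(samples)))

# After interning, equal strings are guaranteed to be one and the same object.
interned = [sys.intern(s) for s in samples]
print(len(group_by_identity(interned)))  # 1
```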

Source: https://habr.com/ru/post/954201/

