Pandas boolean AND operator (&) with and without parentheses gives different results

I just noticed this:

    df[df.condition1 & df.condition2]
    df[(df.condition1) & (df.condition2)]

Why is the output of these two lines different?


I cannot share the exact data, but I will try to provide as many details as possible:

    df[df.col1 == False & df.col2.isnull()]      # returns 33 rows and the rule `df.col2.isnull()` is not in effect
    df[(df.col1 == False) & (df.col2.isnull())]  # returns 29 rows and both conditions are applied correctly

Thanks to @jezrael and @ayhan, here is what happened. Let me use the example provided by @jezrael:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'col1':[True, False, False, False], 'col2':[4, np.nan, np.nan, 1]})
    print (df)
        col1  col2
    0   True   4.0
    1  False   NaN
    2  False   NaN
    3  False   1.0

If we look at the row with index 3:

        col1  col2
    3  False   1.0

and the way I wrote the condition:

    df.col1 == False & df.col2.isnull()  # for that row, equivalent to False == False & False

Since & has higher precedence than ==, the expression False == False & False, written without parentheses, is equivalent to:

    False == (False & False)
    print(False == (False & False))  # prints True

With parentheses around the comparison:

 print((False == False) & False) # prints False 
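
One way to check how Python groups such an expression, without relying on memory of the precedence table, is to inspect the parse tree with the standard `ast` module. This is only a minimal sketch; the names `a`, `b`, `c` are placeholders and not columns from the data above:

```python
import ast

# Parse the unparenthesized expression; a, b, c are placeholder names.
tree = ast.parse("a == b & c", mode="eval").body

# The top-level node is the comparison, and its right-hand side is the
# bitwise AND, i.e. Python reads it as  a == (b & c).
print(type(tree).__name__)                     # Compare
print(type(tree.comparators[0]).__name__)      # BinOp
print(type(tree.comparators[0].op).__name__)   # BitAnd

# With explicit parentheses the bitwise AND becomes the top-level node.
grouped = ast.parse("(a == b) & c", mode="eval").body
print(type(grouped).__name__)                  # BinOp
print(type(grouped.left).__name__)             # Compare
```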

I think it’s a little easier to illustrate this problem with numbers:

    print(5 == 5 & 1)    # prints False, because 5 & 1 returns 1 and 5 == 1 returns False
    print(5 == (5 & 1))  # prints False, same reason as above
    print((5 == 5) & 1)  # prints 1, because 5 == 5 returns True, and True & 1 returns 1

So, the lesson: always add parentheses!
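
Besides parentheses, a few spellings sidestep the precedence question altogether. Here is a sketch using the small example frame from above; it assumes `col1` has boolean dtype, which is what makes `~` a valid negation, and the variable names are introduced only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# Option 1: name each condition first, then combine the named masks.
is_false = df.col1 == False
is_missing = df.col2.isnull()
by_names = df[is_false & is_missing]

# Option 2: use the method form of ==, which needs no extra parentheses.
by_eq = df[df.col1.eq(False) & df.col2.isnull()]

# Option 3: negate the boolean column; unary ~ binds tighter than &.
by_not = df[~df.col1 & df.col2.isnull()]

print(by_names.index.tolist())  # [1, 2]
print(by_eq.index.tolist())     # [1, 2]
print(by_not.index.tolist())    # [1, 2]
```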

I wish I could split the answer points between @jezrael and @ayhan :(

2 answers

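There is no difference between df[condition1 & condition2] and df[(condition1) & (condition2)] when the conditions are already-evaluated boolean Series. The difference arises when you write the comparison expressions inline, because there the & operator takes precedence over the comparisons: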

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))
    df
    Out:
       a  b  c
    0  5  0  3
    1  3  7  9
    2  3  5  2
    3  4  7  6
    4  8  8  1

    condition1 = df['a'] > 3
    condition2 = df['b'] < 5

    df[condition1 & condition2]
    Out:
       a  b  c
    0  5  0  3

    df[(condition1) & (condition2)]
    Out:
       a  b  c
    0  5  0  3

However, if you type it like this, you will see an error message:

    df[df['a'] > 3 & df['b'] < 5]
    Traceback (most recent call last):
      File "<ipython-input-7-9d4fd21246ca>", line 1, in <module>
        df[df['a'] > 3 & df['b'] < 5]
      File "/home/ayhan/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 892, in __nonzero__
        .format(self.__class__.__name__))
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This is because 3 & df['b'] is evaluated first (this corresponds to False & df.col2.isnull() in your example). Therefore, you need to group the conditions in parentheses:

    df[(df['a'] > 3) & (df['b'] < 5)]
    Out[8]:
       a  b  c
    0  5  0  3
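
As for why the unparenthesized form raises instead of silently returning the wrong rows: `df['a'] > 3 & df['b'] < 5` becomes a chained comparison, `df['a'] > (3 & df['b']) < 5`, and Python joins the two halves of a chain with `and`, which asks a Series for a single truth value. The sketch below reproduces this with the five rows printed above typed in by hand (the random call is replaced so the result is repeatable):

```python
import pandas as pd

# Same values as the printed frame above, entered explicitly.
df = pd.DataFrame({'a': [5, 3, 3, 4, 8],
                   'b': [0, 7, 5, 7, 8],
                   'c': [3, 9, 2, 6, 1]})

try:
    # Parsed as: df['a'] > (3 & df['b']) < 5, a chained comparison.
    df[df['a'] > 3 & df['b'] < 5]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Grouping each comparison gives elementwise boolean masks to combine.
result = df[(df['a'] > 3) & (df['b'] < 5)]
print(result.index.tolist())  # [0]
```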

You are right, there is a difference, and I think the cause is operator precedence - check the docs:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'col1':[True, False, False, False], 'col2':[4, np.nan, np.nan, 1]})
    print (df)
        col1  col2
    0   True   4.0
    1  False   NaN
    2  False   NaN
    3  False   1.0

    # operator & takes precedence
    print (df[df.col1 == False & df.col2.isnull()])
        col1  col2
    1  False   NaN
    2  False   NaN
    3  False   1.0

    # operator == is evaluated first because it is in parentheses
    print (df[(df.col1 == False) & (df.col2.isnull())])
        col1  col2
    1  False   NaN
    2  False   NaN
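
The first result can be reproduced by writing out the grouping that Python actually applies. This is a sketch with the same frame; `implicit`, `explicit`, and `intended` are names introduced here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# What was typed, and the grouping that & precedence gives it.
implicit = df.col1 == False & df.col2.isnull()
explicit = df.col1 == (False & df.col2.isnull())

# What was meant: compare first, then combine.
intended = (df.col1 == False) & (df.col2.isnull())

# False & <boolean Series> is all False, so the first two masks reduce
# to df.col1 == False and the isnull() condition drops out.
print(implicit.equals(explicit))  # True
print(implicit.tolist())          # [False, True, True, True]
print(intended.tolist())          # [False, True, True, False]
```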

It seems I found it in the docs, section 6.16 "Operator precedence". The table lists operators from lowest to highest precedence, so & has a higher precedence than ==:

    Operator                                Description

    lambda                                  Lambda expression
    if – else                               Conditional expression
    or                                      Boolean OR
    and                                     Boolean AND
    not x                                   Boolean NOT
    in, not in, is, is not,                 Comparisons, including membership tests
    <, <=, >, >=, !=, ==                    and identity tests
    |                                       Bitwise OR
    ^                                       Bitwise XOR
    &                                       Bitwise AND

    (expressions...), [expressions...],     Binding or tuple display, list display,
    {key: value...}, {expressions...}       dictionary display, set display

Source: https://habr.com/ru/post/1264434/

