Adding a Boolean column to a pandas DataFrame

I am learning pandas and am stuck on this problem.

I created a dataframe that tracks all users and the number of times they did something.

To illustrate the problem, I created this example:

    import pandas as pd

    data = [
        {'username': 'me', 'bought_apples': 2, 'bought_pears': 0},
        {'username': 'you', 'bought_apples': 1, 'bought_pears': 1}
    ]
    df = pd.DataFrame(data)
    df['bought_something'] = df['bought_apples'] > 0 or df['bought_pears'] > 0

In the last line, I want to add a column that indicates whether the user bought anything at all.

This error appears:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand what ambiguity means for a pandas Series (it is also explained here), but I could not relate it to my problem.

Interestingly, this works:

 df['bought_something'] = df['bought_apples'] > 0 

Can anybody help me?

2 answers

You can call sum across the two columns of each row (axis=1) and check whether the result is greater than 0:

    In [105]: df['bought_something'] = df[['bought_apples','bought_pears']].sum(axis=1) > 0
              df
    Out[105]:
       bought_apples  bought_pears username bought_something
    0              2             0       me             True
    1              1             1      you             True

As for your initial attempt, the error message is telling you that evaluating a whole array as a single true/false value is ambiguous. If you want to or logical conditions element by element, you need to use the bitwise operator | and wrap the conditions in parentheses because of operator precedence:

    In [111]: df['bought_something'] = ((df['bought_apples'] > 0) | (df['bought_pears'] > 0))
              df
    Out[111]:
       bought_apples  bought_pears username bought_something
    0              2             0       me             True
    1              1             1      you             True
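To see why the parentheses matter, here is a minimal sketch. It assumes the same column names as the question; the third row ('nobody') is a hypothetical addition so that the result is not True everywhere. Without parentheses, | binds more tightly than >, so Python reads the expression as a chained comparison, which needs a single true/false value from a Series and fails with the same ValueError.

```python
import pandas as pd

# Same columns as in the question; the 'nobody' row is a made-up extra case.
df = pd.DataFrame({
    'username': ['me', 'you', 'nobody'],
    'bought_apples': [2, 1, 0],
    'bought_pears': [0, 1, 0],
})

# With parentheses: each comparison produces a boolean Series,
# and | combines the two Series element by element.
with_parens = (df['bought_apples'] > 0) | (df['bought_pears'] > 0)

# Without parentheses: | has higher precedence than >, so this is parsed as
#   df['bought_apples'] > (0 | df['bought_pears']) > 0
# A chained comparison implies an `and`, which asks a Series for one truth value.
try:
    df['bought_apples'] > 0 | df['bought_pears'] > 0
    no_parens_error = None
except ValueError as exc:
    no_parens_error = exc

print(with_parens.tolist())
print(type(no_parens_error).__name__)
```

The parenthesized form is the one to assign to df['bought_something'].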

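The error occurs because you use 'or' to combine two Boolean vectors rather than two Boolean scalars. 'or' needs a single true/false value from each operand, and a Series with several elements does not have one. That is why the message says it is ambiguous.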
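As a rough illustration of this point, the sketch below uses two small stand-in Series (the names apples and pears and their values are assumptions, not the question's DataFrame). It shows that 'or' fails as soon as it asks the first Series for a single truth value, while | works element by element and .any() reduces the result to one value when that is what you need.

```python
import pandas as pd

# Stand-in boolean Series; the values are illustrative only.
apples = pd.Series([2, 1, 0]) > 0
pears = pd.Series([0, 1, 0]) > 0

# `or` first evaluates bool(apples), which pandas refuses to do
# for a Series with more than one element.
try:
    apples or pears
    or_error = None
except ValueError as exc:
    or_error = exc

# Element-wise OR: one result per row.
elementwise = apples | pears

# A single answer for the whole Series: did anyone buy anything?
any_true = bool(elementwise.any())

print(type(or_error).__name__)
print(elementwise.tolist())
print(any_true)
```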


Source: https://habr.com/ru/post/989339/

