If else function in pandas dataframe

I am trying to apply an if condition to a DataFrame, but I am missing something (error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().)

import pandas as pd

raw_data = {'age1': [23,45,21],'age2': [10,20,50]}
df = pd.DataFrame(raw_data, columns = ['age1','age2'])

def my_fun (var1,var2,var3):
    # the next line raises the error: the comparison yields a Series, not a single bool
    if (df[var1]-df[var2])>0 :
        df[var3]=df[var1]-df[var2]
    else:
        df[var3]=0
    print(df[var3])

my_fun('age1','age2','diff')
3 answers

You can use numpy.where:

import numpy as np

def my_fun (var1,var2,var3):
    # vectorized if/else: keep the difference where it is positive, otherwise 0
    df[var3]= np.where((df[var1]-df[var2])>0, df[var1]-df[var2], 0)
    return df

df1 = my_fun('age1','age2','diff')
print (df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

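The error is best explained here. In short, (df[var1]-df[var2])>0 is a boolean Series with one value per row, so if cannot treat it as a single True or False.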
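As a minimal sketch of where the error comes from (it reuses the question's data and is not taken from the linked explanation): the comparison is evaluated element-wise, so if receives a whole Series. Reducing it with .any() or .all() gives a single boolean, although that answers a different question than the row-by-row logic wanted here.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# The comparison is element-wise: one boolean per row.
cond = (df['age1'] - df['age2']) > 0
print(cond.tolist())

# Using the Series directly in an if statement raises ValueError.
try:
    if cond:
        pass
    raised = False
except ValueError:
    raised = True
print(raised)

# .any() and .all() collapse the Series to a single boolean.
any_positive = bool(cond.any())
all_positive = bool(cond.all())
print(any_positive, all_positive)
```

This is why the answers here replace the if statement with an element-wise selection instead of reducing the condition.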

A slower solution is apply, which requires axis=1 to process the data row by row:

def my_fun(x, var1, var2, var3):
    print (x)
    if (x[var1]-x[var2])>0 :
        x[var3]=x[var1]-x[var2]
    else:
        x[var3]=0
    return x    

print (df.apply(lambda x: my_fun(x, 'age1', 'age2','diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

Using loc is also possible, but sometimes data may be overwritten:

def my_fun(x, var1, var2, var3):
    print (x)
    mask = (x[var1]-x[var2])>0
    x.loc[mask, var3] = x[var1]-x[var2]
    x.loc[~mask, var3] = 0

    return x    

print (my_fun(df, 'age1', 'age2','diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0
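The diff column above comes out as floats. A likely reason, stated here as an assumption about default pandas behavior, is that the first masked assignment creates the column with NaN in the rows the mask skips, and NaN forces a float dtype. The sketch below repeats the two assignments on the question's data and then casts back to integers once every row has been filled.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

diff = df['age1'] - df['age2']
mask = diff > 0

out = df.copy()
# First assignment creates the column; rows outside the mask are still missing.
out.loc[mask, 'diff'] = diff[mask]
# Second assignment fills the remaining rows.
out.loc[~mask, 'diff'] = 0
print(out['diff'].dtype)

# Every row now holds a value, so casting to int is safe.
out['diff'] = out['diff'].astype(int)
print(out)
```

Creating the column with an integer default before the masked assignment is another way to avoid the intermediate missing values.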

You can use pandas.Series.where:

df.assign(age3=(df.age1 - df.age2).where(df.age1 > df.age2, 0))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0

You can wrap this in a function:

def my_fun(v1, v2):
    return v1.sub(v2).where(v1 > v2, 0)

df.assign(age3=my_fun(df.age1, df.age2))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
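For this particular rule (keep the difference when it is positive, otherwise 0), the result is the same as clamping the difference at zero. The sketch below is an alternative that is not part of the answer above: it compares the where version with Series.clip(lower=0) on the question's data.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# Version from the answer: keep the difference where age1 > age2, else 0.
via_where = df.age1.sub(df.age2).where(df.age1 > df.age2, 0)

# Alternative: clamp negative differences up to 0.
via_clip = df.age1.sub(df.age2).clip(lower=0)

print(via_where.tolist())
print(via_clip.tolist())
```

clip only fits when the fallback value equals the threshold; for any other else-branch, where remains the more general tool.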

The other answers use np.where and pd.Series.where. The same result can also be obtained with plain pandas indexing.

This answer shows the correct method for this.

Below is a snippet:

df.loc[df['age1'] - df['age2'] > 0]

.. which looks like this:

   age1  age2
0    23    10
1    45    20

Add an additional column to the original DataFrame for the values that you want to keep after changing the slice:

df['diff'] = 0

Now change the slice:

df.loc[df['age1'] - df['age2'] > 0, 'diff'] = df['age1'] - df['age2']

.. and the result:

   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
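The steps above can be collected into one function. This is only a sketch: add_positive_diff is a hypothetical name, and working on a copy instead of modifying df in place is a choice made here, not something the answer prescribes.

```python
import pandas as pd

def add_positive_diff(frame, var1, var2, var3):
    """Hypothetical helper: return a copy of frame with column var3 set to
    var1 - var2 where that difference is positive, and 0 elsewhere."""
    out = frame.copy()
    diff = out[var1] - out[var2]
    mask = diff > 0
    out[var3] = 0                      # default value for every row
    out.loc[mask, var3] = diff[mask]   # overwrite only the selected rows
    return out

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
result = add_positive_diff(df, 'age1', 'age2', 'diff')
print(result)
```

Because the function returns a new DataFrame, the original df keeps only its age1 and age2 columns.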

Source: https://habr.com/ru/post/1674742/

