Creating a column based on several conditions

I am a long-time SAS user trying to get into pandas. I would like to set a column's value based on several if conditions. I think I could do this with nested np.where calls, but I thought I'd check whether there is a more elegant solution. For example, given a left bound and a right bound, I want to return a column of strings saying whether x lies to the left of, between, or to the right of these bounds. What is the best way to do this? Basically: if x < lbound return "left", else if lbound < x < rbound return "middle", else if x > rbound return "right".

df
   lbound   rbound  x
0   -1      1       0
1   5       7       1
2   0       1       2

One condition can be checked using np.where:

df['area'] = np.where(df['x']>df['rbound'],'right','somewhere else')

But I'm not sure how to go further: I want to check several if / else-if conditions in a single statement.

The desired output is:

df
   lbound   rbound  x    area
0   -1      1       0    middle
1   5       7       1    left
2   0       1       2    right
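For comparison, the if / else-if chain described in the question can be written almost literally as a row-wise function passed to DataFrame.apply. This is a sketch, not one of the answers below: `classify` is a hypothetical name, and because it calls Python once per row it is usually much slower than the vectorised approaches on large frames. It treats x equal to a bound as "middle", which the question leaves unspecified.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'lbound': [-1, 5, 0],
                   'rbound': [1, 7, 1],
                   'x': [0, 1, 2]})


def classify(row):
    """Hypothetical helper: the question's if / else-if chain for one row."""
    if row['x'] < row['lbound']:
        return 'left'
    elif row['x'] > row['rbound']:
        return 'right'
    else:
        # lbound <= x <= rbound: boundaries count as 'middle' here
        return 'middle'


# axis=1 passes each row to classify as a Series
df['area'] = df.apply(classify, axis=1)
print(df)
```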
2 answers

Option 1

You can nest np.where calls. For instance:

import numpy as np

# Outer test picks 'right'; otherwise fall back to the inner np.where
df['area'] = np.where(df['x'] > df['rbound'], 'right',
                      np.where(df['x'] < df['lbound'],
                               'left', 'middle'))
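When there are many criteria, the nesting can be built from a list instead of by hand. The sketch below folds (condition, label) pairs into nested np.where calls with functools.reduce; `nested_where` is a hypothetical helper, not part of NumPy or pandas. Earlier pairs take priority, like an if / elif chain, and `default` plays the role of the final else.

```python
from functools import reduce

import numpy as np
import pandas as pd


def nested_where(pairs, default):
    """Hypothetical helper: fold (condition, label) pairs into nested np.where.

    Earlier pairs take priority, like an if / elif chain.
    """
    # reversed() builds the innermost np.where first, so the first pair
    # ends up as the outermost (highest-priority) test.
    return reduce(lambda acc, pair: np.where(pair[0], pair[1], acc),
                  reversed(pairs), default)


# Sample data from the question
df = pd.DataFrame({'lbound': [-1, 5, 0],
                   'rbound': [1, 7, 1],
                   'x': [0, 1, 2]})

# Equivalent to the two nested np.where calls above
df['area'] = nested_where([(df['x'] > df['rbound'], 'right'),
                           (df['x'] < df['lbound'], 'left')],
                          default='middle')
print(df)
```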

Option 2

You can use the .loc accessor to assign values to specific subsets of rows. First create the new column; we use this step to set a default value, which the following lines overwrite wherever a condition holds.

df['area'] = 'middle'
df.loc[df['x'] > df['rbound'], 'area'] = 'right'
df.loc[df['x'] < df['lbound'], 'area'] = 'left'
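A self-contained run of option 2 on the question's sample data. One point worth keeping in mind: the assignments are applied in sequence, so if two masks ever overlapped (possible only if some row had lbound > rbound), the last .loc assignment would win.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'lbound': [-1, 5, 0],
                   'rbound': [1, 7, 1],
                   'x': [0, 1, 2]})

# Start every row at the fallback label, then overwrite where a mask holds.
df['area'] = 'middle'
df.loc[df['x'] > df['rbound'], 'area'] = 'right'
# Assigned last, so 'left' wins on any row where both masks are True
df.loc[df['x'] < df['lbound'], 'area'] = 'left'
print(df)
```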

Explanation

These are valid alternatives with comparable performance, since the calculations are vectorised in both cases. I prefer option 2 as it seems more readable. If there are a large number of nested criteria, np.where may be more convenient.
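To check that the two options really produce the same labels, they can be run side by side on a larger frame. The data below is hypothetical random input generated only for this comparison (with rbound >= lbound on every row); the same setup could be timed with `timeit` if you want to compare speed on your own machine.

```python
import numpy as np
import pandas as pd

# Hypothetical random data, only for checking that the two options agree.
rng = np.random.default_rng(0)
n = 10_000
lbound = rng.integers(-10, 10, n)
df = pd.DataFrame({'lbound': lbound,
                   'rbound': lbound + rng.integers(0, 10, n),  # rbound >= lbound
                   'x': rng.integers(-20, 20, n)})

# Option 1: nested np.where
opt1 = np.where(df['x'] > df['rbound'], 'right',
                np.where(df['x'] < df['lbound'], 'left', 'middle'))

# Option 2: default value, then .loc overwrites
opt2 = pd.Series('middle', index=df.index)
opt2.loc[df['x'] > df['rbound']] = 'right'
opt2.loc[df['x'] < df['lbound']] = 'left'

# tolist() turns both into plain Python lists of str for comparison
print(opt1.tolist() == opt2.tolist())
```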


You can use np.select instead of np.where:

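import numpy as np

cond = [df['x'].between(df['lbound'], df['rbound']),
        df['x'] < df['lbound'],
        df['x'] > df['rbound']]
output = ['middle', 'left', 'right']

# default=None (not np.nan): recent NumPy cannot combine str choices with a float default
df['area'] = np.select(cond, output, default=None)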

    lbound  rbound  x   area
0   -1      1       0   middle
1   5       7       1   left
2   0       1       2   right
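A self-contained run of the np.select version. It uses the question's sample data plus one hypothetical extra row with a missing x, added only to show when the default is used. Note that np.select takes the first condition that is True for each row, and that Series.between is inclusive on both ends by default, so an x equal to a bound is labelled "middle".

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row with a missing x
df = pd.DataFrame({'lbound': [-1, 5, 0, 0],
                   'rbound': [1, 7, 1, 1],
                   'x': [0, 1, 2, np.nan]})

# Checked in order; the first True condition wins for each row.
# between() is inclusive, so x == lbound or x == rbound gives 'middle'.
cond = [df['x'].between(df['lbound'], df['rbound']),
        df['x'] < df['lbound'],
        df['x'] > df['rbound']]
output = ['middle', 'left', 'right']

# Comparisons with NaN are False, so the last row matches no condition
# and receives the default (None, i.e. missing).
df['area'] = np.select(cond, output, default=None)
print(df)
```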

Source: https://habr.com/ru/post/1694664/

