np.where with multiple return values

Using pandas and NumPy, I am trying to process a column in a DataFrame and create a new column whose values are derived from it. For example, if column x contains the value 1, the new column should contain a; for the value 2 it should contain b, and so on.

I can do this for a single condition, e.g.:

df['new_col'] = np.where(df['col_1'] == 1, a, n/a) 

I can also find examples that combine several conditions into one result (e.g. if x == 3 or x == 4, the value should be a), but nothing for the case where x == 3 should give a and x == 4 should give c.

I tried simply running two lines of code, for example:

    df['new_col'] = np.where(df['col_1'] == 1, a, n/a)
    df['new_col'] = np.where(df['col_1'] == 2, b, n/a)

But obviously the second line overwrites the result of the first. Am I missing something important?

4 answers

I think you can use loc:

    df.loc[df['col_1'] == 1, 'new_col'] = a
    df.loc[df['col_1'] == 2, 'new_col'] = b

Or:

 df['new_col'] = np.where(df['col_1'] == 1, a, np.where(df['col_1'] == 2, b, np.nan)) 
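
Nesting np.where gets hard to read beyond two or three rules. One alternative (not part of the answer above) is np.select, which takes a list of conditions and a matching list of values. Below is a minimal sketch; the sample data, the string labels 'a' and 'b', and the 'n/a' default are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({'col_1': [1, 2, 3, 1]})

# One boolean mask per rule; masks are checked in order and the first match wins.
conditions = [df['col_1'] == 1, df['col_1'] == 2]
# Placeholder labels standing in for a and b from the question.
choices = ['a', 'b']

# Rows that match none of the conditions receive the default value.
df['new_col'] = np.select(conditions, choices, default='n/a')
print(df)
```

Adding another rule means appending one entry to each list, rather than adding another nesting level.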

I think numpy choose() is the best option for you.

    import numpy as np

    choices = 'abcde'
    N = 10

    np.random.seed(0)
    data = np.random.randint(1, len(choices) + 1, size=N)

    print(data)
    # data holds 1-based codes, so shift to 0-based indices for choose()
    print(np.choose(data - 1, choices))

Output:

    [5 1 4 4 4 2 4 3 5 1]
    ['e' 'a' 'd' 'd' 'd' 'b' 'd' 'c' 'e' 'a']
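
The same lookup idea can be applied to a DataFrame column with plain NumPy integer indexing instead of np.choose. This is a swapped-in technique, sketched under the assumption that col_1 only holds codes from 1 to len(labels); the sample data and the name labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; every code is assumed to lie in 1..len(labels).
df = pd.DataFrame({'col_1': [5, 1, 4, 2]})
labels = np.array(['a', 'b', 'c', 'd', 'e'])

# col_1 holds 1-based codes, so subtract 1 to get 0-based positions in labels.
df['new_col'] = labels[df['col_1'].to_numpy() - 1]
print(df)
```

Note that a code above len(labels) raises an IndexError, while a code of 0 becomes index -1 and silently picks the last label, so this only fits data whose range is known.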

You can define a dict with the desired conversions, then loop over the DataFrame column and fill in the new column row by row.

There may be more elegant ways, but this will work:

    import numpy as np
    import pandas as pd

    # create a dummy DataFrame
    df = pd.DataFrame(
        np.random.randint(2, size=(6, 4)),
        columns=['col_1', 'col_2', 'col_3', 'col_4'],
        index=range(6)
    )

    # create a dict with your desired substitutions:
    swap_dict = {
        0: 'a',
        1: 'b',
        999: 'zzz',
    }

    # introduce new column and fill with swapped information:
    for i in df.index:
        df.loc[i, 'new_col'] = swap_dict[df.loc[i, 'col_1']]

    print(df)

returns something like:

       col_1  col_2  col_3  col_4 new_col
    0      1      1      1      1       b
    1      1      1      1      1       b
    2      0      1      1      0       a
    3      0      1      0      0       a
    4      0      0      1      1       a
    5      0      0      1      0       a
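
The row-by-row loop can probably be avoided by passing the dict straight to Series.map, which looks up every value in one call. A minimal sketch, reusing the swap_dict name from above with made-up sample data:

```python
import pandas as pd

# Hypothetical sample data; 5 is deliberately missing from the dict.
df = pd.DataFrame({'col_1': [0, 1, 1, 5]})

swap_dict = {0: 'a', 1: 'b', 999: 'zzz'}

# map() looks up each value in the dict; values without a key become NaN.
df['new_col'] = df['col_1'].map(swap_dict)
print(df)
```

Unlike the explicit loop, which raises a KeyError for a value that is not in the dict, map leaves NaN in those rows.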

Use pandas Series.map instead.

    import pandas as pd

    df = pd.DataFrame({'col_1': [1, 2, 4, 2]})
    print(df)

    def ab_ify(v):
        if v == 1:
            return 'a'
        elif v == 2:
            return 'b'
        else:
            return None

    df['new_col'] = df['col_1'].map(ab_ify)
    print(df)

    # output:
    #
    #    col_1
    # 0      1
    # 1      2
    # 2      4
    # 3      2
    #    col_1 new_col
    # 0      1       a
    # 1      2       b
    # 2      4    None
    # 3      2       b

Source: https://habr.com/ru/post/1244222/

