Pandas update column with array

So, I am learning pandas and I have this problem.

Suppose I have a DataFrame:

    A  B    C
    1  x  NaN
    2  y  NaN
    3  x  NaN
    4  x  NaN
    5  y  NaN

I am trying to create this:

    A  B        C
    1  x  [1,3,4]
    2  y    [2,5]
    3  x  [1,3,4]
    4  x  [1,3,4]
    5  y    [2,5]

That is, C should hold the A values of every row that has the same value in B.

I have done this:

    teste = df.groupby(['B'])
    for name, group in teste:
        df.loc[df['B'] == name[0], 'C'] = group['A'].tolist()

But this is what I got. Column C is just a copy of column A:

    A  B  C
    1  x  1
    2  y  2
    3  x  3
    4  x  4
    5  y  5

Can someone explain to me why this is happening and how to do it the way I want? Thanks :)

+5
4 answers

First, you can aggregate A into lists grouped by column B, and then merge the result with the original df on B:

    df
    #    A  B
    # 0  1  x
    # 1  2  y
    # 2  3  x
    # 3  4  x
    # 4  5  y

    df.groupby('B').A.apply(list).rename('C').reset_index().merge(df)
    #    B          C  A
    # 0  x  [1, 3, 4]  1
    # 1  x  [1, 3, 4]  3
    # 2  x  [1, 3, 4]  4
    # 3  y     [2, 5]  2
    # 4  y     [2, 5]  5
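Note that the merged result above comes back sorted by B with the columns reordered. If the original row and column order matters, one option is to merge from the original frame's side instead. This is only a sketch on the question's sample data; the names `lists` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# One list of A values per distinct value of B.
lists = df.groupby("B")["A"].apply(list).rename("C").reset_index()

# Left-merge onto the original frame so its row order and its
# column order (A, B, then the new C) are kept.
result = df.merge(lists, on="B", how="left")
print(result)
```

A left merge keeps the keys in the order they appear in the left frame, which is why `df` is placed on the left here.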
+6

You can use transform to create lists.

    In [324]: df['C'] = df.groupby('B')['A'].transform(lambda x: [x.values])

    In [325]: df
    Out[325]:
       A  B          C
    0  1  x  [1, 3, 4]
    1  2  y     [2, 5]
    2  3  x  [1, 3, 4]
    3  4  x  [1, 3, 4]
    4  5  y     [2, 5]
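On the "why" part of the question: when a list is assigned through `.loc` and its length equals the number of selected rows, pandas spreads the list over those rows one element at a time instead of storing the whole list in each cell. The loop in the question assigns `group['A'].tolist()` to exactly the rows of that group, so the lengths always match and C ends up as a copy of A. A minimal sketch reproducing this for the "x" group (the `mask` name and the hard-coded list are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})
df["C"] = None  # object column, so it could in principle hold lists

mask = df["B"] == "x"  # selects three rows

# A 3-element list assigned to 3 selected rows is treated as
# "one value per row", not as a single object to store in every row.
df.loc[mask, "C"] = [1, 3, 4]
print(df.loc[mask, "C"].tolist())
```

To store the full list in every row, each row has to receive its own list object, which is what the grouped approaches in these answers do.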
+4

You can get creative with sum!
Turn each value of A into a single-element list, then transform with sum, which concatenates the lists within each group.

    df.assign(
        C=pd.Series(
            df.A.values[:, None].tolist(), df.index
        ).groupby(df.B).transform('sum')
    )

       A  B          C
    0  1  x  [1, 3, 4]
    1  2  y     [2, 5]
    2  3  x  [1, 3, 4]
    3  4  x  [1, 3, 4]
    4  5  y     [2, 5]
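The trick relies on the fact that adding two Python lists concatenates them, so summing one-element lists rebuilds the full list. A rough sketch of that idea in isolation, done by hand for the "x" group rather than through `groupby` (the names `singletons`, `x_rows` and `combined` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# Wrap every value of A in its own one-element list:
# [[1], [2], [3], [4], [5]]
singletons = df["A"].to_numpy()[:, None].tolist()

# Keep only the singletons that belong to rows where B == "x".
x_rows = [lst for lst, key in zip(singletons, df["B"]) if key == "x"]

# Summing lists (with an empty list as the start value) concatenates them.
combined = sum(x_rows, [])
print(combined)
```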
+1

    test = df.groupby('B')['A'].apply(list)
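On its own this gives a Series indexed by the distinct values of B (one list per group), not a new column. One way to get it back onto the rows, sketched here on the question's sample data, is to look up each row's B value with `Series.map`:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["x", "y", "x", "x", "y"]})

# Series indexed by the distinct values of B; each value is a list of A.
test = df.groupby("B")["A"].apply(list)

# Look up each row's B value in that Series to fill column C.
df["C"] = df["B"].map(test)
print(df)
```

This leaves the original row order untouched, since nothing is merged or re-sorted.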
0

Source: https://habr.com/ru/post/1269992/