I have a dataframe that I want to group, and then split the values within each group into several columns.
For example, say I have the following dataframe:
>>> import pandas as pd
>>> import numpy as np
>>> df=pd.DataFrame()
>>> df['Group']=['A','C','B','A','C','C']
>>> df['ID']=[1,2,3,4,5,6]
>>> df['Value']=np.random.randint(1,100,6)
>>> df
  Group  ID  Value
0     A   1     66
1     C   2      2
2     B   3     98
3     A   4     90
4     C   5     85
5     C   6     38
>>>
I want to group by the "Group" field, get the sum of the "Value" field, and get new fields, each of which contains one of the group's ID values.
Currently, I can do this as follows, but I'm looking for a cleaner approach:
First, I create a dataframe with a list of identifiers in each group.
>>> g=df.groupby('Group')
>>> result=g.agg({'Value':np.sum, 'ID':lambda x:x.tolist()})
>>> result
              ID  Value
Group
A         [1, 4]     98
B            [3]     76
C      [2, 5, 6]    204
>>>
Then I use pd.Series to split the lists into columns, rename the columns, and join them back onto the result.
>>> id_df=result.ID.apply(lambda x:pd.Series(x))
>>> id_cols=['ID'+str(x) for x in range(1,len(id_df.columns)+1)]
>>> id_df.columns=id_cols
>>>
>>> result.join(id_df)[id_cols+['Value']]
       ID1  ID2  ID3  Value
Group
A        1    4  NaN     98
B        3  NaN  NaN     76
C        2    5    6    204
>>>
Is there a way to do this without first creating a list of values?
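For reference, one possible direction that skips the intermediate lists is to number the rows within each group and pivot on that number. This is only a sketch, not necessarily the cleanest option: the `Value` numbers below are fixed placeholders (the original uses random values), and the helper column name `pos` is made up for illustration.

```python
import pandas as pd

# Placeholder data: same Group/ID layout as above, but with fixed
# (hypothetical) Value numbers instead of random ones.
df = pd.DataFrame({
    'Group': ['A', 'C', 'B', 'A', 'C', 'C'],
    'ID': [1, 2, 3, 4, 5, 6],
    'Value': [10, 20, 30, 40, 50, 60],
})

# Position of each row within its group, starting at 1.
pos = df.groupby('Group').cumcount() + 1

# Pivot the IDs so that each within-group position becomes its own column,
# then turn the column labels 1, 2, 3 into ID1, ID2, ID3.
ids = (
    df.assign(pos=pos)
      .pivot(index='Group', columns='pos', values='ID')
      .add_prefix('ID')
)

# Per-group sum of Value, joined onto the wide ID table by the Group index.
result = ids.join(df.groupby('Group')['Value'].sum())
print(result)
```

Groups with fewer IDs than the largest group get NaN in the unused ID columns, just as in the list-based version.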