I have a pandas DataFrame:
import pandas as pd
test=pd.DataFrame(columns=['GroupID','Sample','SampleMeta','Value'])
test.loc[0,:]='1','S1','S1_meta',1
test.loc[1,:]='1','S1','S1_meta',1
test.loc[2,:]='2','S2','S2_meta',1
I want to (1) group by two columns ('GroupID' and 'Sample'), (2) sum 'Value' for each group, and (3) keep only the unique value of 'SampleMeta' for each group. The desired result is shown below (with 'GroupID' and 'Sample' as the index):
               SampleMeta  Value
GroupID Sample
1       S1        S1_meta      2
2       S2        S2_meta      1
df.groupby() followed by .sum() gets close, but .sum() also concatenates the string values in the 'SampleMeta' column within each group. As a result, "S1_meta" is duplicated:
g=test.groupby(['GroupID','Sample'])
print(g.sum())
                    SampleMeta  Value
GroupID Sample
1       S1      S1_metaS1_meta      2
2       S2             S2_meta      1
Is there a way to achieve the desired result using groupby() and related methods? Merging the summed 'Value' for each group with a separate 'SampleMeta' DataFrame works, but there should be a more elegant solution.
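To make the merge-based workaround concrete, here is a minimal sketch of what it might look like, followed by one possible single-call alternative. The variable names (keys, sums, meta, merged, combined) are hypothetical, the frame is rebuilt from a dict for brevity, and the "first" aggregation assumes 'SampleMeta' is constant within each group:

```python
import pandas as pd

# Same data as in the question, built from a dict for brevity.
test = pd.DataFrame(
    {
        "GroupID": ["1", "1", "2"],
        "Sample": ["S1", "S1", "S2"],
        "SampleMeta": ["S1_meta", "S1_meta", "S2_meta"],
        "Value": [1, 1, 1],
    }
)

keys = ["GroupID", "Sample"]  # hypothetical helper name

# Workaround, step 1: sum only the numeric column per group.
sums = test.groupby(keys)["Value"].sum().reset_index()

# Workaround, step 2: keep one row per distinct key/SampleMeta combination.
meta = test[keys + ["SampleMeta"]].drop_duplicates()

# Workaround, step 3: merge the two pieces and index by the group keys.
merged = meta.merge(sums, on=keys).set_index(keys)

# Possible alternative: aggregate each column with its own function.
# "first" assumes SampleMeta never varies inside a group.
combined = test.groupby(keys).agg({"SampleMeta": "first", "Value": "sum"})

print(merged)
print(combined)
```

If 'SampleMeta' could differ within a group, "first" would silently drop the other values, so that assumption is worth checking before relying on the shorter form.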