I have a dataframe that is grouped by id. There are many groups, and each group has a variable number of rows. The first three rows of every group do not contain interesting data. I would like to "collapse" the first three rows of each group into a single row, as follows:
'id' and 'type' stay unchanged in the new "collapsed" row.
'grp_idx' is set to 0 in the row that aggregates the first three rows.
'col_1' is the sum over the first three rows.
'col_2' is the sum over the first three rows.
'flag' in the "collapsed" row is 0 if it is 0 in all of the first three rows, and 1 if it is 1 in any of them. (A simple sum is sufficient for this logic, since the flag is set in only one row of each group.)
Here is an example of what a dataframe looks like:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id':      [283, 283, 283, 283, 283, 283, 283, 756, 756, 756],
    'type':    ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'X', 'X', 'X'],
    'grp_idx': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
    'col_1':   [2, 4, 6, 8, 10, 12, 14, 5, 10, 15],
    'col_2':   [3, 6, 9, 12, 15, 18, 21, 1, 2, 3],
    'flag':    [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
})
print(df)

    id type  grp_idx  col_1  col_2  flag
0  283    A        1      2      3     0
1  283    A        2      4      6     0
2  283    A        3      6      9     0
3  283    A        4      8     12     0
4  283    A        5     10     15     0
5  283    A        6     12     18     0
6  283    A        7     14     21     1
7  756    X        1      5      1     0
8  756    X        2     10      2     0
9  756    X        3     15      3     1
After processing, I expect the data structure to look like this:
 id type  grp_idx  col_1  col_2  flag
283    A        0     12     18     0
283    A        4      8     12     0
283    A        5     10     15     0
283    A        6     12     18     0
283    A        7     14     21     1
756    X        0     30      6     1
I am not sure how to proceed. I tried to play with
df.groupby('id').head(3).sum()
but it does not do what I need. Any help, suggestions, code snippets would be really appreciated.
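For reference, the attempt above falls short because `groupby('id').head(3)` returns an ordinary, ungrouped DataFrame, so the following `.sum()` adds up the rows of all groups together. One possible approach is sketched below. It is only a sketch under a few assumptions: rows are already ordered within each group, `grp_idx` is positive and increasing inside a group, and it is acceptable for the result to be sorted by `id`. The names `pos`, `head`, `rest`, `collapsed`, and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'id':      [283, 283, 283, 283, 283, 283, 283, 756, 756, 756],
    'type':    ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'X', 'X', 'X'],
    'grp_idx': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3],
    'col_1':   [2, 4, 6, 8, 10, 12, 14, 5, 10, 15],
    'col_2':   [3, 6, 9, 12, 15, 18, 21, 1, 2, 3],
    'flag':    [0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
})

# Position of each row inside its id group (0, 1, 2, ...).
pos = df.groupby('id').cumcount()
head = df[pos < 3]    # first three rows of every group
rest = df[pos >= 3]   # everything after them, left untouched

# Collapse the first three rows of every group into a single row.
# 'type' is assumed constant within a group, so 'first' is enough;
# 'flag' is summed, relying on it being set at most once per group.
collapsed = (
    head.groupby('id', as_index=False, sort=False)
        .agg({'type': 'first', 'col_1': 'sum', 'col_2': 'sum', 'flag': 'sum'})
)
collapsed['grp_idx'] = 0

# Put the collapsed rows back in front of the remaining rows of each group.
# Sorting on grp_idx works because the collapsed row carries 0.
result = (
    pd.concat([collapsed, rest])
      .sort_values(['id', 'grp_idx'])
      .reset_index(drop=True)[df.columns.tolist()]
)
print(result)
```

If a flag could be set in more than one of the first three rows, replacing `'sum'` with `'max'` for `flag` would keep the value at 0 or 1. Groups with fewer than three rows are collapsed into a single row as well.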