How to combine rows with upper bound for column value?

Question

I have a pd.DataFrame that I would like to convert:

   id  values  days  time  value_per_day
0   1      15    15     1         1
1   1      20     5     2         4
2   1      12    12     3         1

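I would like to aggregate these rows into equal buckets of 10 days. Since days at time 1 is greater than 10, the excess should spill into the next row, so that value_per_day in the second row is the average over the days taken from the first and second rows.

Here is the resulting output, where (values, 0) = 15*(10/15) = 10 and (values, 1) = 5 + 20 = 25 (the remaining 5 from the first row plus the 20 from the second), giving value_per_day = 25/10 = 2.5: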

   id  values  days  value_per_day
0   1      10    10         1.0
1   1      25    10         2.5
2   1      10    10         1.0
3   1       2     2         1.0

I tried pd.Grouper:

df.set_index('days').groupby([pd.Grouper(freq='10D', label='right'), 'id']).agg({'values': 'mean'})

Out[146]:
            values
days    id        
5 days  1       16
15 days 1       10

But I am clearly using it incorrectly.

csv for convenience:

id,values,days,time  
1,10,15,1  
1,20,5,2  
1,12,12,3
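For anyone reproducing this, here is a minimal sketch that loads the CSV above into a frame. The CSV has no value_per_day column, so the sketch derives it as values / days, which is an assumption about how that column was built. Note also that the first row of the CSV carries values=10, whereas the table at the top shows 15, so the derived value_per_day for that row differs from the table.

```python
import io

import pandas as pd

# CSV text copied from the question (trailing spaces removed).
csv_text = """id,values,days,time
1,10,15,1
1,20,5,2
1,12,12,3
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: value_per_day is values divided by days.
# Use df["values"] rather than df.values, which is the DataFrame attribute.
df["value_per_day"] = df["values"] / df["days"]

print(df)
```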

python pandas

GiantsLoveDeathMetal Feb 13 '18 at 19:47

1 answer

Wen · Accepted Answer · 2018-02-13T22:07:14+0000

Note: this solution trades run time for simplicity, because it expands the frame to one row per day.

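import numpy as np
import pandas as pd

# Repeat each row once per day, so every row of newdf stands for a single day
newdf=df.reindex(df.index.repeat(df.days))
# Label each day with its 10-day bucket: 0 for the first 10 days, 1 for the next 10, ...
v=np.arange(sum(df.days))//10
# Average the per-day value within each bucket and count the days in it
dd=pd.DataFrame({'value_per_day': newdf.groupby(v).value_per_day.mean(),'days':np.bincount(v)})
dd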
Out[102]: 
   days  value_per_day
0    10            1.0
1    10            2.5
2    10            1.0
3     2            1.0
dd.assign(value=dd.days*dd.value_per_day)
Out[103]: 
   days  value_per_day  value
0    10            1.0   10.0
1    10            2.5   25.0
2    10            1.0   10.0
3     2            1.0    2.0
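The expansion above creates one row per day, which can get large when days is big. As an alternative (not part of the answer above), here is a sketch that computes the overlap between each row's day interval and each 10-day bucket with NumPy broadcasting. The function name bucket_without_expanding and the size parameter are hypothetical, and the sketch assumes a single id with rows already in time order.

```python
import numpy as np
import pandas as pd


def bucket_without_expanding(df, size=10):
    """Aggregate rows into buckets of `size` days without repeating rows."""
    days = df["days"].to_numpy()
    rate = df["value_per_day"].to_numpy()

    # Each row covers the half-open day interval [start, end).
    end = np.cumsum(days)
    start = end - days

    # Bucket b covers [lo[b], hi[b]); the last bucket may be partly empty.
    n_buckets = -(-int(end[-1]) // size)  # ceiling division
    lo = np.arange(n_buckets) * size
    hi = lo + size

    # overlap[b, i] = number of days of row i that fall into bucket b.
    overlap = np.minimum(end[None, :], hi[:, None]) - np.maximum(start[None, :], lo[:, None])
    overlap = np.clip(overlap, 0, None)

    bucket_days = overlap.sum(axis=1)
    bucket_values = overlap @ rate
    return pd.DataFrame({
        "values": bucket_values,
        "days": bucket_days,
        "value_per_day": bucket_values / bucket_days,
    })


# Frame taken from the table at the top of the question.
df = pd.DataFrame({
    "id": [1, 1, 1],
    "values": [15, 20, 12],
    "days": [15, 5, 12],
    "time": [1, 2, 3],
    "value_per_day": [1, 4, 1],
})
out = bucket_without_expanding(df)
print(out)
```

The intermediate array has one entry per (bucket, row) pair instead of one row per day, so it tends to pay off when the individual days values are large relative to the number of rows.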

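Also, if there are several ids, the same steps can be applied to each group from df.groupby('id'), for example in a for loop.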
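A minimal sketch of that per-id variant follows. The helper name bucket_days and the second id in the sample frame are hypothetical, added only to show that each id is bucketed independently. The helper follows the repeat-and-group approach from the answer.

```python
import numpy as np
import pandas as pd


def bucket_days(group, size=10):
    """Bucket one id's rows into `size`-day buckets by expanding to one row per day."""
    # One row per day; the group keeps its unique original index, so reindex is safe.
    expanded = group.reindex(group.index.repeat(group["days"]))
    # Bucket label for each day: 0 for the first `size` days, 1 for the next, ...
    bucket = np.arange(len(expanded)) // size
    out = pd.DataFrame({
        "days": np.bincount(bucket),
        "value_per_day": expanded.groupby(bucket)["value_per_day"].mean(),
    })
    out["values"] = out["days"] * out["value_per_day"]
    return out


# id 1 comes from the question's table; id 2 is made-up data for illustration.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "values": [15, 20, 12, 30],
    "days": [15, 5, 12, 10],
    "time": [1, 2, 3, 1],
    "value_per_day": [1, 4, 1, 3],
})

pieces = []
for id_, group in df.groupby("id"):
    res = bucket_days(group)
    res.insert(0, "id", id_)  # keep track of which id each bucket belongs to
    pieces.append(res)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Bucketing restarts at day 0 for every id, so leftover days from one id never spill into the next.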
