Find the days since the last pandas dataframe event

I have a pandas data frame:

import pandas as pd
df = pd.DataFrame({'group_ids':[1,1,1,2,2,2],'dates':pd.to_datetime(['2016-04-01','2016-04-20','2016-04-28','2016-04-05','2016-04-20','2016-04-29']),'event_today_in_group':[1,0,1,1,1,0]})


   group_ids      dates  event_today_in_group
0          1 2016-04-01                     1
1          1 2016-04-20                     0
2          1 2016-04-28                     1
3          2 2016-04-05                     1
4          2 2016-04-20                     1
5          2 2016-04-29                     0

I would like to add a column that holds, for each row, the number of days since the most recent earlier event (event_today_in_group == 1) in the same group_ids group, or 0 if there is none:

   group_ids      dates  event_today_in_group  days_since_last_event
0          1 2016-04-01                     1                      0
1          1 2016-04-20                     0                     19
2          1 2016-04-28                     1                     27
3          2 2016-04-05                     1                      0
4          2 2016-04-20                     1                     15
5          2 2016-04-29                     0                      9
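To pin down what the expected column means: each value is the gap to the latest event strictly before that row in the same group, and 0 when there is none. A minimal sketch of that definition, using a forward-fill technique that is not taken from the answer below, assuming rows are sorted by dates within each group:

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': pd.to_datetime(['2016-04-01', '2016-04-20', '2016-04-28',
                             '2016-04-05', '2016-04-20', '2016-04-29']),
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})

# Keep the date only on rows where an event happened; other rows become NaT.
event_dates = df['dates'].where(df['event_today_in_group'] == 1)

# Shift by one row within each group, then carry the latest event date forward,
# so every row sees the most recent event strictly before it.
last_event = (event_dates.groupby(df['group_ids']).shift()
              .groupby(df['group_ids']).ffill())

# Whole days since that event; rows with no earlier event get 0.
df['days_since_last_event'] = (df['dates'] - last_event).dt.days.fillna(0).astype(int)
print(df)
```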
Answer:

As I mentioned earlier, this gives you the non-cumulative difference between consecutive dates within each group (dates must be a datetime column for this to work):

df['days_since_last_event'] = df.groupby('group_ids')['dates'].diff().dt.days
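A short check of what this step produces on the question's data: it is the gap to the previous row, not to the previous event. Row 2 gets 8 days (2016-04-28 minus 2016-04-20) although the question expects 27, which is why the gaps still have to be accumulated.

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': pd.to_datetime(['2016-04-01', '2016-04-20', '2016-04-28',
                             '2016-04-05', '2016-04-20', '2016-04-29']),
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})

# Difference to the previous row of the same group (NaT on each group's first row).
diffs = df.groupby('group_ids')['dates'].diff()

# .dt.days turns each Timedelta into whole days; NaT becomes NaN.
print(diffs.dt.days.tolist())  # [nan, 19.0, 8.0, nan, 15.0, 9.0]
```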

To turn these differences into a running total that restarts after each event, I suggest using shift to take the previous row's event_today_in_group value and then taking its cumulative sum. The result is a label that increases by one on the row after every event, for example:

df['event_today_in_group'].shift().cumsum()

Output:

0    NaN
1    1.0
2    1.0
3    2.0
4    3.0
5    4.0
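The shift() above runs over the whole frame, so the first row of group 2 picks up the last event of group 1. Because group_ids is also a grouping key in the next step, this does no harm for data sorted by group, but if rows of different groups were interleaved, a global shift would split a group's blocks at another group's events. A sketch of a per-group variant (an alternative, not part of the original answer), assuming rows are sorted by dates within each group:

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': pd.to_datetime(['2016-04-01', '2016-04-20', '2016-04-28',
                             '2016-04-05', '2016-04-20', '2016-04-29']),
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})

# Shift the event flag within each group, so one group's events never leak into another's.
prev_event = df.groupby('group_ids')['event_today_in_group'].shift(fill_value=0)

# Running count of earlier events in the same group: it increments on the row
# after each event, so rows between two events share the same block label.
block = prev_event.groupby(df['group_ids']).cumsum()
print(block.tolist())
```

Grouping the day differences by ['group_ids', block] then gives the same running totals as the answer.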

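This label splits each group into blocks that start right after an event. Now group by both group_ids and this label, and take the cumulative sum of the differences within each block: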

df.loc[:, 'days_since_last_event'] = df.groupby(['group_ids', df['event_today_in_group'].shift().cumsum()])['days_since_last_event'].cumsum()

Result (NaN marks rows with no earlier event in their group):

   group_ids      dates  event_today_in_group  days_since_last_event
0          1 2016-04-01                     1                    NaN
1          1 2016-04-20                     0                   19.0
2          1 2016-04-28                     1                   27.0
3          2 2016-04-05                     1                    NaN
4          2 2016-04-20                     1                   15.0
5          2 2016-04-29                     0                    9.0
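An end-to-end sketch of the answer's approach with two small changes so the output matches the question exactly: shift(fill_value=0) gives the first row a real label instead of NaN (groupby drops NaN keys by default), and fillna(0) turns the NaN on each group's first row into 0. It assumes rows are sorted by group_ids and then by dates.

```python
import pandas as pd

df = pd.DataFrame({
    'group_ids': [1, 1, 1, 2, 2, 2],
    'dates': pd.to_datetime(['2016-04-01', '2016-04-20', '2016-04-28',
                             '2016-04-05', '2016-04-20', '2016-04-29']),
    'event_today_in_group': [1, 0, 1, 1, 1, 0],
})

# Gap to the previous row of the same group, in days (NaN on each group's first row).
df['days_since_last_event'] = df.groupby('group_ids')['dates'].diff().dt.days

# Block label that increases on the row after every event; fill_value avoids a NaN key.
block = df['event_today_in_group'].shift(fill_value=0).cumsum()

# Sum the gaps inside each (group, block), then use 0 where there is no earlier event.
df['days_since_last_event'] = (
    df.groupby(['group_ids', block])['days_since_last_event'].cumsum()
    .fillna(0)
    .astype(int)
)
print(df)
```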

Source: https://habr.com/ru/post/1681155/

