Solution 1 - cumsum only on url column:
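The df from the question is not reproduced here, so below is a minimal sketch of sample data consistent with all the outputs that follow. The 3/4 split of the two consecutive facebook.com rows is an assumption, since only their sum (7) appears in the results:

    import pandas as pd

    # hypothetical sample frame reconstructed from the outputs shown below;
    # the 3/4 split of the consecutive facebook.com rows is assumed
    df = pd.DataFrame({
        'ID':   [111, 111, 111, 111, 222, 222, 111],
        'url':  ['vk.com', 'facebook.com', 'facebook.com', 'twitter.com',
                 'vk.com', 'twitter.com', 'facebook.com'],
        'date': ['12.01.2016'] * 7,
        'active_seconds': [5, 3, 4, 12, 8, 34, 5],
    })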
You need to group by a helper Series created by cumsum of a boolean mask, and the url column then has to be aggregated with first. Afterwards remove the url level with reset_index, and finally restore the original column order with reindex:
    g = (df.url != df.url.shift()).cumsum()
    print (g)
    0    1
    1    2
    2    2
    3    3
    4    4
    5    5
    6    6
    Name: url, dtype: int32

    #another solution with ne
    #g = df.url.ne(df.url.shift()).cumsum()

    print (df.groupby([df.ID, df.date, g], sort=False)
             .agg({'active_seconds':'sum', 'url':'first'})
             .reset_index(level='url', drop=True)
             .reset_index()
             .reindex(columns=df.columns))

        ID           url        date  active_seconds
    0  111        vk.com  12.01.2016               5
    1  111  facebook.com  12.01.2016               7
    2  111   twitter.com  12.01.2016              12
    3  222        vk.com  12.01.2016               8
    4  222   twitter.com  12.01.2016              34
    5  111  facebook.com  12.01.2016               5
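To see why this works, it helps to inspect the intermediate boolean mask before the cumsum; a small sketch using the sample data above, with the output shown as comments:

    # each True marks the start of a new run of consecutive equal urls
    mask = df.url != df.url.shift()
    print (mask)
    # 0     True   <- first row always starts a run (shift gives NaN)
    # 1     True
    # 2    False   <- same url as the previous row, stays in the run
    # 3     True
    # 4     True
    # 5     True
    # 6     True
    # cumsum turns the mask into run ids: 1, 2, 2, 3, 4, 5, 6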
Or rename the helper Series to tmp and pass it as an extra grouping key, so no agg is needed:

    g = (df.url != df.url.shift()).cumsum().rename('tmp')
    print (g)
    0    1
    1    2
    2    2
    3    3
    4    4
    5    5
    6    6
    Name: tmp, dtype: int32

    print (df.groupby([df.ID, df.url, df.date, g], sort=False)['active_seconds']
             .sum()
             .reset_index(level='tmp', drop=True)
             .reset_index())

        ID           url        date  active_seconds
    0  111        vk.com  12.01.2016               5
    1  111  facebook.com  12.01.2016               7
    2  111   twitter.com  12.01.2016              12
    3  222        vk.com  12.01.2016               8
    4  222   twitter.com  12.01.2016              34
    5  111  facebook.com  12.01.2016               5
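If the pattern is needed repeatedly, it can be wrapped in a small helper; a sketch, with the function name sum_consecutive made up for illustration:

    def sum_consecutive(df):
        # hypothetical helper wrapping the groupby/cumsum pattern above
        tmp = df.url.ne(df.url.shift()).cumsum().rename('tmp')
        return (df.groupby([df.ID, df.url, df.date, tmp], sort=False)['active_seconds']
                  .sum()
                  .reset_index(level='tmp', drop=True)
                  .reset_index())

    print (sum_consecutive(df))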
Solution 2 - cumsum by columns ID and url:
    g = df[['ID','url']].ne(df[['ID','url']].shift()).cumsum()
    print (g)
       ID  url
    0   1    1
    1   1    2
    2   1    2
    3   1    3
    4   2    4
    5   2    5
    6   3    6

    print (df.groupby([g.ID, df.date, g.url], sort=False)
             .agg({'active_seconds':'sum', 'url':'first'})
             .reset_index(level='url', drop=True)
             .reset_index()
             .reindex(columns=df.columns))

       ID           url        date  active_seconds
    0   1        vk.com  12.01.2016               5
    1   1  facebook.com  12.01.2016               7
    2   1   twitter.com  12.01.2016              12
    3   2        vk.com  12.01.2016               8
    4   2   twitter.com  12.01.2016              34
    5   3  facebook.com  12.01.2016               5
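Note that grouping by g.ID replaces the original ID values (111, 222) with run numbers (1, 2, 3) in the output. If the original IDs should be kept, one possible variant (a sketch, not part of the original solutions) is to use df.ID as the key together with the helper g.url; the renaming approach shown next solves the same problem more generally:

    # keeps the original ID values while still splitting consecutive url runs
    print (df.groupby([df.ID, df.date, g.url], sort=False)
             .agg({'active_seconds':'sum', 'url':'first'})
             .reset_index(level='url', drop=True)
             .reset_index()
             .reindex(columns=df.columns))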
And a solution which adds the original df.ID and df.url columns as grouping keys; the columns of the helper df have to be renamed first so the names do not collide:
    g = df[['ID','url']].ne(df[['ID','url']].shift()).cumsum()
    g.columns = g.columns + '1'
    print (g)
       ID1  url1
    0    1     1
    1    1     2
    2    1     2
    3    1     3
    4    2     4
    5    2     5
    6    3     6

    print (df.groupby([df.ID, df.url, df.date, g.ID1, g.url1], sort=False)['active_seconds']
             .sum()
             .reset_index(level=['ID1','url1'], drop=True)
             .reset_index())

        ID           url        date  active_seconds
    0  111        vk.com  12.01.2016               5
    1  111  facebook.com  12.01.2016               7
    2  111   twitter.com  12.01.2016              12
    3  222        vk.com  12.01.2016               8
    4  222   twitter.com  12.01.2016              34
    5  111  facebook.com  12.01.2016               5
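As a quick sanity check, the results of the tmp variant and the renamed-columns variant can be compared with equals (a sketch; equality holds for data like the sample above, where every url run belongs to a single ID):

    res1 = (df.groupby([df.ID, df.url, df.date,
                        df.url.ne(df.url.shift()).cumsum().rename('tmp')], sort=False)['active_seconds']
              .sum()
              .reset_index(level='tmp', drop=True)
              .reset_index())

    g = df[['ID','url']].ne(df[['ID','url']].shift()).cumsum()
    g.columns = g.columns + '1'
    res2 = (df.groupby([df.ID, df.url, df.date, g.ID1, g.url1], sort=False)['active_seconds']
              .sum()
              .reset_index(level=['ID1','url1'], drop=True)
              .reset_index())

    print (res1.equals(res2))    # True on this sample data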
Timings
Similar solutions, but pivot_table is slower than groupby:
    In [180]: %timeit (df.assign(g=df.ID.ne(df.ID.shift()).cumsum()).pivot_table('active_seconds', ['g', 'ID', 'url', 'date'], None, 'sum').reset_index([1, 2, 3]).reset_index(drop=True))
    100 loops, best of 3: 5.02 ms per loop

    In [181]: %timeit (df.groupby([df.ID, df.url, df.date, (df.url != df.url.shift()).cumsum().rename('tmp')], sort=False)['active_seconds'].sum().reset_index(level='tmp', drop=True).reset_index())
    100 loops, best of 3: 3.62 ms per loop
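The "100 loops, best of 3" format comes from an older IPython, and the absolute numbers depend on the size of df; a sketch for re-running the benchmark on a larger frame in IPython (the factor 1000 is arbitrary):

    # enlarge the sample frame before re-timing; the factor 1000 is arbitrary
    big = pd.concat([df] * 1000, ignore_index=True)

    %timeit (big.groupby([big.ID, big.url, big.date, (big.url != big.url.shift()).cumsum().rename('tmp')], sort=False)['active_seconds'].sum().reset_index(level='tmp', drop=True).reset_index())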