I would do this with `shift` and `cumsum` (here's a simple example using numbers instead of times, but it works exactly the same way with times):
```
In [11]: s = pd.Series([1., 1.1, 1.2, 2.7, 3.2, 3.8, 3.9])

In [12]: (s - s.shift(1) > 0.5).fillna(0).cumsum(skipna=False)
```
\* The need for `skipna=False` here seems like a pandas bug.
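To illustrate that the same pattern carries over to real times, here is a minimal sketch on a datetime Series. The timestamps and the 30-minute gap threshold are made up for illustration. Subtracting the shifted Series gives `timedelta64` values (with `NaT` in the first row), and comparing against a `pd.Timedelta` yields the boolean "a new session starts here" flags. Because that comparison already returns `False` for the `NaT` row, this sketch leaves out `fillna(0)` and `skipna=False`.

```python
import pandas as pd

# Hypothetical timestamps; the 30-minute threshold is an assumption for illustration
times = pd.Series(pd.to_datetime([
    "2024-01-01 09:00", "2024-01-01 09:10", "2024-01-01 09:25",
    "2024-01-01 11:00", "2024-01-01 11:20", "2024-01-01 13:00",
]))
gap = pd.Timedelta(minutes=30)

# True wherever the gap to the previous timestamp exceeds the threshold
# (the first row compares NaT against the threshold, which gives False)
new_session = (times - times.shift(1)) > gap

# A running count of session starts gives a 0-based session id for every row
session_id = new_session.cumsum()
print(session_id.tolist())  # [0, 0, 0, 1, 1, 2]
```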
Then you can use this in a `groupby` `apply`:
```
In [21]: df = pd.DataFrame([[1.1, 1.7, 2.5, 2.6, 2.7, 3.4], list('AAABBB')]).T

In [22]: df.columns = ['time', 'ip']

In [23]: df
Out[23]:
  time ip
0  1.1  A
1  1.7  A
2  2.5  A
3  2.6  B
4  2.7  B
5  3.4  B

In [24]: g = df.groupby('ip')

In [25]: df['session_number'] = g['time'].apply(lambda s: (s - s.shift(1) > 0.5).fillna(0).cumsum(skipna=False))

In [26]: df
Out[26]:
  time ip  session_number
0  1.1  A               0
1  1.7  A               1
2  2.5  A               2
3  2.6  B               0
4  2.7  B               0
5  3.4  B               1
```
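In recent pandas versions (2.0 and later), `SeriesGroupBy.apply` prepends the group keys to the index of a like-indexed result such as this one, so assigning it straight back to `df` can fail. A minimal sketch of the same idea using `transform`, which returns a Series aligned to the original index, follows. It builds the frame from a dict so that `time` stays `float64` (the `.T` construction leaves both columns as `object`), and, like the session above, it assumes the rows are already sorted by time within each `ip`. The helper name `session_ids` and its `max_gap` parameter are hypothetical.

```python
import pandas as pd

# Same toy data, built column-wise so 'time' is float64 rather than object
df = pd.DataFrame({
    "time": [1.1, 1.7, 2.5, 2.6, 2.7, 3.4],
    "ip": list("AAABBB"),
})

def session_ids(t: pd.Series, max_gap: float = 0.5) -> pd.Series:
    """Hypothetical helper: a new session starts after a gap larger than max_gap."""
    return ((t - t.shift(1)) > max_gap).cumsum()

# transform keeps df's original index, so the result can be assigned directly
df["session_number"] = df.groupby("ip")["time"].transform(session_ids)
print(df)
```

The `session_number` values should match `Out[26]`: `0, 1, 2` for `A` and `0, 0, 1` for `B`.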
Now you can group by `'ip'` and `'session_number'` (and analyze each session).
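For example, here is a minimal sketch that summarizes each session with named aggregation. The `session_number` values are hard-coded from `Out[26]` so the snippet runs on its own, and the `start`/`end`/`events`/`duration` column names are only illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1.1, 1.7, 2.5, 2.6, 2.7, 3.4],
    "ip": list("AAABBB"),
    "session_number": [0, 1, 2, 0, 0, 1],  # values taken from Out[26]
})

# One row per (ip, session): first time, last time, and number of events
sessions = (
    df.groupby(["ip", "session_number"])["time"]
      .agg(start="min", end="max", events="count")
      .reset_index()
)
sessions["duration"] = sessions["end"] - sessions["start"]
print(sessions)
```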