I have a dataframe of the following form:
import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4,5],
                   'group':['A','A','A','B','B'],
                   'start':['2012-08-19','2012-08-22','2013-08-19','2012-08-19','2013-08-19'],
                   'end':['2012-08-28','2013-09-13','2013-08-21','2012-12-19','2014-08-19']})
Out[1]:
   id group       start         end
0   1     A  2012-08-19  2012-08-28
1   2     A  2012-08-22  2013-09-13
2   3     A  2013-08-19  2013-08-21
3   4     B  2012-08-19  2012-12-19
4   5     B  2013-08-19  2014-08-19
For each row in my dataframe, I would like to count the number of other rows in the same group whose time interval overlaps with it.
For example, in group A, id 2 runs from August 22, 2012 to September 13, 2013, so it overlaps with id 1 (August 19, 2012 to August 28, 2012) and with id 3 (August 19, 2013 to August 21, 2013), for a count of 2.
Conversely, there are no overlaps between the elements of group B.
So, for the example dataframe above, I would like to produce something like:
Out[2]:
   id group       start         end  count
0   1     A  2012-08-19  2012-08-28      1
1   2     A  2012-08-22  2013-09-13      2
2   3     A  2013-08-19  2013-08-21      1
3   4     B  2012-08-19  2012-12-19      0
4   5     B  2013-08-19  2014-08-19      0
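To make the expected result concrete, here is a minimal sketch of the pairwise computation I have in mind. It assumes the intervals are closed (touching endpoints count as an overlap) and that the dates are parsed with pd.to_datetime; the names pairs, overlapping and counts are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'group': ['A', 'A', 'A', 'B', 'B'],
    'start': pd.to_datetime(['2012-08-19', '2012-08-22', '2013-08-19',
                             '2012-08-19', '2013-08-19']),
    'end': pd.to_datetime(['2012-08-28', '2013-09-13', '2013-08-21',
                           '2012-12-19', '2014-08-19']),
})

# Pair every row with every row of the same group (self-join on 'group').
pairs = df.merge(df, on='group', suffixes=('', '_other'))

# Drop the pairs that match a row with itself.
pairs = pairs[pairs['id'] != pairs['id_other']]

# Assumption: closed intervals. [s1, e1] and [s2, e2] overlap
# when s1 <= e2 and s2 <= e1.
overlapping = pairs[(pairs['start'] <= pairs['end_other']) &
                    (pairs['start_other'] <= pairs['end'])]

# Count overlapping partners per id; ids without any overlap get 0.
counts = overlapping.groupby('id').size()
df['count'] = df['id'].map(counts).fillna(0).astype(int)
```

This builds every within-group pair of rows, so memory and time grow quadratically with the group size.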
I could brute-force this, but I would like to know if there is a more efficient way to do it in pandas.
Thank you in advance for your help.