This is only a partial answer.
Read in your data, the note should have combined 2 columns of date / time
In [75]: df = read_csv(StringIO(data),sep='\s+',skiprows=1,parse_dates=[[3,4],[5,6]],header=None) In [76]: df.columns = ['created','closed','idx','num','typ'] In [77]: df Out[77]: created closed idx num typ 0 2013-07-28 03:55:00 2013-07-28 11:01:37.346438 0 4237941 Unix 1 2013-07-28 04:59:00 2013-07-28 18:25:02.193182 1 4238041 Windows 2 2013-07-28 05:09:00 2013-07-28 23:11:12.003673 2 4238051 Windows 3 2013-07-28 05:10:00 2013-07-28 05:32:51.547251 3 4238291 Windows 4 2013-07-28 01:15:00 2013-07-28 10:09:20 4 4238321 Unix 5 2013-07-28 01:53:00 2013-07-28 17:42:56.192088 5 4238331 Unix 6 2013-07-28 02:03:00 2013-07-28 06:34:09.455042 6 4238561 Windows 7 2013-07-28 02:03:00 2013-07-28 20:54:47.306731 7 4238691 Windows 8 2013-07-28 03:23:00 2013-07-28 13:15:20.823505 8 4238811 Windows 9 2013-07-28 04:16:00 2013-07-28 23:51:55.561463 9 4238851 Windows 10 2013-07-28 04:26:00 2013-07-28 09:27:06.275342 10 4239011 Unix 11 2013-07-28 04:38:00 2013-07-28 07:55:34.416621 11 4239041 Windows 12 2013-07-28 08:15:00 2013-07-28 08:46:42.380739 12 4239131 Unix 13 2013-07-28 01:08:00 2013-07-28 15:37:12.266341 13 4239141 Windows In [78]: df.dtypes Out[78]: created datetime64[ns] closed datetime64[ns] idx int64 num int64 typ object dtype: object
For each even, put 1 where it is in the range (created-closed). Fill nano with 0.
In [82]: m = df.apply(lambda x: Series(1,index=np.arange(x['created'].hour,x['closed'].hour+1)),axis=1).fillna(0) In [81]: m Out[81]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 2 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 6 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 8 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 9 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
Attach it to the source dataset and set the index
In [83]: y = df [['num', 'typ']]. join (m) .set_index (['num', 'typ'])
In [84]: y Out[84]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 num typ 4237941 Unix 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 4238041 Windows 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 4238051 Windows 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4238291 Windows 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4238321 Unix 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 4238331 Unix 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 4238561 Windows 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4238691 Windows 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 4238811 Windows 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 4238851 Windows 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4239011 Unix 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4239041 Windows 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4239131 Unix 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4239141 Windows 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
At this point you can do the calculations
Open / Closed - direct edge detection. Carry Fwd is just m.where(m==1)