Let's say I have a time series of data with a categorical variable and a value:
In [4]: df = pd.DataFrame(data={'category': np.random.choice(['A', 'B', 'C', 'D'], 11), 'value': np.random.rand(11)}, index=pd.date_range('2015-04-20','2015-04-30'))
In [5]: df
Out[5]:
            category     value
2015-04-20         D  0.220804
2015-04-21         A  0.992445
2015-04-22         A  0.743648
2015-04-23         B  0.337535
2015-04-24         B  0.747340
2015-04-25         B  0.839823
2015-04-26         D  0.292628
2015-04-27         D  0.906340
2015-04-28         B  0.244044
2015-04-29         A  0.070764
2015-04-30         D  0.132221
If I'm interested in the rows with category A, filtering to isolate them is trivial. But what if I am interested in the n rows leading up to each category A row, together with the A rows themselves? With n = 2, I would like to see something like:
In [5]: df[some boolean indexing]
Out[5]:
            category     value
2015-04-20         D  0.220804
2015-04-21         A  0.992445
2015-04-22         A  0.743648
2015-04-27         D  0.906340
2015-04-28         B  0.244044
2015-04-29         A  0.070764
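For illustration, here is a minimal sketch of one mask that reproduces this output. It assumes the categories are fixed to the ones printed above (the original frame was random), and the names `is_a`, `mask`, and `leading` are made up for the example. The idea is to OR the "is A" mask with copies of itself shifted upward by 1 to n rows:

```python
import pandas as pd

# Fixed copy of the sample data shown above, so the result is reproducible.
df = pd.DataFrame(
    {
        "category": list("DAABBBDDBAD"),
        "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
                  0.292628, 0.906340, 0.244044, 0.070764, 0.132221],
    },
    index=pd.date_range("2015-04-20", "2015-04-30"),
)

n = 2
is_a = df["category"] == "A"

# Start from the A rows themselves, then also flag every row that has
# an A exactly k rows below it, for k = 1..n.
mask = is_a.copy()
for k in range(1, n + 1):
    # shift(-k) moves values k rows toward the top; the vacated tail is False.
    mask |= is_a.shift(-k, fill_value=False)

leading = df[mask]
print(leading)
```

This works by row position rather than by date, so it treats "n rows before" as n index positions even if the dates have gaps.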
Similarly, what if I am interested in the n rows on either side of each category A row? Again with n = 2, I would like to see this:
In [5]: df[some other boolean indexing]
Out[5]:
            category     value
2015-04-20         D  0.220804
2015-04-21         A  0.992445
2015-04-22         A  0.743648
2015-04-23         B  0.337535
2015-04-24         B  0.747340
2015-04-27         D  0.906340
2015-04-28         B  0.244044
2015-04-29         A  0.070764
2015-04-30         D  0.132221
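As a rough sketch of the "around" case, one option might be a centered rolling window over the "is A" indicator: a row is kept if any of the 2n + 1 rows centered on it is an A. The data is again a fixed copy of the sample above, and `near_a` and `around` are illustrative names only:

```python
import pandas as pd

# Fixed copy of the sample data shown above, so the result is reproducible.
df = pd.DataFrame(
    {
        "category": list("DAABBBDDBAD"),
        "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340, 0.839823,
                  0.292628, 0.906340, 0.244044, 0.070764, 0.132221],
    },
    index=pd.date_range("2015-04-20", "2015-04-30"),
)

n = 2
# 1 where the row is category A, 0 elsewhere.
is_a = (df["category"] == "A").astype(int)

# Centered window of 2n + 1 rows; min_periods=1 keeps the edge rows
# from turning into NaN when the window runs off the frame.
near_a = (
    is_a.rolling(window=2 * n + 1, center=True, min_periods=1)
    .max()
    .astype(bool)
)

around = df[near_a]
print(around)
```

Like the shift-based mask, this counts rows by position, not by elapsed time.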
Thanks!