Selecting rows before and after rows of interest in Pandas

Question

Let's say I have a time series of data with a categorical variable and a value:

In [4]: df = pd.DataFrame(data={'category': np.random.choice(['A', 'B', 'C', 'D'], 11), 'value': np.random.rand(11)}, index=pd.date_range('2015-04-20','2015-04-30'))

In [5]: df
Out[5]:
           category     value
2015-04-20        D  0.220804
2015-04-21        A  0.992445
2015-04-22        A  0.743648
2015-04-23        B  0.337535
2015-04-24        B  0.747340
2015-04-25        B  0.839823
2015-04-26        D  0.292628
2015-04-27        D  0.906340
2015-04-28        B  0.244044
2015-04-29        A  0.070764
2015-04-30        D  0.132221
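
Because the frame above comes from np.random, rerunning In [4] gives different data. A minimal sketch for rebuilding the exact frame shown, with the categories and values copied from Out[5] (the literal lists are the only thing added here):

```python
import pandas as pd

# Categories and values are copied from the printed output above,
# so the frame is the same on every run.
df = pd.DataFrame(
    {
        "category": list("DAABBBDDBAD"),
        "value": [0.220804, 0.992445, 0.743648, 0.337535, 0.747340,
                  0.839823, 0.292628, 0.906340, 0.244044, 0.070764,
                  0.132221],
    },
    index=pd.date_range("2015-04-20", "2015-04-30"),
)
print(df)
```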

If I'm interested in the rows with category A, filtering to isolate them is trivial. But what if I am interested in the n rows leading up to each category A row, as well as the A rows themselves? If n = 2, I would like to see something like:

In [5]: df[some boolean indexing]
Out[5]:
           category     value
2015-04-20        D  0.220804
2015-04-21        A  0.992445
2015-04-22        A  0.743648
2015-04-27        D  0.906340
2015-04-28        B  0.244044
2015-04-29        A  0.070764

Similarly, what if I am interested in the n rows on either side of each category A row? Again, if n = 2, I would like to see this:

In [5]: df[some other boolean indexing]
Out[5]:
           category     value
2015-04-20        D  0.220804
2015-04-21        A  0.992445
2015-04-22        A  0.743648
2015-04-23        B  0.337535
2015-04-24        B  0.747340
2015-04-27        D  0.906340
2015-04-28        B  0.244044
2015-04-29        A  0.070764
2015-04-30        D  0.132221

Thanks!

+4

python pandas indexing selection

lenderson Feb 09 '17 at 21:55

2 answers

To answer your first question:

df[pd.concat([df.category.shift(-i)=='A' for i in range(n)], axis=1).any(axis=1)]
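
One thing to watch with the one-liner above: with range(n), the i = 0 term selects the A rows themselves, so only n - 1 preceding rows are kept. The sketch below uses range(n + 1) to match the expected output in the question, and assumes that a symmetric range covers the "around" case. The names is_a, before and around are illustrative, and fill_value needs pandas 0.24 or later.

```python
import pandas as pd

df = pd.DataFrame({"category": list("DAABBBDDBAD")},
                  index=pd.date_range("2015-04-20", "2015-04-30"))
n = 2
is_a = df["category"] == "A"

# shift(-i) moves the flag from i rows later onto the current row:
# i = 0 keeps the A rows, i = 1..n keeps the n rows before each of them.
before = pd.concat(
    [is_a.shift(-i, fill_value=False) for i in range(n + 1)], axis=1
).any(axis=1)

# Shifting in both directions keeps n rows on each side of every A row.
around = pd.concat(
    [is_a.shift(i, fill_value=False) for i in range(-n, n + 1)], axis=1
).any(axis=1)

print(df[before])
print(df[around])
```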

+4

DyZ Feb 09 '17 at 22:01

Maxu · Accepted Answer · 2017-02-09T22:14:31+0000

To get the n rows around each category A row:

In [223]: idx = df.index.get_indexer_for(df[df.category=='A'].index)

In [224]: n = 1

In [225]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df)))
                                            for i in idx]))]
Out[225]:
           category     value
2015-04-20        D  0.220804
2015-04-21        A  0.992445
2015-04-22        A  0.743648
2015-04-23        B  0.337535
2015-04-28        B  0.244044
2015-04-29        A  0.070764
2015-04-30        D  0.132221

In [226]: n = 2

In [227]: df.iloc[np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(df)))
                                            for i in idx]))]
Out[227]:
           category     value
2015-04-20        D  0.220804
2015-04-21        A  0.992445
2015-04-22        A  0.743648
2015-04-23        B  0.337535
2015-04-24        B  0.747340
2015-04-27        D  0.906340
2015-04-28        B  0.244044
2015-04-29        A  0.070764
2015-04-30        D  0.132221
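
The positional approach above can also cover the "rows before" case from the question by using separate window sizes on each side. A minimal sketch, assuming a hypothetical helper named rows_near (not part of pandas) that takes a boolean mask instead of a hard-coded category:

```python
import numpy as np
import pandas as pd

def rows_near(df, mask, before=0, after=0):
    """Rows where mask is True, plus `before` rows above and `after` rows below."""
    hits = np.flatnonzero(np.asarray(mask))
    # Broadcasting adds every offset in -before..after to every hit position.
    pos = (hits[:, None] + np.arange(-before, after + 1)).ravel()
    # Discard positions past either end of the frame, then dedupe and sort.
    pos = np.unique(pos[(pos >= 0) & (pos < len(df))])
    return df.iloc[pos]

df = pd.DataFrame({"category": list("DAABBBDDBAD")},
                  index=pd.date_range("2015-04-20", "2015-04-30"))
is_a = df["category"] == "A"

print(rows_near(df, is_a, before=2))           # n = 2 rows leading up to A
print(rows_near(df, is_a, before=1, after=1))  # n = 1 row around A
```

Filtering out-of-range positions with a mask replaces the max/min clamping in the original loop; the selected rows should be the same.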
