How can I select all rows of a DataFrame that are a certain distance from a given value in a specific column?

Here is an example DataFrame that I will use to better illustrate my question:

import pandas as pd

df = pd.DataFrame(pd.np.random.rand(30, 3), columns=tuple('ABC'))
df['event'] = pd.np.nan
df.loc[10, 'event'] = 'ping'
df.loc[20, 'event'] = 'ping'
df.loc[19, 'event'] = 'pong'

I need to create windows of n lines centered around each occurrence ping.

In other words, let ibe the index of the row containing pingthe column event. For everyone iI want to choose df.ix[i-n:i+n].

Thus, for n=3I would expect the following result:

             A          B          C event
7    0.8295863  0.2162861  0.4856461   NaN
8     0.156646  0.4730667  0.9968878   NaN
9    0.6709413  0.4796197  0.8747416   NaN
10  0.09942329   0.154008  0.5761598  ping
11   0.7168143   0.678207  0.7281105   NaN
12   0.8915475  0.8013187  0.9049722   NaN
13   0.9545411  0.4844835  0.1645746   NaN
17   0.9909208  0.1091025  0.6582635   NaN
18   0.2536326  0.4324749  0.8001643   NaN
19   0.4734659  0.5582809  0.1221296  pong
20   0.7230407  0.6695843  0.3902591  ping
21   0.3624909  0.2685049  0.5484445   NaN
22  0.05626284  0.6113877  0.9131929   NaN
23   0.8312294  0.5694373  0.4325798   NaN

[14 rows x 4 columns]

A few caveats:

  • I am looking for an iterative solution.
  • , pong, . ping.

?

+4
3
In [17]: n = 3

, , , . + - 3 ( / ). .

In [18]: indexers = np.unique(np.concatenate([ np.arange(max(i-n,0),min(i+n,len(df))) for i in df[df.event=='ping'].index ]))

In [19]: indexers
Out[19]: array([ 7,  8,  9, 10, 11, 12, 17, 18, 19, 20, 21, 22])

.

In [20]: df.iloc[indexers]
Out[20]: 
             A           B          C event
7   0.03348742  0.05735324  0.1220022   NaN
8    0.9567363   0.6539097  0.8409577   NaN
9    0.3115902   0.4955503  0.1749197   NaN
10   0.6883777   0.6185107  0.7933182  ping
11   0.5185129   0.6533616  0.1569159   NaN
12   0.1196976   0.9638604  0.7318006   NaN
17  0.02897615   0.1224485  0.5706852   NaN
18  0.02409971   0.4715463  0.4587161   NaN
19   0.9070592   0.3371241  0.9543977  pong
20   0.8533369   0.7549413  0.5334882  ping
21   0.9546738   0.8203931  0.8543028   NaN
22  0.05691086   0.2402766  0.3922318   NaN

, df.reset_index() ( , , ).

, , "" , . . df.convert_objects().

+6

- np.where. , .

ping = pd.Series(np.where(df.event == 'ping', True,
                          np.where(df.event.shift(1) == 'ping', True,
                                   np.where(df.event.shift(-1) == 'ping', True, False))), index=df.index)

df[ping]

- = 1 ?

: , . :

ping = pd.Series(np.where((df.event == 'ping') | (df.event.shift(1) == 'ping') |
                      (df.event.shift(-1) == 'ping'), True, False), index=df.index)
+1

may be:

>>> ts, n = df['event'] == 'ping', 3
>>> idx = ts.shift(n).fillna(False)  # +n rows
>>> for j in range(-n, n):  # -n to n-1 rows
...     idx |= ts.shift(j).fillna(False)
... 
>>> df[idx]
+1
source

Source: https://habr.com/ru/post/1547707/


All Articles