How to get indexes of DataFrame rows efficiently, where these rows meet certain cumulative criteria?

Question

How can I efficiently get the indexes of DataFrame rows that meet certain cumulative criteria?

For example, I would like to get the letters in column b that mark the rows where a run of at least two consecutive drops in column a begins.

Sample data and an example solution with a simple loop:

import pandas as pd

df = pd.DataFrame({'a': [3,2,3,2,1,0,-1,3,1,0], 'b': list('abcdefghij')})

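less = 0  # length of the current run of consecutive drops
l = []    # letters of the rows where a run of two drops begins
prev_prev_row = df.iloc[0]
prev_row = df.iloc[1]
if prev_row['a'] < prev_prev_row['a']:
    less = 1
for i, row in df.iloc[2:].iterrows():
    if row['a'] < prev_row['a']:
        less += 1
    else:
        less = 0
    # Record the starting row once, when the run first reaches two drops
    if less == 2:
        l.append(prev_prev_row['b'])
    prev_prev_row = prev_row
    prev_row = row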

This gives the list l:

['c', 'h']

+4

performance python vectorization pandas

user3927220 Nov 23 '16 at 18:03

2 answers

Here's one approach with some help from NumPy and SciPy:

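import numpy as np
from scipy.ndimage import binary_closing

arr = df.a.values
# True where a value is lower than the one before it, padded with False at both ends
mask1 = np.hstack((False, arr[1:] < arr[:-1], False))
# Keep only runs of at least two consecutive drops
mask2 = mask1 & (~binary_closing(~mask1, [1, 1]))
# A run starts where mask2 switches from False to True
final_mask = mask2[1:] > mask2[:-1]
out = list(df.b[final_mask])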
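
For comparison, the same start-of-run logic can probably be written with plain NumPy slicing and no SciPy dependency. This is only a sketch, assuming that a "drop" means a strictly lower value than in the previous row, as in the question's loop. The names drop, padded, starts and start_letters are illustrative and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 3, 2, 1, 0, -1, 3, 1, 0], 'b': list('abcdefghij')})

arr = df['a'].to_numpy()
# drop[i] is True when row i+1 is lower than row i (length n-1)
drop = arr[1:] < arr[:-1]
# Pad with False so the first and last rows have well-defined neighbours
padded = np.concatenate(([False], drop, [False]))
# Row i starts a run when drop[i] and drop[i+1] hold but drop[i-1] does not
starts = padded[1:-1] & padded[2:] & ~padded[:-2]
# Convert the positional mask to labels from column b
start_letters = df['b'].iloc[np.flatnonzero(starts)].tolist()
print(start_letters)  # ['c', 'h']
```

The three-way comparison is hard-wired to a minimum run length of two; a longer minimum would need more shifted slices.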

+3

Divakar Nov 23 '16 at 18:29

piRSquared · Accepted Answer · 2016-11-23T18:26:39+0000

Use rolling(2) in reverse:

s = df.a[::-1].diff().gt(0).rolling(2).sum().eq(2)
df.b.loc[s & (s != s.shift(-1))]

2    c
7    h
Name: b, dtype: object

If you really need a list:

df.b.loc[s & (s != s.shift(-1))].tolist()

['c', 'h']
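
The same reverse-rolling idea seems to extend to runs of at least n drops by changing the window size. The sketch below wraps it in a hypothetical helper, drop_run_starts (the name and its parameters are made up for illustration and do not come from the answer). It passes fill_value=False to shift, which assumes pandas 0.24 or later.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 3, 2, 1, 0, -1, 3, 1, 0], 'b': list('abcdefghij')})


def drop_run_starts(frame, n=2, value_col='a', label_col='b'):
    """Hypothetical helper: labels of rows where a run of at least n drops begins."""
    # In reversed order, a forward drop shows up as a positive difference
    rises = frame[value_col][::-1].diff().gt(0).astype(int)
    # True at a row when the next n forward steps are all drops
    s = rises.rolling(n).sum().eq(n)
    # Keep only the first row of each run: the row just before it is not flagged
    is_start = s & ~s.shift(-1, fill_value=False)
    # Reverse back to the original row order before boolean indexing
    return frame[label_col][is_start[::-1]].tolist()


print(drop_run_starts(df, n=2))  # ['c', 'h']
print(drop_run_starts(df, n=3))  # ['c']
```

Reversing the series is what lets the backward-looking rolling window act as a forward-looking one, so each flagged row describes the n steps that follow it.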
