How can I efficiently get the indexes of DataFrame rows that meet a cumulative criterion?

For example, I would like to get the letters marking the rows where a run of at least two consecutive drops in another column begins.

Sample data:

   a  b
0  3  a
1  2  b
2  3  c
3  2  d
4  1  e
5  0  f
6 -1  g
7  3  h
8  1  i
9  0  j

An example solution with a simple loop:

import pandas as pd

df = pd.DataFrame({'a': [3,2,3,2,1,0,-1,3,1,0], 'b': list('abcdefghij')})

less = 0  # length of the current run of consecutive drops
l = []    # letters marking the start of each qualifying run
prev_prev_row = df.iloc[0]
prev_row = df.iloc[1]
if prev_row['a'] < prev_prev_row['a']:
    less = 1
for i, row in df.iloc[2:].iterrows():
    if row['a'] < prev_row['a']:
        less = less + 1
    else:
        less = 0
    # Exactly two drops so far: the run began two rows back
    if less == 2:
        l.append(prev_prev_row['b'])
    prev_prev_row = prev_row
    prev_row = row

This gives the list l:

['c', 'h']
2 answers

Use rolling(2) on the reversed column (a rise in reverse order is a drop in the original order):

s = df.a[::-1].diff().gt(0).rolling(2).sum().eq(2)
df.b.loc[s & (s != s.shift(-1))]

2    c
7    h
Name: b, dtype: object

If you really need a list:

df.b.loc[s & (s != s.shift(-1))].tolist()

['c', 'h']
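
To make the one-liner above easier to follow, here is a sketch that unpacks it into named steps and lets the window size vary. The helper name drop_run_starts and its parameters are hypothetical, introduced only for illustration, and the cast to int before rolling is an added precaution rather than part of the answer. It assumes the same df as in the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 3, 2, 1, 0, -1, 3, 1, 0], 'b': list('abcdefghij')})

def drop_run_starts(frame, n=2, value_col='a', label_col='b'):
    """Hypothetical helper: labels of rows where a run of at least n drops begins."""
    rev = frame[value_col][::-1]           # walk the column from the last row to the first
    rises = rev.diff().gt(0).astype(int)   # a rise in reverse is a drop in original order
    # True at a row when the n steps that follow it (in original order) are all drops
    hit = rises.rolling(n).sum().eq(n)
    # A run starts at a hit whose preceding original row is not a hit;
    # in reversed order that row is the next element, hence shift(-1).
    is_start = hit & ~hit.shift(-1, fill_value=False)
    # Flip the mask back to original row order and index positionally
    return frame.loc[is_start[::-1].to_numpy(), label_col].tolist()

starts = drop_run_starts(df)        # same criterion as the question: ['c', 'h']
longer = drop_run_starts(df, n=3)   # runs of at least three drops: ['c']
```

Passing a different n only changes the rolling window, so the start-of-run logic stays the same.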

Here's one approach with some help from NumPy and SciPy:

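import numpy as np
from scipy.ndimage import binary_closing

arr = df.a.values
# True where a value is lower than its predecessor, padded with False at both ends
mask1 = np.hstack((False,arr[1:] < arr[:-1],False))
# Keep only runs of at least two consecutive drops
mask2 = mask1 & (~binary_closing(~mask1,[1,1]))
# Rising edges of mask2 mark the row where each run begins
final_mask = mask2[1:] > mask2[:-1]
out = list(df.b[final_mask])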
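
If SciPy is not available, a similar mask can be built with plain NumPy slicing. This is a sketch of a substitute for the binary_closing step, not the method used above. The variable names are illustrative, and it assumes the same df as in the question with at least three rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 3, 2, 1, 0, -1, 3, 1, 0], 'b': list('abcdefghij')})

arr = df['a'].to_numpy()
# drop[i] is True when row i+1 is lower than row i (length len(arr) - 1)
drop = arr[1:] < arr[:-1]
# two[i] is True when rows i -> i+1 -> i+2 are two consecutive drops (length len(arr) - 2)
two = drop[:-1] & drop[1:]
# A run starts where two[i] is True and two[i-1] is not
prev = np.concatenate(([False], two[:-1]))
start = two & ~prev
# The last two rows can never start a run of two drops, so pad with False
mask = np.concatenate((start, [False, False]))
out = df['b'].to_numpy()[mask].tolist()
```

For the sample data this selects the same rows as the approaches above.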

Source: https://habr.com/ru/post/1661742/

