Pandas: Keep only the rows where the accumulated change reaches a threshold?

I want to extract the rows where a column's value has cumulatively risen by at least 5, or cumulatively fallen by at least 5, since the last extracted row, and then get the sign of each of these cumulative changes as up_or_down.

For example, let's say I want to apply this to column y in the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(16), 'y': [1,10,14,12,13,9,4,2,6,7,10,11,16,17,14,11]})

The result should be:

x   y        # up_or_down
1   10       # +1
6   4        # -1
10  10       # +1
12  16       # +1
15  11       # -1

My dataframe is quite large, so I was hoping there was a good vectorized way to do this natively with the pandas API, rather than looping over it with iterrows().

+4
2 answers

This is the core of the solution.

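def big_diff(y):
    """Yield (position, sign) each time y moves at least 5 away from the reference."""
    val = y.values
    r = val[0]  # reference: the last kept value (initially the first value)
    for i, x in enumerate(val):
        d = r - x
        if abs(d) >= 5:
            # d < 0 means the value rose above the reference, so the sign is +1
            yield i, (1 if d < 0 else -1)
            r = x  # restart the accumulation from the kept value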

Then you can do something like this

# Each row of slc is (position, sign): column 0 is the row position, column 1 the sign.
slc = np.array(list(big_diff(df.y)))
# df.iloc keeps the original index and the column dtypes.
df_slcd = df.iloc[slc[:, 0]]
signs = pd.Series(slc[:, 1], index=df_slcd.index, name='up_or_down')

df_slcd

     x   y
1    1  10
6    6   4
10  10  10
12  12  16
15  15  11

signs

1     1
6    -1
10    1
12    1
15   -1
Name: up_or_down, dtype: int64

pd.concat([df_slcd, signs], axis=1)

     x   y  up_or_down
1    1  10           1
6    6   4          -1
10  10  10           1
12  12  16           1
15  15  11          -1
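
The steps above could also be folded into a single helper. This is only a sketch under assumptions: the function name threshold_rows and its col and threshold parameters are hypothetical and do not come from the answer, and threshold is assumed to be positive. Unlike the snippet above, it also returns an empty frame when no row qualifies.

```python
import numpy as np
import pandas as pd


def threshold_rows(df, col, threshold=5):
    """Hypothetical helper: keep rows where df[col] has moved at least
    `threshold` away from the last kept value, and add the sign of that move."""
    values = df[col].to_numpy()
    positions, signs = [], []
    # The reference starts at the first value; None only when the frame is empty.
    ref = values[0] if len(values) else None
    for i, v in enumerate(values):
        change = v - ref
        if abs(change) >= threshold:
            positions.append(i)
            signs.append(1 if change > 0 else -1)
            ref = v  # the kept value becomes the new reference
    out = df.iloc[np.array(positions, dtype=np.intp)].copy()
    out["up_or_down"] = np.array(signs, dtype="int64")
    return out


df = pd.DataFrame({'x': range(16),
                   'y': [1, 10, 14, 12, 13, 9, 4, 2, 6, 7, 10, 11, 16, 17, 14, 11]})
result = threshold_rows(df, 'y')
print(result)  # rows 1, 6, 10, 12 and 15 for the example data
```

The loop still runs in Python, but it iterates over a NumPy array rather than over DataFrame rows, which avoids most of the per-row overhead of iterrows().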

+2

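I don't think pandas has a vectorized way to do this: whether row n is kept depends on whether the +/- 5 threshold was reached relative to row n-1, which depends on row n-1 itself, which in turn depends on row n-2, and so on. A loop written ad hoc for this seems unavoidable.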
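
To illustrate that sequential dependence, here is a small sketch (an illustration added for this example, not code from either answer) of a naive vectorized attempt on the question's data. It uses Series.diff to compare each row only with its immediate predecessor, so it cannot see changes that build up over several small steps.

```python
import pandas as pd

df = pd.DataFrame({'x': range(16),
                   'y': [1, 10, 14, 12, 13, 9, 4, 2, 6, 7, 10, 11, 16, 17, 14, 11]})

# Naive attempt: change relative to the immediately preceding row only.
step = df['y'].diff()
naive = df[step.abs() >= 5]
naive_rows = naive.index.tolist()
print(naive_rows)  # [1, 6, 12]
```

Rows 10 and 15 from the expected result are missing, because the moves 4 -> 10 and 16 -> 11 each accumulate over several rows, and where the accumulation starts depends on which earlier row was kept.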

+1

Source: https://habr.com/ru/post/1654245/

