Comparing a single dataframe value with the previous 10 in the same column

In the data frame, I would like to count how many of the previous 10 days' prices are higher than today's price. The result should look like this:

price   ct>prev10
50.00   
51.00   
52.00   
50.50   
51.00   
50.00   
50.50   
53.00   
52.00   
49.00   
51.00   3

I saw that a similar question was answered by DSM, but the requirement there was different: the basis for the comparison was a static number rather than the value in the current row:

Achieving "countif" with pd.rolling_sum()

Of course, I would like to do this without looping through the rows one by one. I'm pretty stumped; thanks in advance for any advice.

+3
2 answers

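You can use rolling_apply on the series. I used a window length of 5 given the small size of the sample data, but you can easily change it.

The lambda function counts the number of items in each rolling group (excluding the last item) that are greater than the last item.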

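import pandas as pd

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

window = 5  # Given that sample data only contains 11 values.
# pd.rolling_apply was removed in pandas 0.23; Series.rolling(...).apply(...) replaces it.
# raw=True passes each window to the lambda as a NumPy array.
df['price_count'] = df.price.rolling(window).apply(
    lambda group: (group[:-1] > group[-1]).sum(), raw=True)
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0          1.0
5    50.0          4.0
6    50.5          2.0
7    53.0          0.0
8    52.0          1.0
9    49.0          4.0
10   51.0          2.0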

The first complete window covers index values 0-4. Here is that group, as an example:

group = df.price[:window].values
>>> group
array([ 50. ,  51. ,  52. ,  50.5,  51. ])

Now compare the first four prices with the current (last) price:

>>> group[:-1] > group[-1]
array([False, False,  True, False], dtype=bool)

Then sum the boolean values:

>>> sum(group[:-1] > group[-1])
1

This is the value stored for the first complete window, at index 4.
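To get the count over the previous 10 prices that the question asks for, the same idea can be applied with a window of 11 (the current row plus the 10 before it). The following is a minimal sketch, assuming a pandas version that provides Series.rolling(...).apply(..., raw=True); the helper name count_prev_higher is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def count_prev_higher(prices: pd.Series, n_prev: int) -> pd.Series:
    """For each row, count how many of the previous n_prev values exceed it.

    Rows with fewer than n_prev predecessors are left as NaN.
    (Hypothetical helper name, used here for illustration only.)
    """
    # Each window holds the n_prev earlier values followed by the current one.
    return prices.rolling(n_prev + 1).apply(
        lambda w: np.sum(w[:-1] > w[-1]), raw=True
    )


df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})
df['ct>prev10'] = count_prev_higher(df['price'], 10)
print(df.tail(1))
```

With the 11 sample prices, only the last row has 10 predecessors, so it is the only row that receives a count; all earlier rows stay NaN.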

+4

Here is a vectorized approach with NumPy, using broadcasting:

import numpy as np
import pandas as pd

# Sample input dataframe
df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

# Convert to numpy array for counting purposes
A = np.array(df['price'])

W = 5 # Window size

# Initialize another column for storing counts
df['price_count'] = np.nan

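# Get counts and store as a new column in dataframe.
# Each row of the index matrix selects the W-1 values preceding one "current" value.
C = (A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
# Assign through .loc; chained assignment (df['price_count'][W-1:] = C) is unreliable in recent pandas.
df.loc[W-1:, 'price_count'] = C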
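The one-line computation of C is dense, so here is a hedged breakdown of its pieces using the same sample prices. The intermediate names idx, prev and current are introduced only for illustration and do not appear in the code above.

```python
import numpy as np

A = np.array([50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51])
W = 5  # window size

# A column of window start positions, shape (7, 1), plus a row of offsets,
# shape (4,), broadcasts to a (7, 4) index matrix: row i holds i, i+1, i+2, i+3.
idx = np.arange(A.size - W + 1)[:, None] + np.arange(W - 1)

prev = A[idx]                 # the W-1 values before each "current" value
current = A[W - 1:][:, None]  # the current values as a column, for broadcasting

# Compare every previous value with its current value and count per row.
counts = (prev > current).sum(1)

print(idx[:2])
print(counts)
```

The comparison broadcasts the (7, 1) column of current values against the (7, 4) matrix of previous values, so no Python-level loop is needed.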

Sample run:

>>> df
    price
0    50.0
1    51.0
2    52.0
3    50.5
4    51.0
5    50.0
6    50.5
7    53.0
8    52.0
9    49.0
10   51.0
>>> A = np.array(df['price'])
>>> W = 5 # Window size
>>> df['price_count'] = np.nan
>>> 
>>> C=(A[np.arange(A.size-W+1)[:,None] + np.arange(W-1)] > A[W-1:][:,None]).sum(1)
>>> df.loc[W-1:, 'price_count'] = C  # .loc avoids chained assignment
>>> df
    price  price_count
0    50.0          NaN
1    51.0          NaN
2    52.0          NaN
3    50.5          NaN
4    51.0          1.0
5    50.0          4.0
6    50.5          2.0
7    53.0          0.0
8    52.0          1.0
9    49.0          4.0
10   51.0          2.0
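As an alternative to building the index matrix by hand, the windows can be taken with numpy.lib.stride_tricks.sliding_window_view. This is a sketch assuming NumPy 1.20 or later, where that function is available; it swaps in a different technique from the one above rather than reproducing it, and the names windows and counts are chosen for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame({'price': [50, 51, 52, 50.5, 51, 50, 50.5, 53, 52, 49, 51]})

A = df['price'].to_numpy()
W = 5  # window size: the current value plus the W-1 values before it

# Shape (A.size - W + 1, W); each row is one window ending with the current value.
windows = sliding_window_view(A, W)

# Compare the first W-1 entries of every window with its last entry, then count.
counts = (windows[:, :-1] > windows[:, -1:]).sum(axis=1)

# The first W-1 rows have no complete window, so pad them with NaN.
df['price_count'] = np.concatenate([np.full(W - 1, np.nan), counts])
print(df)
```

The view does not copy the price data, which may matter for long series; the boolean comparison still allocates an array of shape (A.size - W + 1, W - 1).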
+1

Source: https://habr.com/ru/post/1658999/

