How to calculate a rolling idxmax

Consider the pd.Series s:

    import pandas as pd
    import numpy as np

    np.random.seed([3,1415])
    s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
    s

    a    0
    b    2
    c    7
    d    3
    e    8
    f    7
    g    0
    h    6
    i    8
    j    6
    dtype: int64

I want to get the index label of the maximum value for a rolling window of 3. The rolling maximum itself is easy:

    s.rolling(3).max()

    a    NaN
    b    NaN
    c    7.0
    d    7.0
    e    8.0
    f    8.0
    g    8.0
    h    7.0
    i    8.0
    j    8.0
    dtype: float64

What I want is:

    a    None
    b    None
    c       c
    d       c
    e       e
    f       e
    g       e
    h       f
    i       i
    j       i
    dtype: object

What I've tried:

    s.rolling(3).apply(np.argmax)

    a    NaN
    b    NaN
    c    2.0
    d    1.0
    e    2.0
    f    1.0
    g    0.0
    h    0.0
    i    2.0
    j    1.0
    dtype: float64

which is clearly not what I want

5 answers

There is no easy way to do this, because the argument passed to the function given to rolling apply is a plain numpy array, not a pandas Series, so it knows nothing about the index. In addition, rolling functions must return a float result, so they cannot directly return index values unless those values are floats.

Here is one approach:

    >>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
    Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')

The idea is to take the argmax values and align them with the series by adding an offset that says how far into the series each window starts. (That is, for the first argmax we add zero, because it gives a position within the subsequence starting at position 0 of the original series; for the second argmax we add one, because it gives a position within the subsequence starting at position 1 of the original series; and so on.)

This gives the correct results, but does not include the two "None" values at the beginning; you will need to add them back manually if you want them.
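Adding the two leading "None" values back could look like the sketch below. It assumes a pandas version where `Rolling.apply` accepts the `raw` argument; the series is rebuilt from the values printed in the question rather than from the random seed, and `rolling_idxmax` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Values copied from the series printed in the question.
s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list('abcdefghij'))
window = 3

# Position of the maximum inside each window (raw=True passes NumPy arrays).
local_pos = s.rolling(window).apply(np.argmax, raw=True)

# Keep only complete windows and shift each local position by the
# position at which its window starts.
valid = local_pos.dropna().astype(int)
global_pos = valid.to_numpy() + np.arange(len(valid))

# Start from an all-None object series, then fill in the labels
# for the rows that have a complete window.
rolling_idxmax = pd.Series([None] * len(s), index=s.index, dtype=object)
rolling_idxmax.iloc[window - 1:] = s.index[global_pos].to_numpy()

print(rolling_idxmax)
```

The first `window - 1` entries stay `None`, which matches the shape of the output requested in the question.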

There is an open pandas issue about adding a rolling idxmax.
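As a possible workaround, here is a sketch that assumes a pandas version where `Rolling.apply` accepts `raw=False` and then passes each window as a Series (so the labels are visible inside the callback). It also assumes the index is unique, so that `Index.get_loc` returns a single integer. The callback still has to return a number, so it returns the position of the label, which is mapped back afterwards; `pos` and `labels` are illustrative names.

```python
import pandas as pd

# Values copied from the series printed in the question.
s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list('abcdefghij'))

# With raw=False each window arrives as a Series, so idxmax() can see
# the labels. The result must be numeric, so return the label's position.
pos = s.rolling(3).apply(lambda w: s.index.get_loc(w.idxmax()), raw=False)

# Map positions back to labels; incomplete windows (NaN) become None.
labels = pd.Series(
    [None if pd.isna(p) else s.index[int(p)] for p in pos],
    index=s.index,
    dtype=object,
)

print(labels)
```

This calls a Python function once per window, so it is likely to be slower than the vectorized offset trick on long series.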


Here's a broadcasting approach:

    maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
    out = s.index[maxidx+np.arange(maxidx.size)]

This generates all the positions belonging to each rolling window, indexes into the underlying array with them to get a 2D array with one row per window, and then takes the argmax of each row. For more efficient indexing, we can use NumPy strides, for example:

    arr = s.values
    n = arr.strides[0]
    maxidx = np.lib.stride_tricks.as_strided(arr,
                 shape=(s.size-3+1,3), strides=(n,n)).argmax(1)
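Because `as_strided` does no bounds checking, a wrong shape or stride can silently read unrelated memory. A sketch of the same idea using `sliding_window_view` is shown below; this assumes NumPy 1.20 or newer, where that function is available, and rebuilds the series from the values printed in the question.

```python
import numpy as np
import pandas as pd

# Values copied from the series printed in the question.
s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list('abcdefghij'))
window = 3

# One row per complete window; this is a read-only view, not a copy.
windows = np.lib.stride_tricks.sliding_window_view(s.to_numpy(), window)

# argmax within each window, then shift by the window's start position.
maxidx = windows.argmax(axis=1)
out = s.index[maxidx + np.arange(maxidx.size)]

print(out)
```

As with the other NumPy-based versions, the result covers only the complete windows, so it is `window - 1` entries shorter than the series.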

I used a generator:

    def idxmax(s, w):
        i = 0
        while i + w <= len(s):
            yield(s.iloc[i:i+w].idxmax())
            i += 1

    pd.Series(idxmax(s, 3), s.index[2:])

    c    c
    d    c
    e    e
    f    e
    g    e
    h    f
    i    i
    j    i
    dtype: object

You can also simulate a rolling window by creating a DataFrame and using idxmax as follows:

    window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
    s.index[np.arange(len(s)) - window_values.idxmax(1)]

    Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)

As you can see, the first two entries are the idxmax over the initial partial windows of lengths 1 and 2, rather than null values. This is not as efficient as the accepted answer and probably not a good idea for large windows, but it offers another perspective.


Just sharing how I solved a similar problem. I did not need the exact index; I wanted to know how many rows ago the maximum value occurred. The same approach can also be used to find the index.

I basically use the shift strategy, but I loop over several shifts with a configurable window length. This is probably slow, but it works well enough for me.

    import pandas as pd

    length = 5
    data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
    df = pd.DataFrame(data, columns=['number'])
    df['helper_max'] = df['number'].rolling(length).max()

    # A window of `length` rows covers shifts 0 .. length - 1.
    # Going from the largest shift down means the most recent match wins.
    for i in range(length - 1, -1, -1):
        # Set the column to what you want. You may grab the index
        # if you wish, I wanted number of rows since max happened
        df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i

    print(df)

Output:

        number  helper_max  n_rows_ago_since_max
    0        1         NaN                   NaN
    1        2         NaN                   NaN
    2        3         NaN                   NaN
    3        4         NaN                   NaN
    4        5         5.0                   0.0
    5        4         5.0                   1.0
    6        3         5.0                   2.0
    7        4         5.0                   3.0
    8        5         5.0                   0.0
    9        6         6.0                   0.0
    10       7         7.0                   0.0
    11       6         7.0                   1.0
    12       5         7.0                   2.0
    13       4         7.0                   3.0
    14       5         7.0                   4.0
    15       4         6.0                   4.0
    16       3         5.0                   2.0
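To turn the "rows ago" offset into an index label, one option is to subtract it from each row's position, as in the sketch below. The column name `idx_of_max` and the variable names are illustrative, not part of the answer above. Note that because the loop finishes with the smallest shift, ties resolve to the most recent maximum, whereas `idxmax` would return the first one.

```python
import numpy as np
import pandas as pd

length = 5
data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
df = pd.DataFrame(data, columns=['number'])
df['helper_max'] = df['number'].rolling(length).max()

# Same shift loop as in the answer: the smallest matching shift wins.
for i in range(length - 1, -1, -1):
    df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i

# Only rows with a complete window have a defined offset.
mask = df['n_rows_ago_since_max'].notna().to_numpy()
offsets = df.loc[mask, 'n_rows_ago_since_max'].astype(int).to_numpy()

# Position of the maximum = position of the row minus the offset.
positions = np.arange(len(df))[mask] - offsets

# 'idx_of_max' is a hypothetical column name; rows without a
# complete window are left as NaN by index alignment.
df['idx_of_max'] = pd.Series(df.index[positions], index=df.index[mask])

print(df)
```

With the default integer index the labels coincide with positions; with a labelled index, `df.index[positions]` would return the corresponding labels instead.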

Source: https://habr.com/ru/post/1258383/

