How to calculate rolling idxmax

Question


Consider the pd.Series s:

    import pandas as pd
    import numpy as np

    np.random.seed([3,1415])
    s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
    s

    a    0
    b    2
    c    7
    d    3
    e    8
    f    7
    g    0
    h    6
    i    8
    j    6
    dtype: int64

I want to get the index of the maximum value for a rolling window of 3. The rolling maximum itself is easy:

    s.rolling(3).max()

    a    NaN
    b    NaN
    c    7.0
    d    7.0
    e    8.0
    f    8.0
    g    8.0
    h    7.0
    i    8.0
    j    8.0
    dtype: float64

What I want is:

    a    None
    b    None
    c       c
    d       c
    e       e
    f       e
    g       e
    h       f
    i       i
    j       i
    dtype: object

What I've tried:

    s.rolling(3).apply(np.argmax)

    a    NaN
    b    NaN
    c    2.0
    d    1.0
    e    2.0
    f    1.0
    g    0.0
    h    0.0
    i    2.0
    j    1.0
    dtype: float64

which is obviously not what I want: it gives positions within each window rather than index labels.


python numpy pandas dataframe series

piRSquared Oct 18 '16 at 6:35


5 answers

Here's a broadcasting approach:

    maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
    out = s.index[maxidx+np.arange(maxidx.size)]

This generates the positions covered by every rolling window, indexes into the array with them to extract the windows, and takes the argmax of each window. Adding each window's starting offset then turns those into positions in the full series. For more efficient windowing, we can use NumPy strides, for example:

    arr = s.values
    n = arr.strides[0]
    maxidx = np.lib.stride_tricks.as_strided(arr,
                    shape=(s.size-3+1,3), strides=(n,n)).argmax(1)
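As a rough sketch of the same idea end to end, the strided windows can also be built with `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later), which avoids computing strides by hand. The series values are typed in from the question's output, and the names `w`, `windows`, `pos` and `out` are only illustrative:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

# Same values as the series in the question, written out explicitly.
s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list("abcdefghij"))
w = 3  # window length (assumed name)

# One row per complete window; this is a view, so no data is copied.
windows = sliding_window_view(s.to_numpy(), w)

# Position of the maximum inside each window, shifted by the window's
# starting position to get a position in the full series.
pos = windows.argmax(axis=1) + np.arange(windows.shape[0])

# Pad the first w - 1 entries with None, as in the desired output.
labels = [None] * (w - 1) + list(s.index[pos])
out = pd.Series(labels, index=s.index, dtype=object)
print(out)
```

When several values in a window tie for the maximum, `argmax` reports the first one, so this variant returns the earliest label in the window.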


Divakar Oct 18 '16 at 7:58


I used a generator:

    def idxmax(s, w):
        i = 0
        while i + w <= len(s):
            yield(s.iloc[i:i+w].idxmax())
            i += 1

    pd.Series(idxmax(s, 3), s.index[2:])

    c    c
    d    c
    e    e
    f    e
    g    e
    h    f
    i    i
    j    i
    dtype: object


piRSquared Oct 18 '16 at 8:16


You can also simulate a rolling window by creating a DataFrame and using idxmax as follows:

    window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
    s.index[np.arange(len(s)) - window_values.idxmax(1)]

    Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)

As you can see, the first two entries are the idxmax over the initial windows of lengths 1 and 2, rather than null values. This is not as efficient as the accepted answer and is probably not a good idea for large windows, but it is another perspective.
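A minimal sketch of how the shift idea might be generalized to an arbitrary window length, with the incomplete leading windows blanked out to match the output the question asks for. The series values are copied from the question; `w`, `shifted`, `rows_back` and `result` are assumed names, not part of the original answer:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list("abcdefghij"))
w = 3  # window length (assumed name)

# Column k holds the series shifted down by k rows, so each row is one
# window read backwards: column 0 is the current value.
shifted = pd.DataFrame({k: s.shift(k) for k in range(w)})

# The column label of the row maximum is how many rows back it sits.
rows_back = shifted.idxmax(axis=1).to_numpy()
labels = s.index[np.arange(len(s)) - rows_back].to_numpy(dtype=object)

# Blank out the rows whose window is not yet full.
labels[: w - 1] = None
result = pd.Series(labels, index=s.index, dtype=object)
print(result)
```

One difference from the argmax-based answers: on ties, `idxmax` across the shifted columns picks the smallest shift, so this variant reports the most recent maximum in the window rather than the earliest.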


Joecondron Oct 18 '16 at 11:55


Just sharing how I solved a similar problem. I did not need the exact index; I wanted to know how many rows ago the maximum value occurred. But the same approach can also be used to find the index.

I basically use the shift strategy, but loop over several shifts up to a configurable window length. This is probably slow, but it works well enough for me.

    import pandas as pd

    length = 5
    data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
    df = pd.DataFrame(data, columns=['number'])
    df['helper_max'] = df['number'].rolling(length).max()

    # Shifts 0..length-1 cover the window. Going from the largest shift down
    # means the most recent maximum is written last and wins on ties.
    for i in range(length - 1, -1, -1):
        # Set the column to what you want. You may grab the index
        # if you wish, I wanted number of rows since max happened
        df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i

    print(df)

Output:

        number  helper_max  n_rows_ago_since_max
    0        1         NaN                   NaN
    1        2         NaN                   NaN
    2        3         NaN                   NaN
    3        4         NaN                   NaN
    4        5         5.0                   0.0
    5        4         5.0                   1.0
    6        3         5.0                   2.0
    7        4         5.0                   3.0
    8        5         5.0                   0.0
    9        6         6.0                   0.0
    10       7         7.0                   0.0
    11       6         7.0                   1.0
    12       5         7.0                   2.0
    13       4         7.0                   3.0
    14       5         7.0                   4.0
    15       4         6.0                   4.0
    16       3         5.0                   2.0
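To illustrate the remark that this can also be used to find the index, here is a small sketch that turns the rows-ago count into the row position of the maximum by subtracting it from the current position. It repeats the setup so it runs on its own; the column name `idx_of_max` is made up for this example:

```python
import numpy as np
import pandas as pd

length = 5
data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
df = pd.DataFrame(data, columns=["number"])
df["helper_max"] = df["number"].rolling(length).max()

# Smaller shifts are written last, so the most recent maximum wins.
for i in range(length - 1, -1, -1):
    df.loc[df["number"].shift(i) == df["helper_max"], "n_rows_ago_since_max"] = i

# Row position of the maximum = current position minus rows elapsed.
# The column stays float because incomplete windows are NaN.
df["idx_of_max"] = np.arange(len(df)) - df["n_rows_ago_since_max"]
print(df)
```

With a default integer index the position is also the index label; for a labelled index, the non-NaN positions would still need to be looked up in `df.index`.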


Rkey Oct 11 '19 at 18:39


Brenbarn · Accepted Answer · 2016-10-18T07:02:08+0000

There is no easy way to do this, because the argument passed to the rolling apply function is a plain NumPy array, not a pandas Series, so it knows nothing about the index. In addition, the rolling functions must return a float result, so they cannot directly return index values unless those are floats.

Here is one approach:

    >>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
    Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')

The idea is to take the argmax values and align them with the series by adding an offset that says how far along the series each window starts. (That is, for the first argmax we add zero, because it gives an index into the subsequence starting at position 0 of the original series; for the second argmax we add one, because it gives an index into the subsequence starting at position 1; and so on.)

This gives the correct results, but does not include the two None values at the beginning; you will need to add them back manually if you want them.
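One possible way to add the leading None values back is sketched below. It assumes a pandas version whose `rolling(...).apply` accepts the `raw` argument, so that `np.argmax` receives a plain NumPy array as described above. The series values are copied from the question, and `w`, `rel`, `start`, `valid` and `out` are illustrative names:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 2, 7, 3, 8, 7, 0, 6, 8, 6], index=list("abcdefghij"))
w = 3  # window length (assumed name)

# raw=True hands each window to np.argmax as a plain NumPy array.
# The result is float, with NaN for the incomplete leading windows.
rel = s.rolling(w).apply(np.argmax, raw=True)

# The window that ends at row i starts at position i - (w - 1).
start = np.arange(len(s)) - (w - 1)

# Only complete windows have a valid relative position.
valid = rel.notna().to_numpy()
pos = rel[valid].astype(int).to_numpy() + start[valid]

# Start from all-None labels and fill in the complete windows.
labels = np.full(len(s), None, dtype=object)
labels[valid] = s.index[pos].to_numpy(dtype=object)
out = pd.Series(labels, index=s.index, dtype=object)
print(out)
```

Keeping the result as an object Series is what allows None and string labels to sit side by side; a float result could not hold the labels.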

There is an open pandas issue for adding a rolling idxmax.
