Index of the last occurrence of the max before the min

The title may not be intuitive, so let me give an example. Say I have df created using

import numpy as np
import pandas as pd

a = np.array([[ 1. ,  0.9,  1. ],
              [ 0.9,  0.9,  1. ],
              [ 0.8,  1. ,  0.5],
              [ 1. ,  0.3,  0.2],
              [ 1. ,  0.2,  0.1],
              [ 0.9,  1. ,  1. ],
              [ 1. ,  0.9,  1. ],
              [ 0.6,  0.9,  0.7],
              [ 1. ,  0.9,  0.8],
              [ 1. ,  0.8,  0.9]])

idx = pd.date_range('2017', periods=a.shape[0])
df = pd.DataFrame(a, index=idx, columns=list('abc'))

I can get the index location of each column's minimum using

df.idxmin()

Now, how can I get the location of the last occurrence of each column's maximum, looking only at the rows up to the location of the minimum?

Visually, I want to find, in each column, the last maximum that occurs before the minimum (highlighted in green in the original image). Maxima that occur after the minimum are ignored.

I can do this with .apply, but can I do this with a mask / advanced indexing?

Desired Result:

a   2017-01-07
b   2017-01-03
c   2017-01-02
dtype: datetime64[ns]
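
For comparison, the .apply-based baseline mentioned in the question could be sketched roughly as follows. The helper name last_max_before_min is hypothetical, and the sketch assumes that no column has its minimum in the very first row (otherwise there is nothing before the minimum to search).

```python
import numpy as np
import pandas as pd

a = np.array([[1.0, 0.9, 1.0],
              [0.9, 0.9, 1.0],
              [0.8, 1.0, 0.5],
              [1.0, 0.3, 0.2],
              [1.0, 0.2, 0.1],
              [0.9, 1.0, 1.0],
              [1.0, 0.9, 1.0],
              [0.6, 0.9, 0.7],
              [1.0, 0.9, 0.8],
              [1.0, 0.8, 0.9]])
idx = pd.date_range('2017', periods=a.shape[0])
df = pd.DataFrame(a, index=idx, columns=list('abc'))


def last_max_before_min(col):
    """Hypothetical helper: label of the last maximum before the first minimum."""
    # Label slicing is inclusive, so drop the final row (the minimum itself).
    head = col.loc[:col.idxmin()].iloc[:-1]
    # idxmax returns the first hit, so reverse to make the last maximum come first.
    return head[::-1].idxmax()


result = df.apply(last_max_before_min)
print(result)
```

This processes one column at a time in Python, which is what the mask-based answers below try to avoid.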
3 answers

Use mask to blank out everything from the minimum onward, then reverse the rows and call idxmax.

df.mask((df == df.min()).cumsum().astype(bool))[::-1].idxmax()

a   2017-01-07
b   2017-01-03
c   2017-01-02
dtype: datetime64[ns]

Details. First, locate the minimum of each column.

df.min()

a    0.6
b    0.2
c    0.1
dtype: float64

i = df == df.min()
i

                a      b      c
2017-01-01  False  False  False
2017-01-02  False  False  False
2017-01-03  False  False  False
2017-01-04  False  False  False
2017-01-05  False   True   True
2017-01-06  False  False  False
2017-01-07  False  False  False
2017-01-08   True  False  False
2017-01-09  False  False  False
2017-01-10  False  False  False

Next, mask every value from the first minimum onward:

j = df.mask(i.cumsum().astype(bool))
j

              a    b    c
2017-01-01  1.0  0.9  1.0
2017-01-02  0.9  0.9  1.0
2017-01-03  0.8  1.0  0.5
2017-01-04  1.0  0.3  0.2
2017-01-05  1.0  NaN  NaN
2017-01-06  0.9  NaN  NaN
2017-01-07  1.0  NaN  NaN
2017-01-08  NaN  NaN  NaN
2017-01-09  NaN  NaN  NaN
2017-01-10  NaN  NaN  NaN

Finally, reverse the rows and apply idxmax.

j[::-1].idxmax()

a   2017-01-07
b   2017-01-03
c   2017-01-02
dtype: datetime64[ns]
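
The reversal step works because idxmax returns the first occurrence of the maximum and skips NaN by default. A minimal sketch on a hypothetical toy column (the labels and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical column: the maximum 1.0 occurs twice, followed by a masked (NaN) tail.
s = pd.Series([1.0, 0.5, 1.0, np.nan, np.nan], index=list('vwxyz'))

first_max = s.idxmax()        # first occurrence of the maximum
last_max = s[::-1].idxmax()   # on the reversed column, the last occurrence comes first

print(first_max, last_max)
```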

Here is a NumPy approach using masking -

>>> a = df.values
>>> mask = a.argmin(0) > np.arange(a.shape[0])[:,None]
>>> idx = a.shape[0] - (a*mask)[::-1].argmax(0) - 1
>>> df.index[idx]
DatetimeIndex(['2017-01-07', '2017-01-03', '2017-01-02'], dtype='datetime64[ns]', freq=None)
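
Multiplying by the boolean mask turns the excluded entries into 0, which works here because every value in the data is positive. If the data could contain zeros or negative values, one possible variation is to fill the excluded entries with -inf instead. The frame df_neg below is hypothetical data chosen to show the difference, and the sketch still assumes that no column has its minimum in the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical data with negative values.
vals = np.array([[-1.0, 2.0],
                 [-3.0, 2.0],
                 [-1.0, 0.5],
                 [-5.0, 1.0]])
df_neg = pd.DataFrame(vals,
                      index=pd.date_range('2017', periods=4),
                      columns=list('xy'))

a = df_neg.to_numpy()
n = a.shape[0]

# True for rows strictly before each column's first minimum.
before_min = np.arange(n)[:, None] < a.argmin(0)

# Excluded entries become -inf so they can never win the argmax.
masked = np.where(before_min, a, -np.inf)

# argmax on the reversed rows finds the last maximum; map back to forward positions.
pos = n - 1 - masked[::-1].argmax(0)
result = df_neg.index[pos]
print(result)
```

With the multiplication-based mask, column x would see a 0 in the excluded row, which is larger than every remaining value and would therefore be picked as the maximum.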

Another approach: mask with NaN and use np.nanargmax -

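a = df.values.copy()  # copy so that writing NaN does not modify df
min_idx = a.argmin(0)  # row position of each column's first minimum
# True for rows after the minimum
mask = min_idx < np.arange(a.shape[0])[:,None]
a[mask] = np.nan
# nanargmax on the reversed rows finds the last max; convert back to a forward position
idx = a.shape[0]-np.nanargmax(a[::-1],axis=0) - 1
out = df.index[idx]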

Using last_valid_index

df[df==df.min()]=0  # note: overwrites each column's minimum with 0, in place

(df.mask((df.cumprod()==0)|(df!=df.max()))).apply(lambda x : x.last_valid_index())
Out[583]:
a   2017-01-07
b   2017-01-03
c   2017-01-02
dtype: datetime64[ns]
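
The snippet above writes zeros into df and compares against the overall column maximum. A non-mutating variation, sketched here under the same assumption that no column starts with its minimum, swaps the zeroing and cumprod for a cumulative boolean mask and compares against the maximum of the rows before the minimum. The names from_min, head and candidates are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.array([[1.0, 0.9, 1.0],
              [0.9, 0.9, 1.0],
              [0.8, 1.0, 0.5],
              [1.0, 0.3, 0.2],
              [1.0, 0.2, 0.1],
              [0.9, 1.0, 1.0],
              [1.0, 0.9, 1.0],
              [0.6, 0.9, 0.7],
              [1.0, 0.9, 0.8],
              [1.0, 0.8, 0.9]])
idx = pd.date_range('2017', periods=a.shape[0])
df = pd.DataFrame(a, index=idx, columns=list('abc'))
original = df.copy()  # kept only to check that df is left untouched

# True from each column's first minimum onward.
from_min = (df == df.min()).cumsum().astype(bool)

# NaN from the minimum onward; only the rows before it survive.
head = df.mask(from_min)

# Keep only the entries equal to the maximum of the surviving rows.
candidates = head.where(head == head.max())

# The last non-NaN label in each column is the last maximum before the minimum.
result = candidates.apply(lambda col: col.last_valid_index())
print(result)
```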

Source: https://habr.com/ru/post/1691058/

