Find the index of the last true value in pandas Series or DataFrame

Question

I am trying to find the index of the last True value in a pandas boolean Series. My current code looks something like the snippet below. Is there a faster or cleaner way to do this?

import numpy as np
import pandas as pd
import string

index = np.random.choice(list(string.ascii_lowercase), size=1000)
df = pd.DataFrame(np.random.randn(1000, 2), index=index)
s = pd.Series(np.random.choice([True, False], size=1000), index=index)

last_true_idx_s = s.index[s][-1]
last_true_idx_df = df[s].iloc[-1].name

python pandas

user1507844 Dec 20 '15 at 18:30

3 answers

You can use argmax to find the first True, so apply argmax to the reversed Series:

In [11]: s[::-1].argmax()
Out[11]: 'e'

where:

In [12]: s.tail()
Out[12]:
n     True
e     True
k    False
d    False
l    False
dtype: bool
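
The output above comes from a 2015 pandas release, where Series.argmax returned an index label. In recent pandas versions argmax returns an integer position instead, and idxmax returns the label. The following is a minimal sketch of that difference, assuming a recent pandas version; the four-element Series is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Small hand-made Series (illustrative data, not the random Series from the question).
s = pd.Series([False, True, True, False], index=list("abcd"))

reversed_s = s[::-1]

# idxmax returns the label of the first maximum of the reversed Series,
# which is the label of the last True in s.
last_true_label = reversed_s.idxmax()

# In recent pandas, argmax returns an integer position within the reversed
# Series, so it has to be mapped back to a position in s.
last_true_pos = len(s) - 1 - reversed_s.argmax()

print(last_true_label)  # c
print(last_true_pos)    # 2
```

If a label is what you need, idxmax on the reversed Series is the closer match to the result shown in this answer.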

Andy Hayden Dec 20 '15 at 19:54

Use last_valid_index:

In [9]:
s.tail(10)

Out[9]:
h    False
w     True
h    False
r     True
q    False
b    False
p    False
e    False
q    False
d    False
dtype: bool

In [8]:
s[s==True].last_valid_index()

Out[8]:
'r'
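
One case where the approaches on this page differ is a Series with no True at all. The sketch below compares them on a made-up all-False Series (the name all_false and its data are illustrative assumptions, not from the original post).

```python
import pandas as pd

# Hypothetical Series that contains no True values.
all_false = pd.Series([False, False, False], index=list("xyz"))

# Filtering leaves an empty Series, so last_valid_index() returns None.
result_last_valid = all_false[all_false].last_valid_index()

# The reversed idxmax approach still returns a label (the first one of the
# reversed Series), even though no value is True.
result_idxmax = all_false[::-1].idxmax()

# Indexing the filtered index with [-1] raises IndexError on an empty result.
try:
    all_false.index[all_false.to_numpy()][-1]
    raised_index_error = False
except IndexError:
    raised_index_error = True

print(result_last_valid)   # None
print(result_idxmax)       # z
print(raised_index_error)  # True
```

So last_valid_index() gives a result that is easy to test for, while the idxmax variant needs a separate check such as s.any() before its answer can be trusted.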

Edchum Dec 20 '15 at 19:51

jezrael · Accepted Answer · 2015-12-20T20:08:58+0000

You can use idxmax, which works the same way as argmax in Andy Hayden's answer:

print(s[::-1].idxmax())

Comparison:

These timings depend heavily on the size of s, as well as on the number (and position) of True values.

In [2]: %timeit s.index[s][-1]
The slowest run took 6.92 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 35 µs per loop

In [3]: %timeit s[::-1].argmax()
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 126 µs per loop

In [4]: %timeit s[::-1].idxmax()
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 127 µs per loop

In [5]: %timeit s[s==True].last_valid_index()
The slowest run took 8.10 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 261 µs per loop

In [6]: %timeit (s[s==True].index.tolist()[-1])
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 239 µs per loop

In [7]: %timeit (s[s==True].index[-1])
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 227 µs per loop
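
Since the fastest timing above belongs to the variant that indexes s.index directly, a similar idea can be written against the underlying NumPy array. This is only a sketch and was not part of the timings above: it assumes a recent NumPy and pandas, uses np.flatnonzero as a swapped-in technique, and makes no claim about relative speed.

```python
import string

import numpy as np
import pandas as pd

# Rebuild data shaped like the Series in the question (seeded for repeatability).
rng = np.random.default_rng(0)
index = rng.choice(list(string.ascii_lowercase), size=1000)
s = pd.Series(rng.choice([True, False], size=1000), index=index)

# Work on the raw boolean array and collect the positions of all True values.
mask = s.to_numpy()
true_positions = np.flatnonzero(mask)

# Take the last position if there is one; otherwise report None.
if true_positions.size:
    last_true_label = s.index[true_positions[-1]]
else:
    last_true_label = None

print(last_true_label)
```

To compare it with the other variants, time it with %timeit on your own data, since the result depends on the size of s and on where the True values sit.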

EDIT:

Another solution:

print(s[s==True].index[-1])

EDIT1: Solution

(s[s==True].index.tolist()[-1])

was removed.
