I am wondering whether there is an efficient way to do this in pandas: given a data frame with a sorted column, find the row holding the largest value that is less than a given value. For example, given:
addr
0 4196656
1 4197034
2 4197075
3 4197082
4 4197134
What is the largest value less than 4197080? I want only the row containing 4197075 to be returned. One solution is to filter for values below 4197080 and then take the last row, but that looks like a slow O(N) operation (it first builds a filtered series and then takes its last row), whereas a binary search would be O(log N):
df.addr[ df.addr < 4197080].tail(1)
I timed it, and creating df.addr[ df.addr < 4197080] takes about as long as df.addr[ df.addr < 4197080].tail(1), which strongly suggests that the whole filtered series is built first. To test this on a larger frame:
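A minimal sketch of the O(log N) idea using pandas itself, assuming the column is sorted ascending and a reasonably recent pandas in which Series.searchsorted returns a scalar for a scalar argument. The helper name last_below is hypothetical, not part of pandas:

```python
import pandas as pd

# The example frame from the question; 'addr' is assumed to be sorted ascending,
# which binary search requires.
df = pd.DataFrame({'addr': [4196656, 4197034, 4197075, 4197082, 4197134]})


def last_below(s: pd.Series, value):
    """Hypothetical helper: return the last element of the sorted Series s that is
    strictly less than value, or None if no element qualifies (O(log N))."""
    # side='left' gives the first position i with s.iloc[:i] < value,
    # so s.iloc[i - 1] is the largest element strictly below value.
    i = s.searchsorted(value, side='left')
    if i == 0:
        # Nothing is below value; s.iloc[-1] would wrongly wrap to the last row.
        return None
    return s.iloc[i - 1]


print(last_below(df.addr, 4197080))  # 4197075
```

Because searchsorted only inspects about log2(N) elements, it avoids building the intermediate filtered series. If the column might not be sorted, the result is meaningless, so a guard such as s.is_monotonic_increasing may be worth adding.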
import numpy as np
import pandas as pd
from bisect import bisect

num = np.random.randint(0, 10**8, 10**6)
num.sort()
df = pd.DataFrame({'addr':num})
df = df.set_index('addr', drop=False)
df = df.sort_index()
Filtering and then taking the tail:
%timeit df.addr[ df.addr < 57830391].tail(1)
100 loops, best of 3: 7.9 ms per loop
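On a frame like the larger test setup, the filter-and-tail result can be compared against a binary search on the sorted index. This sketch uses a seeded numpy.random.default_rng generator instead of np.random.randint so the check is reproducible (an assumption, not part of the original setup):

```python
import numpy as np
import pandas as pd

# One million sorted random addresses, seeded for reproducibility (assumption).
rng = np.random.default_rng(0)
num = np.sort(rng.integers(0, 10**8, 10**6))
df = pd.DataFrame({'addr': num})
df = df.set_index('addr', drop=False)

target = 57830391

# O(N): build the boolean mask and the filtered Series, then keep its last row.
slow = df.addr[df.addr < target].tail(1)

# O(log N): binary search on the sorted index instead of scanning every row.
# side='left' returns the first position whose value is >= target.
pos = df.index.searchsorted(target, side='left')
fast = df.addr.iloc[pos - 1:pos]  # one-row Series, same shape as .tail(1)

print(slow.equals(fast))  # True
```

Slicing with iloc[pos - 1:pos] rather than iloc[pos - 1] keeps the result a one-row Series, so it can be used as a drop-in replacement for .tail(1).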
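Using lt (note that df.lt(57830391) is a boolean mask, so [-1:] only returns its last row, not the matching value):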
%timeit df.lt(57830391)[-1:]
1000 loops, best of 3: 853 µs per loop
For comparison, a plain binary search with bisect on the sorted NumPy array:
%timeit bisect(num, 57830391, 0, len(num))
100000 loops, best of 3: 6.53 µs per loop
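One detail worth checking in the bisect timing: bisect is an alias of bisect_right, which locates the largest value less than or equal to x, while the question asks for a value strictly less than x. A small sketch on the example data (the query value 4197075 is chosen only because it is present in the array, which is where the two differ):

```python
from bisect import bisect, bisect_left

import numpy as np

num = np.array([4196656, 4197034, 4197075, 4197082, 4197134])
x = 4197075

# bisect (= bisect_right) inserts after equal elements: num[i - 1] is the largest value <= x.
print(num[bisect(num, x) - 1])       # 4197075
# bisect_left inserts before equal elements: num[i - 1] is the largest value < x.
print(num[bisect_left(num, x) - 1])  # 4197034

# np.searchsorted performs the same binary search and also accepts an array of queries.
# If the returned position is 0, no value qualifies and index -1 would wrap around.
print(num[np.searchsorted(num, x, side='left') - 1])  # 4197034
```

When the returned position is 0, no element is below x, so that case needs an explicit check before indexing.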
Is there a way to get this O(log N) behaviour directly in pandas?