Efficient way to select the most recent index with a non-NaN value in a column of a Pandas DataFrame?

I am trying to find the latest index with a value that is not "NaN" relative to the current index. So, let's say I have a DataFrame with "NaN" values, for example:

       A       B       C
0    2.1     5.3     4.7
1    5.1     4.6     NaN
2    5.0     NaN     NaN
3    7.4     NaN     NaN
4    3.5     NaN     NaN
5    5.2     1.0     NaN
6    5.0     6.9     5.4
7    7.4     NaN     NaN
8    3.5     NaN     5.8

If I'm at index 4 now, I have the values:

       A       B       C
4    3.5     NaN     NaN

I want to know the last known value of 'B' relative to index 4, which is at index 1:

       A       B       C
1    5.1   -> 4.6    NaN

I know that I can get a list of all indexes with NaN values using something like:

indexes = df.index[df['B'].apply(np.isnan)]

But this seems inefficient in a large database. Is there a way to get only the last one relative to the current index?

+4

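You can try something like this: convert the index to a Series that is NaN wherever column B is NaN, then use ffill(), which carries the last non-missing index forward over the following NaNs: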

import pandas as pd
import numpy as np
df['Last_index_notnull'] = df.index.to_series().where(df.B.notnull(), np.nan).ffill()
df['Last_value_notnull'] = df.B.ffill()
df


Now, at index 4, you know that the last non-missing value is 4.6 and that its index is 1.
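For reference, the forward-fill idea can be run end to end on the sample data from the question. This is a minimal sketch, assuming the default integer index; the names last_idx and last_val are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, np.nan, np.nan, np.nan, 1.0, 6.9, np.nan, np.nan],
    "C": [4.7, np.nan, np.nan, np.nan, np.nan, np.nan, 5.4, np.nan, 5.8],
})

# Keep the index label where B has a value, NaN elsewhere,
# then carry the last known label forward.
last_idx = df.index.to_series().where(df["B"].notna()).ffill()

# Same idea for the value itself.
last_val = df["B"].ffill()

# Look up the answer for the "current" row, here index 4.
print(last_idx.loc[4], last_val.loc[4])
```

Note that the masked index becomes a float column because NaN cannot be stored in an integer Series; cast it back with astype(int) if no leading NaN remains.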

+5

Some useful methods to know:

last_valid_index
first_valid_index

For column B as of index 4:

df.B.loc[:4].last_valid_index()

1

To do this for every column at every index:

pd.concat([df.loc[:i].apply(pd.Series.last_valid_index) for i in df.index],
          axis=1).T
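The per-row loop above slices the frame once for every index, which may be slow on a large frame. A possible vectorized alternative, sketched under the assumption of a sorted index, is to broadcast the index labels across all columns, mask them where the data is NaN, and forward-fill. The helper name last_valid_index_table is hypothetical, not a pandas API.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, np.nan, np.nan, np.nan, 1.0, 6.9, np.nan, np.nan],
    "C": [4.7, np.nan, np.nan, np.nan, np.nan, np.nan, 5.4, np.nan, 5.8],
})

# Single column: last label at or before 4 where B is not NaN.
single = df["B"].loc[:4].last_valid_index()


def last_valid_index_table(frame):
    """Hypothetical helper: for each cell, the last label at or before
    that row where the column holds a non-NaN value."""
    # Repeat the index labels once per column.
    labels = pd.DataFrame(
        np.repeat(frame.index.to_numpy()[:, None], frame.shape[1], axis=1),
        index=frame.index,
        columns=frame.columns,
    )
    # Blank out labels of missing cells, then carry the last label forward.
    return labels.where(frame.notna()).ffill()


table = last_valid_index_table(df)
print(single)
print(table)
```

Cells that come before the first valid value in a column stay NaN, which plays the role of the None that last_valid_index returns in that case.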


+4

Source: https://habr.com/ru/post/1658142/

