Pandas: idiomatic way to do a custom fill of NaN runs

I have time-series data in the following format, where each value is the amount accumulated since the last record. I want to "distribute" each accumulated amount evenly over the preceding run of NaNs, so that this input:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

2016-01-01     0.0
2016-01-02     0.0
2016-01-03     NaN
2016-01-04     NaN
2016-01-05    75.0
2016-01-06     NaN
2016-01-07     NaN
2016-01-08     NaN
2016-01-09     NaN
2016-01-10    50.0

becomes this result (75 is spread evenly over Jan 3-5 as 25 per day, and 50 over Jan 6-10 as 10 per day):

2016-01-01     0.0
2016-01-02     0.0
2016-01-03    25.0
2016-01-04    25.0
2016-01-05    25.0
2016-01-06    10.0
2016-01-07    10.0
2016-01-08    10.0
2016-01-09    10.0
2016-01-10    10.0

Is there an idiomatic way to do this in pandas without just looping through the data? I tried various things, including fillna, dropna, isnull, and shift to check the next value, but I don't see how to put them together.

1 answer

This works on each run of missing values: build a group key by taking cumsum over the reversed not-null mask (so counting from the end of the series), then apply a grouped transform that divides the last value of each group by the group's size:

# .iloc[-1] is used instead of g[-1] so the positional access is explicit
s.groupby(s.notnull()[::-1].cumsum()[::-1]).transform(lambda g: g.iloc[-1] / g.size)
#2016-01-01     0.0
#2016-01-02     0.0
#2016-01-03    25.0
#2016-01-04    25.0
#2016-01-05    25.0
#2016-01-06    10.0
#2016-01-07    10.0
#2016-01-08    10.0
#2016-01-09    10.0
#2016-01-10    10.0
#Freq: D, dtype: float64
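
To see what that group key actually looks like, here is a small sketch (not part of the original answer) that prints the intermediate labels; every run of NaNs shares a label with the non-null value that closes it:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

# Reverse the not-null mask, take a cumulative sum, and reverse back:
# each NaN inherits the counter of the first non-null value after it.
groups = s.notnull()[::-1].cumsum()[::-1]
print(groups.tolist())
# [4, 3, 2, 2, 2, 1, 1, 1, 1, 1]

Grouping on these labels puts 2016-01-03 through 2016-01-05 in one group (75 split three ways) and 2016-01-06 through 2016-01-10 in another (50 split five ways).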

Or another option:

s.groupby(s.shift().notnull().cumsum()).transform(lambda g: g.iloc[-1] / g.size)
#2016-01-01     0.0
#2016-01-02     0.0
#2016-01-03    25.0
#2016-01-04    25.0
#2016-01-05    25.0
#2016-01-06    10.0
#2016-01-07    10.0
#2016-01-08    10.0
#2016-01-09    10.0
#2016-01-10    10.0
#Freq: D, dtype: float64
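
Not part of the original answer, but a fully vectorized sketch along the same lines avoids the Python-level lambda entirely: backfill each recorded amount over its run of NaNs and divide by the size of that run (the variable names are illustrative):

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

# Same group key as above.
groups = s.notnull()[::-1].cumsum()[::-1]

# bfill() propagates each recorded value backwards over its NaN run;
# transform('size') gives the length of that run, recorded day included.
result = s.bfill() / s.groupby(groups).transform('size')
# result: 0, 0, 25, 25, 25, 10, 10, 10, 10, 10 -- matches the expected output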

