Pandas: idiomatic way to do a custom fill of NaN runs

I have time-series data in the following format, where each value is the amount accumulated since the last record. I want to "distribute" each accumulated amount evenly over the preceding run of NaNs, so that this input:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

2016-01-01     0.0
2016-01-02     0.0
2016-01-03     NaN
2016-01-04     NaN
2016-01-05    75.0
2016-01-06     NaN
2016-01-07     NaN
2016-01-08     NaN
2016-01-09     NaN
2016-01-10    50.0

becomes this result (75 is spread evenly over Jan 3-5 as 25 per day, and 50 over Jan 6-10 as 10 per day):

2016-01-01     0.0
2016-01-02     0.0
2016-01-03    25.0
2016-01-04    25.0
2016-01-05    25.0
2016-01-06    10.0
2016-01-07    10.0
2016-01-08    10.0
2016-01-09    10.0
2016-01-10    10.0

Is there an idiomatic way to do this in pandas without just looping through the data? I tried various things, including fillna, dropna, isnull, and shift to check the next value, but I don't see how to put them together.

1 answer

This works on each run of missing values: build a group key by taking cumsum over the reversed not-null mask (so counting from the end of the series), then apply a grouped transform that divides the last value of each group by the group's size:

# .iloc[-1] is used instead of g[-1] so the positional access is explicit
s.groupby(s.notnull()[::-1].cumsum()[::-1]).transform(lambda g: g.iloc[-1] / g.size)
#2016-01-01     0.0
#2016-01-02     0.0
#2016-01-03    25.0
#2016-01-04    25.0
#2016-01-05    25.0
#2016-01-06    10.0
#2016-01-07    10.0
#2016-01-08    10.0
#2016-01-09    10.0
#2016-01-10    10.0
#Freq: D, dtype: float64
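
To see what that group key actually looks like, here is a small sketch (not part of the original answer) that prints the intermediate labels; every run of NaNs shares a label with the non-null value that closes it:

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

# Reverse the not-null mask, take a cumulative sum, and reverse back:
# each NaN inherits the counter of the first non-null value after it.
groups = s.notnull()[::-1].cumsum()[::-1]
print(groups.tolist())
# [4, 3, 2, 2, 2, 1, 1, 1, 1, 1]

Grouping on these labels puts 2016-01-03 through 2016-01-05 in one group (75 split three ways) and 2016-01-06 through 2016-01-10 in another (50 split five ways).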

Or another option:

s.groupby(s.shift().notnull().cumsum()).transform(lambda g: g.iloc[-1] / g.size)
#2016-01-01     0.0
#2016-01-02     0.0
#2016-01-03    25.0
#2016-01-04    25.0
#2016-01-05    25.0
#2016-01-06    10.0
#2016-01-07    10.0
#2016-01-08    10.0
#2016-01-09    10.0
#2016-01-10    10.0
#Freq: D, dtype: float64
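
Not part of the original answer, but a fully vectorized sketch along the same lines avoids the Python-level lambda entirely: backfill each recorded amount over its run of NaNs and divide by the size of that run (the variable names are illustrative):

import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan, 75, np.nan, np.nan, np.nan, np.nan, 50],
              pd.date_range(start="Jan 1 2016", end="Jan 10 2016", freq='D'))

# Same group key as above.
groups = s.notnull()[::-1].cumsum()[::-1]

# bfill() propagates each recorded value backwards over its NaN run;
# transform('size') gives the length of that run, recorded day included.
result = s.bfill() / s.groupby(groups).transform('size')
# result: 0, 0, 25, 25, 25, 10, 10, 10, 10, 10 -- matches the expected output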

