Fillna: how to fill in values over the next x days

Question

I have a data frame with multiple columns, indexed by date. I would like to fill the missing values, but only for the next x days. That is, a missing value should not be filled if it is more than x days after the previous non-missing value in that column.

I did something with a loop, but it is not very efficient. Is there a better and more elegant way to do this?

Note that the dates in my index are not evenly spaced, so the limit argument on its own will not work.

pandas

Maxi Jun 11 '13 at 6:00

5 answers

Andy hayden · Answer 1 · 2013-06-11T09:38:10+0000

You can use the limit argument of fillna:

 df.fillna(method='ffill', limit=3) # ffill is equivalent to pad

The same argument is available for the convenience functions ffill and bfill.

limit : int, default None
Maximum size gap to forward or back fill
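To illustrate what limit does on an unevenly spaced index, here is a small sketch. The series and its dates are made up for the example. It uses ffill directly, since recent pandas versions deprecate the method= argument of fillna. The point is that limit counts consecutive missing rows, not calendar days:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two missing rows, one 1 day and one 9 days after the last value.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-10", "2013-01-11"])
s = pd.Series([1.0, np.nan, np.nan, 4.0], index=idx)

# limit=2 allows up to two consecutive NaN rows to be filled,
# regardless of how far apart their dates are.
filled = s.ffill(limit=2)
print(filled)
```

Here the value from 2013-01-01 is carried all the way to 2013-01-10, which is why the index has to be made regular first.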

If the dates are not evenly spaced, you can resample (by day) first:

 df.resample('D')

See also the missing data section of the docs.

Jeff · Answer 2 · 2013-06-11T12:13:03+0000

This illustrates what I meant.

 In [20]: df = DataFrame(randn(10,2),columns=list('AB'),index=date_range('20130101',periods=3)+date_range('20130110',periods=3)+date_range('20130120',periods=4))

 In [21]: df
 Out[21]:
                    A         B
 2013-01-01 -0.176354  1.033962
 2013-01-02  0.666911 -0.018723
 2013-01-03  0.300097  1.552866
 2013-01-10  0.581816 -1.188106
 2013-01-11 -0.394817 -1.018765
 2013-01-12  1.000461 -1.211131
 2013-01-20  0.097940  1.225805
 2013-01-21 -2.205975 -0.455641
 2013-01-22  0.508865 -0.403321
 2013-01-23 -0.726969  0.448002

 In [22]: df.reindex(index=date_range('20130101','20130125')).fillna(limit=2,method='pad')
 Out[22]:
                    A         B
 2013-01-01 -0.176354  1.033962
 2013-01-02  0.666911 -0.018723
 2013-01-03  0.300097  1.552866
 2013-01-04  0.300097  1.552866
 2013-01-05  0.300097  1.552866
 2013-01-06       NaN       NaN
 2013-01-07       NaN       NaN
 2013-01-08       NaN       NaN
 2013-01-09       NaN       NaN
 2013-01-10  0.581816 -1.188106
 2013-01-11 -0.394817 -1.018765
 2013-01-12  1.000461 -1.211131
 2013-01-13  1.000461 -1.211131
 2013-01-14  1.000461 -1.211131
 2013-01-15       NaN       NaN
 2013-01-16       NaN       NaN
 2013-01-17       NaN       NaN
 2013-01-18       NaN       NaN
 2013-01-19       NaN       NaN
 2013-01-20  0.097940  1.225805
 2013-01-21 -2.205975 -0.455641
 2013-01-22  0.508865 -0.403321
 2013-01-23 -0.726969  0.448002
 2013-01-24 -0.726969  0.448002
 2013-01-25 -0.726969  0.448002

Maxi · Answer 3 · 2013-06-11T13:48:19+0000

Actually, I was just thinking of a solution. It takes 3 lines of code:

1/ resample the data frame to daily frequency
2/ fillna with a limit
3/ reindex the new data frame with the index of the original one

I don't know how it will perform in terms of speed, but it should be fine; I think most pandas functions are implemented in Cython.
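The three steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name ffill_within_days and the sample data are made up, the index is assumed to be sorted, unique, and to hold midnight timestamps, and .asfreq() is added because in current pandas resample('D') returns a Resampler object rather than a frame:

```python
import numpy as np
import pandas as pd

def ffill_within_days(df, days):
    """Forward-fill NaNs using values that are at most `days` calendar days old."""
    daily = df.resample("D").asfreq()   # 1/ regular daily index, new rows are NaN
    filled = daily.ffill(limit=days)    # 2/ on a daily index, limit rows == limit days
    return filled.reindex(df.index)     # 3/ keep only the original dates

# Hypothetical data with an irregular index.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-10", "2013-01-11"])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0]}, index=idx)

result = ffill_within_days(df, days=2)
print(result)
```

With days=2, the gap on 2013-01-02 is filled from 2013-01-01, while 2013-01-10 stays NaN because the last valid value is nine days old.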

david.bew · Answer 4 · 2014-06-04T10:24:48+0000

In the spirit of Onyxx's answer, I solved the same problem:

Add a column to the dataframe holding the index date, set to nan where the data to be filled is nan.
Forward fill both the data column and the date column.
Set nans where the forward-filled index date is too old.
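A possible reading of these steps, written per column, is sketched below. The helper name ffill_max_age and the sample data are hypothetical, and this is not necessarily the exact code the answer had in mind. It assumes a unique DatetimeIndex and needs no resampling, so it also works when timestamps are not at midnight:

```python
import numpy as np
import pandas as pd

def ffill_max_age(s, max_age):
    """Forward-fill a Series, but only from observations at most `max_age` old."""
    dates = s.index.to_series()
    # Date of each observation, NaT where the value is missing, then carried forward.
    last_seen = dates.where(s.notna()).ffill()
    # Age of the value that an unrestricted forward fill would use.
    age = dates - last_seen
    # Keep the forward-filled value only where it is recent enough.
    return s.ffill().where(age <= max_age)

# Hypothetical data with an irregular index.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-10", "2013-01-11"])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0]}, index=idx)

# Apply the helper column by column.
result = df.apply(ffill_max_age, max_age=pd.Timedelta(days=2))
print(result)
```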

Maxi · Answer 5 · 2014-06-05T14:45:18+0000

I solved this by implementing a Cython function that does the work for a Series. I just call this function for every column of my data frame.
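The Cython code itself is not shown here. As a rough plain-Python stand-in for the kind of single-pass loop such a function might contain (the name ffill_loop, its signature, and the sample data are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

def ffill_loop(values, dates, max_age):
    """Single pass over one column: fill NaNs from the last valid value
    when that value is at most `max_age` older than the current date."""
    out = values.copy()
    last_val = np.nan
    last_date = None
    for i in range(len(values)):
        if not np.isnan(values[i]):
            # Remember the most recent valid observation.
            last_val = values[i]
            last_date = dates[i]
        elif last_date is not None and dates[i] - last_date <= max_age:
            out[i] = last_val
    return out

# Hypothetical data with an irregular index.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-10", "2013-01-11"])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0]}, index=idx)

max_age = np.timedelta64(2, "D")
result = df.apply(
    lambda col: pd.Series(
        ffill_loop(col.to_numpy(dtype=float), col.index.to_numpy(), max_age),
        index=col.index,
    )
)
print(result)
```

A pure Python loop like this will be slow on large frames; compiling the same logic is presumably what the Cython version was for.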
