fillna: how to forward-fill values only for the next x days

I have a DataFrame with multiple columns, indexed by date. I would like to forward-fill the missing values, but only over the next x days: a missing value should not be filled if its index date is more than x days after the previous non-missing value in that column.

I did this with a loop, but it is not very efficient. Is there a better, more elegant way to do this?

Note that the dates in my index are not evenly spaced, so the limit argument will not work.
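A small sketch with made-up data shows the problem: on an unevenly spaced index, limit counts consecutive missing rows, not calendar days, so a value can be carried much further than x days.

```python
import numpy as np
import pandas as pd

# Unevenly spaced index: a 1-day step, then a 10-day gap, then a 1-day step.
idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-12", "2013-01-13"])
s = pd.Series([1.0, np.nan, np.nan, 4.0], index=idx)

# Suppose values should be carried forward for at most 3 days.
# limit=3 counts consecutive NaN *rows*, not days...
filled = s.ffill(limit=3)

# ...so 2013-01-12 receives the value observed 11 days earlier.
print(filled)
```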

5 answers

You can use the limit argument of fillna:

 df.fillna(method='ffill', limit=3)  # ffill is equivalent to pad
 df.ffill(limit=3)                   # same result; preferred in pandas >= 2.1, where fillna(method=...) is deprecated

The same argument is available on the convenience methods ffill and bfill.

limit : int, default None
Maximum size gap to forward or backward fill
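A minimal illustration on evenly spaced daily data (the sample values are made up): with limit=3, only the first three of four consecutive NaNs after a valid value are filled.

```python
import numpy as np
import pandas as pd

# Evenly spaced daily data with a run of four missing days.
idx = pd.date_range("2013-01-01", periods=7, freq="D")
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, np.nan, np.nan, 6.0, np.nan]}, index=idx)

# Forward-fill at most 3 consecutive NaNs after each valid value.
# Older equivalent spelling: df.fillna(method="ffill", limit=3)
out = df.ffill(limit=3)
print(out["A"].tolist())  # -> [1.0, 1.0, 1.0, 1.0, nan, 6.0, 6.0]
```

Because the index is daily, "3 rows" and "3 days" coincide here; that equivalence is exactly what breaks when dates are unevenly spaced.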

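If the dates are not evenly spaced, you can resample to daily frequency first and then fill:

 df = df.resample('D').asfreq()  # since pandas 0.18, resample() returns a Resampler; .asfreq() adds the missing days as NaN rows

See also the section on missing data in the documentation.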
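A minimal sketch of the resample-then-fill idea, assuming a sorted, duplicate-free index of whole dates (midnight timestamps). The helper name ffill_days_via_resample and the sample data are made up for illustration, and the final reindex back to the original dates is an addition so the result has the same rows as the input.

```python
import numpy as np
import pandas as pd


def ffill_days_via_resample(df: pd.DataFrame, days: int) -> pd.DataFrame:
    """Hypothetical helper: forward-fill each column for at most `days`
    calendar days. Assumes a sorted, duplicate-free DatetimeIndex of dates."""
    # One row per calendar day; days absent from df become all-NaN rows.
    daily = df.resample("D").asfreq()
    # On a daily grid, "consecutive NaN rows" now means "days since the last value".
    daily = daily.ffill(limit=days)
    # Keep only the original (uneven) dates.
    return daily.reindex(df.index)


idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-04", "2013-01-09", "2013-01-10"])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, np.nan, 5.0]}, index=idx)
result = ffill_days_via_resample(df, days=3)
print(result)
```

Here 2013-01-02 and 2013-01-04 (1 and 3 days after the last value) are filled, while 2013-01-09 (8 days later) stays NaN.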


This illustrates what I meant.

 # (session assumes: from numpy.random import randn; from pandas import DataFrame, date_range)
 In [20]: df = DataFrame(randn(10, 2), columns=list('AB'),
     ...:                index=date_range('20130101', periods=3)
     ...:                      .union(date_range('20130110', periods=3))
     ...:                      .union(date_range('20130120', periods=4)))

 In [21]: df
 Out[21]:
                    A         B
 2013-01-01 -0.176354  1.033962
 2013-01-02  0.666911 -0.018723
 2013-01-03  0.300097  1.552866
 2013-01-10  0.581816 -1.188106
 2013-01-11 -0.394817 -1.018765
 2013-01-12  1.000461 -1.211131
 2013-01-20  0.097940  1.225805
 2013-01-21 -2.205975 -0.455641
 2013-01-22  0.508865 -0.403321
 2013-01-23 -0.726969  0.448002

 In [22]: df.reindex(index=date_range('20130101', '20130125')).ffill(limit=2)
 Out[22]:
                    A         B
 2013-01-01 -0.176354  1.033962
 2013-01-02  0.666911 -0.018723
 2013-01-03  0.300097  1.552866
 2013-01-04  0.300097  1.552866
 2013-01-05  0.300097  1.552866
 2013-01-06       NaN       NaN
 2013-01-07       NaN       NaN
 2013-01-08       NaN       NaN
 2013-01-09       NaN       NaN
 2013-01-10  0.581816 -1.188106
 2013-01-11 -0.394817 -1.018765
 2013-01-12  1.000461 -1.211131
 2013-01-13  1.000461 -1.211131
 2013-01-14  1.000461 -1.211131
 2013-01-15       NaN       NaN
 2013-01-16       NaN       NaN
 2013-01-17       NaN       NaN
 2013-01-18       NaN       NaN
 2013-01-19       NaN       NaN
 2013-01-20  0.097940  1.225805
 2013-01-21 -2.205975 -0.455641
 2013-01-22  0.508865 -0.403321
 2013-01-23 -0.726969  0.448002
 2013-01-24 -0.726969  0.448002
 2013-01-25 -0.726969  0.448002

Actually, I just came up with a solution. It takes three lines of code:

 1. resample the DataFrame to one-second frequency;
 2. fillna with a limit (x days expressed as a number of seconds);
 3. reindex the new DataFrame with the original index.

I don't know yet how fast this will be, but it should be fine; I think most pandas functions are implemented in Cython.
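A sketch of these three steps, assuming a sorted, duplicate-free DatetimeIndex whose timestamps fall on whole seconds (more generally, on the step grid). The helper ffill_within_via_grid and its step parameter are hypothetical names. With one-second steps the intermediate grid has 86,400 rows per day, so memory grows with the total time span; a coarser step (for example one day, for date-only indexes) keeps it small.

```python
import numpy as np
import pandas as pd


def ffill_within_via_grid(df, window, step=pd.Timedelta(seconds=1)):
    """Hypothetical helper: forward-fill for at most `window` (a Timedelta)
    after the last valid value, via a regular grid of spacing `step`."""
    # 1. Resample onto a regular grid; grid points without data become NaN.
    grid = df.resample(step).asfreq()
    # 2. Fill with the window expressed as a number of grid steps.
    limit = int(window / step)
    grid = grid.ffill(limit=limit)
    # 3. Reindex back to the original timestamps.
    return grid.reindex(df.index)


idx = pd.to_datetime([
    "2013-01-01 09:00:00",
    "2013-01-01 17:30:00",
    "2013-01-03 08:00:00",
    "2013-01-06 12:00:00",
])
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, np.nan]}, index=idx)
result = ffill_within_via_grid(df, window=pd.Timedelta(days=2))
print(result)
```

The rows 8.5 hours and 47 hours after the observation are filled; the one about 5 days later is not.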


In the spirit of Onyxx's answer, I solved the same problem as follows:

  • Add a column to the DataFrame holding the index date, set to NaN wherever the data is NaN.
  • Forward-fill both the data columns and the date column.
  • Set the data back to NaN wherever the forward-filled date is too old (more than x days before the row's index date).
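The steps above can be sketched with vectorized pandas operations. This is an illustration under assumptions, not the poster's code: ffill_within_days is a hypothetical name, the index is assumed sorted, and instead of adding a date column to the DataFrame itself it keeps one date Series per data column, since each column has its own pattern of missing values.

```python
import numpy as np
import pandas as pd


def ffill_within_days(df: pd.DataFrame, days: float) -> pd.DataFrame:
    """Hypothetical helper: forward-fill each column, but only for `days`
    days after that column's last non-missing value (sorted DatetimeIndex)."""
    dates = pd.Series(df.index, index=df.index)
    max_gap = pd.Timedelta(days=days)
    out = df.ffill()
    for col in df.columns:
        # Step 1: the index date where the column has data, NaT where it is NaN.
        obs_date = dates.where(df[col].notna())
        # Step 2: forward-fill the date alongside the data (done above for the data).
        last_date = obs_date.ffill()
        # Step 3: undo fills whose source observation is more than `days` old.
        too_old = (dates - last_date) > max_gap
        out.loc[too_old, col] = np.nan
    return out


idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-04", "2013-01-09", "2013-01-10"])
df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan, np.nan, 5.0], "B": [np.nan, 2.0, np.nan, np.nan, np.nan]},
    index=idx,
)
result = ffill_within_days(df, days=3)
print(result)
```

The only Python-level loop is over columns, so the cost per row stays in pandas' vectorized code.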

I solved this by implementing a Cython function that does the work for a single Series. I just call this function on every column of my DataFrame.
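The Cython function itself is not shown, so here is a pure-Python stand-in for the same per-series idea, assuming a sorted DatetimeIndex; ffill_series_within is a hypothetical name. A loop like this is the kind of code one would compile with Cython to make it fast; written in plain Python, as here, it is only a readable reference.

```python
import numpy as np
import pandas as pd


def ffill_series_within(s: pd.Series, days: float) -> pd.Series:
    """Hypothetical per-series kernel (pure Python, not Cython): forward-fill
    `s` for at most `days` days after each valid value."""
    values = s.to_numpy(dtype=float, copy=True)
    times = s.index.to_numpy()  # numpy datetime64 values
    max_gap = pd.Timedelta(days=days).to_timedelta64()
    last_val = np.nan
    last_t = None  # time of the last valid (non-NaN) value
    for i in range(len(values)):
        if np.isnan(values[i]):
            # Fill only if a previous valid value exists and is recent enough.
            if last_t is not None and times[i] - last_t <= max_gap:
                values[i] = last_val
        else:
            last_val = values[i]
            last_t = times[i]
    return pd.Series(values, index=s.index, name=s.name)


idx = pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-04", "2013-01-09", "2013-01-10"])
df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan, np.nan, 5.0], "B": [np.nan, 2.0, np.nan, np.nan, np.nan]},
    index=idx,
)
# Apply the per-series function to every column; extra kwargs go to the function.
result = df.apply(ffill_series_within, days=3)
print(result)
```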


Source: https://habr.com/ru/post/1485556/

