Pandas time series resampling ending on a specific day

I suspect that many people working with time series data have already run into this problem, and pandas does not seem to provide a simple solution (yet!):

Suppose that:

  • You have a time series of daily closing prices indexed by date (day).
  • Today is June 19th, so the last close value is from June 18th.
  • You want to resample the daily data into OHLC bars with a given frequency (say M or 2M) ending on June 18th.

So, for M frequency, the last bar should cover 19MAY-18JUN, the previous one 19APR-18MAY, and so on...
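
To make the target bins concrete, here is a small standard-library sketch that lists the bars ending on a given last day. The helpers `add_months` and `anchored_bars` are hypothetical names, not pandas functions, and the sketch assumes the day after the last observation is the 28th or earlier, so that the anchor day exists in every month.

```python
from datetime import date, timedelta


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, keeping the day (assumed to exist in the target month)."""
    y, m = divmod(d.month - 1 + n, 12)
    return date(d.year + y, m + 1, d.day)


def anchored_bars(last_day: date, months: int, count: int):
    """Return `count` (first_day, last_day) pairs of `months`-month bars, oldest first,
    the newest bar ending on last_day."""
    end_excl = last_day + timedelta(days=1)  # exclusive right edge of the newest bar
    bars = []
    for k in range(count):
        start = add_months(end_excl, -months * (k + 1))
        stop = add_months(end_excl, -months * k) - timedelta(days=1)
        bars.append((start, stop))
    return bars[::-1]


for start, stop in anchored_bars(date(2014, 6, 18), months=1, count=3):
    print(start, "->", stop)
```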

ts.resample('M', how='ohlc')

will resample, but "M" is the "end of month" frequency, so the result contains a full month for 2014-05 and only a partial period (June 1-18) for 2014-06. The last bar is therefore not a "monthly bar", which is not what we want!
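
A minimal reproduction with current pandas, assuming synthetic data shaped like the one used in the answer (200 daily values ending 2014-06-18). The `how='ohlc'` argument has since been removed, so the sketch uses `.resample(...).ohlc()`, and it passes `pd.offsets.MonthEnd()` rather than the `'M'` alias, which newer pandas versions deprecate in favour of `'ME'`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 daily closes ending on 2014-06-18
ts = pd.Series(np.linspace(50, 100, 200),
               index=pd.date_range(end="2014-06-18", periods=200))

# Calendar-month bars: the last one only holds June 1-18
bars = ts.resample(pd.offsets.MonthEnd()).ohlc()
print(bars.tail(2))
```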

With 2M frequency, my test on my sample data gives a last bin labelled 2014-07-31 (and the previous one labelled 2014-05-31), which is quite misleading since there is no data in July... And again, the supposed last 2-month bar only covers the last couple of weeks.

The correct DatetimeIndex is easily created with:

pandas.date_range(end='2014-06-18', freq='2M', periods=300) + datetime.timedelta(days=18)

(The pandas documentation prefers to do the same via

pandas.date_range(end='2014-06-18', freq='2M', periods=300) + pandas.tseries.offsets.DateOffset(days=18)

but my tests show that this method, although more "pandaïc", is twice as slow!)
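
For reference, a sketch checking that both constructions give the same index, with `pd.offsets.MonthEnd(2)` standing in for the `'2M'` alias. It also tries a variant that is not from the question: passing `pd.DateOffset(months=2)` directly as `freq`, which steps back in whole calendar months from 2014-06-18 and so needs no extra 18-day shift.

```python
import datetime

import pandas as pd

# The question's construction: every 2nd month end, then +18 days,
# which always lands on the 18th of the following month
month_ends = pd.date_range(end="2014-06-18", freq=pd.offsets.MonthEnd(2), periods=300)
via_timedelta = month_ends + datetime.timedelta(days=18)
via_dateoffset = month_ends + pd.DateOffset(days=18)

# Variant (assumption, not from the question): calendar-month steps ending on the anchor day
via_months = pd.date_range(end="2014-06-18", freq=pd.DateOffset(months=2), periods=300)

print(via_timedelta[-3:])
```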

In any case, there is no way to pass this correct DatetimeIndex to ts.resample().
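
Although resample() will not take that index, it can still serve as a set of bin edges. The sketch below is one possible workaround with current pandas, not a built-in feature: `anchored_ohlc` is a hypothetical helper that builds edges one or more calendar months apart ending the day after the last observation, assigns each row to its bar with `searchsorted`, and aggregates with `groupby(...).ohlc()`. It assumes a sorted daily DatetimeIndex.

```python
import numpy as np
import pandas as pd


def anchored_ohlc(s: pd.Series, months: int = 1) -> pd.DataFrame:
    """Hypothetical helper, not a pandas API: OHLC bars of `months` calendar months,
    the newest bar ending on the last date of `s` (assumed sorted and daily)."""
    end = s.index[-1] + pd.Timedelta(days=1)   # exclusive right edge of the newest bar
    edges = [end]
    k = 1
    while edges[-1] > s.index[0]:              # step back until the oldest row is covered
        edges.append(end - pd.DateOffset(months=months * k))
        k += 1
    edges = pd.DatetimeIndex(edges[::-1])      # ascending bar edges
    # Position of the bar holding each row: edges[pos] <= t < edges[pos + 1]
    pos = edges.searchsorted(s.index, side="right") - 1
    return s.groupby(edges[pos]).ohlc()        # bars labelled by their left edge


ts = pd.Series(np.linspace(50, 100, 200),
               index=pd.date_range(end="2014-06-18", periods=200))
monthly = anchored_ohlc(ts, months=1)
bimonthly = anchored_ohlc(ts, months=2)
print(monthly)
```

With `months=2`, the newest bar covers 2014-04-19 to 2014-06-18.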

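It seems the pandas developers are aware of this issue, but in the meantime, how can one get OHLC bars whose periods end on the last day of the time series?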

One way is to write a custom DateOffset anchored to a given day of the month and pass it to resample:

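# Relies on internals of pandas.tseries.offsets as of mid-2014 (pandas 0.14 era);
# as_datetime, as_timestamp and apply_nat no longer exist in current pandas.
from pandas.tseries.offsets import (as_datetime, as_timestamp, apply_nat,
                                    DateOffset, relativedelta, datetime)


class MonthAnchor(DateOffset):
    """DateOffset anchored to a given day of the month.

    Arguments:
        day_anchor: day of the month to anchor to (must exist in every month, i.e. <= 28)
    """

    def __init__(self, n=1, **kwds):
        super(MonthAnchor, self).__init__(n)

        self.kwds = kwds
        self._dayanchor = self.kwds['day_anchor']

    @apply_nat  # NaT in, NaT out
    def apply(self, other):
        n = self.n

        if other.day > self._dayanchor and n <= 0:  # anchor already passed this month: add one to the month shift
            n += 1
        elif other.day < self._dayanchor and n > 0:  # anchor not yet reached this month: subtract one
            n -= 1

        # Shift by n calendar months, then snap to the anchor day
        other = as_datetime(other) + relativedelta(months=n)
        other = datetime(other.year, other.month, self._dayanchor)
        return as_timestamp(other)

    def onOffset(self, dt):
        # A date is "on offset" only if it falls on the anchor day
        return dt.day == self._dayanchor

    _prefix = ''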
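
To see what `apply` does without depending on pandas internals, here is a pure-Python restatement of its roll logic (the function name `month_anchor_apply` is hypothetical). As in the class above, an anchor day that does not exist in the target month (e.g. 30 in February) raises ValueError.

```python
from datetime import date


def month_anchor_apply(d: date, n: int, anchor: int) -> date:
    """Illustrative restatement of MonthAnchor.apply (hypothetical name, not a pandas API)."""
    if d.day > anchor and n <= 0:
        n += 1   # anchor already passed this month: add one to the month shift
    elif d.day < anchor and n > 0:
        n -= 1   # anchor not yet reached this month: subtract one from the month shift
    y, m = divmod(d.month - 1 + n, 12)   # shift by n calendar months
    return date(d.year + y, m + 1, anchor)


print(month_anchor_apply(date(2014, 6, 18), 1, anchor=19))   # 2014-06-19
print(month_anchor_apply(date(2014, 6, 18), -1, anchor=19))  # 2014-05-19
```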

Example:

In [28]: df = pd.DataFrame(data=np.linspace(50, 100, 200), index=pd.date_range(end='2014-06-18', periods=200), columns=['value'])

In [29]: df.head()
Out[29]: 
                value
2013-12-01  50.000000
2013-12-02  50.251256
2013-12-03  50.502513
2013-12-04  50.753769
2013-12-05  51.005025


In [61]: month_offset = MonthAnchor(day_anchor = df.index[-1].day + 1)

In [62]: df.resample(month_offset, how='ohlc')
Out[62]: 
                value                                   
                 open        high        low       close
2013-11-19  50.000000   54.271357  50.000000   54.271357
2013-12-19  54.522613   62.060302  54.522613   62.060302
2014-01-19  62.311558   69.849246  62.311558   69.849246
2014-02-19  70.100503   76.884422  70.100503   76.884422
2014-03-19  77.135678   84.673367  77.135678   84.673367
2014-04-19  84.924623   92.211055  84.924623   92.211055
2014-05-19  92.462312  100.000000  92.462312  100.000000
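
The numbers in this table can be checked without pandas. The sketch below (hypothetical helper `bar_start`, anchor day 19 assumed) rebuilds the same linearly spaced values and computes open/high/low/close per bar, labelling each bar by its left edge as in the output above.

```python
from datetime import date, timedelta


def bar_start(d: date, anchor: int) -> date:
    """Left edge of the one-month bar containing d; bars start on day `anchor` (assumed <= 28)."""
    if d.day >= anchor:
        return d.replace(day=anchor)
    y, m = divmod(d.month - 2, 12)   # previous calendar month
    return date(d.year + y, m + 1, anchor)


# Same values as np.linspace(50, 100, 200) on the 200 days ending 2014-06-18
first = date(2014, 6, 18) - timedelta(days=199)
values = {first + timedelta(days=i): 50 + i * 50 / 199 for i in range(200)}

bars = {}   # bar start -> [open, high, low, close]
for d in sorted(values):
    v = values[d]
    key = bar_start(d, anchor=19)
    if key not in bars:
        bars[key] = [v, v, v, v]
    else:
        op, hi, lo, _ = bars[key]
        bars[key] = [op, max(hi, v), min(lo, v), v]

for key, (op, hi, lo, cl) in bars.items():
    print(key, f"{op:.6f} {hi:.6f} {lo:.6f} {cl:.6f}")
```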

Source: https://habr.com/ru/post/1545209/

