How to find the duration of an event for a Pandas time series

I mark the positive and negative periods of the following time series in the ['sign'] column. How can I get the duration of each run of 1 and -1, and then count the number of runs of 1 and of -1?

For example, if the ['sign'] column has five consecutive 1s, then three consecutive -1s, followed by two 1s, the durations are 5 days, 3 days and 2 days, and the counts are '1': 2 and '-1': 1.

import pandas_datareader.data as web
import datetime as dt
import numpy as np
import pandas as pd

end = dt.datetime(2016, 12, 31)
start = dt.date(end.year-15, end.month, end.day)

aapl = web.DataReader('AAPL', 'yahoo', start, end)['Adj Close']
aapl = pd.DataFrame(aapl)
aapl['ema'] = aapl.ewm(200).mean()
aapl['diff'] = (aapl['Adj Close'] / aapl['ema']) - 1
aapl['sign'] = np.sign(aapl['diff'])

UPDATE: I realized that the durations need to be calculated separately for the periods where sign = 1 and where sign = -1, so that descriptive statistics can be computed for the "1" periods and the "-1" periods.

Pandas : 0.19.2


Use diff() and cumsum() to label each run of consecutive equal signs, then groupby to get the size of each run.

aapl.groupby((aapl.sign.diff() != 0).cumsum()).size()
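To see why the one-liner above works, it can help to look at the intermediate values. The following is a minimal sketch on a small made-up series (the variable names `starts`, `run_id` and `durations` are illustrative, not from the original answer):

```python
import pandas as pd

# Small hypothetical series standing in for aapl['sign'].
sign = pd.Series([1, 1, 1, 1, 1, -1, -1, -1, 1, 1], name="sign")

# True on the first row (diff is NaN there) and wherever the sign
# differs from the previous row.
starts = sign.diff() != 0

# Running total of run starts: every row in the same run shares a label.
run_id = starts.cumsum()

# Number of rows in each run.
durations = sign.groupby(run_id).size()

print(run_id.tolist())     # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]
print(durations.tolist())  # [5, 3, 2]
```

The durations are counted in rows, so they equal days only when the series has one row per day.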

To count the periods per sign, take the first row of each run (where diff is non-zero) and count the sign values.

(aapl.sign.loc[(aapl.sign.diff() != 0).cumsum().drop_duplicates().index]
     .value_counts().to_dict())

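def durs(df):
    # Label each run of identical consecutive signs with an increasing integer
    diffs = (df.sign.diff() != 0).cumsum()
    # Take the sign at the first row of each run and count runs per sign
    # (.loc, since drop_duplicates().index holds labels, not positions)
    cnts = df.sign.loc[diffs.drop_duplicates().index].value_counts().to_dict()
    # Length of each run, in rows
    days = df.groupby(diffs).size()
    return days, cnts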

Demo

>>> df
   sign
0     1
1     1
2     1
3     1
4     1
5    -1
6    -1
7    -1
8     1
9     1

>>> days, cnts = durs(df)

>>> days
sign
1    5
2    3
3    2
dtype: int64

>>> cnts
{-1: 1, 1: 2}

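To get the durations of the 1 periods only ([5, 2]), take the positions where sign is 1 and split them wherever consecutive positions are not adjacent.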

>>> data = np.where(df.sign == 1)[0]

>>> np.diff(np.r_[0, np.where(np.diff(data) != 1)[0]+1, data.size])
array([5, 2])

This step is done entirely in NumPy.
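The update in the question asks for descriptive statistics of the durations, separately for the 1 and -1 periods. One possible way to get there is sketched below, assuming the same diff/cumsum labelling; the helper name `run_stats` and the column names `run` and `duration` are hypothetical, chosen only for this illustration:

```python
import pandas as pd

def run_stats(sign):
    """Return one row per run (sign, duration) and per-sign summary statistics."""
    # Label runs; rename so the label does not clash with the 'sign' column below.
    run_id = (sign.diff() != 0).cumsum().rename("run")
    grouped = sign.groupby(run_id)
    runs = pd.DataFrame({
        "sign": grouped.first(),     # sign value carried by each run
        "duration": grouped.size(),  # number of rows in each run
    })
    # count, mean, std, min, quartiles and max of the durations, per sign.
    stats = runs.groupby("sign")["duration"].describe()
    return runs, stats

# Small hypothetical series standing in for aapl['sign'].
sign = pd.Series([1, 1, 1, 1, 1, -1, -1, -1, 1, 1], name="sign")
runs, stats = run_stats(sign)
print(runs)
print(stats)
```

On this sample the 1 periods have durations 5 and 2 (mean 3.5) and the single -1 period has duration 3. Any rows where sign is 0 would show up as a third group rather than being dropped.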


A NumPy-based approach:

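def duration_count(a):
    # Assumes a is a NumPy array whose runs strictly alternate between 1 and -1 (no 0s)
    # Start index of every run, plus the array length as the final boundary
    idx = np.r_[[0],np.flatnonzero(a[1:] != a[:-1])+1,a.size]
    duration = np.diff(idx)
    # Runs alternate, so the starting sign gets the extra run when the total is odd
    count = {a[0]:(duration.size+1)//2, -a[0]:duration.size//2}
    return duration, count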

Sample runs:

In [43]: a = np.array([1,1,1,1,1,-1,-1,-1,1,1])

In [44]: duration_count(a)
Out[44]: (array([5, 3, 2]), {-1: 1, 1: 2})

In [45]: a = np.array([-1,-1,1,1,1,1,1,-1,-1,-1,1,1,-1,-1,-1,-1])

In [46]: duration_count(a)
Out[46]: (array([2, 5, 3, 2, 4]), {-1: 3, 1: 2})
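The counting trick above relies on the runs strictly alternating between two values. Since np.sign can also return 0 (for example on a row where the price equals the moving average), a slightly more general variant may be useful. The sketch below is not from the original answer; the name `run_lengths` is hypothetical, and it assumes the input contains no NaN values:

```python
import numpy as np

def run_lengths(a):
    """Run-length encode a 1-D array: return (values, durations, counts)."""
    a = np.asarray(a)
    if a.size == 0:
        return a, np.array([], dtype=int), {}
    # Start index of every run, plus the array length as the final boundary.
    idx = np.r_[0, np.flatnonzero(a[1:] != a[:-1]) + 1, a.size]
    durations = np.diff(idx)
    values = a[idx[:-1]]  # value carried by each run
    # Count runs per distinct value instead of assuming two alternating signs.
    uniq, n = np.unique(values, return_counts=True)
    counts = {int(v): int(c) for v, c in zip(uniq, n)}
    return values, durations, counts

values, durations, counts = run_lengths([0, 1, 1, 1, -1, -1, 1, 1, 0, 0, -1])
print(values)     # [ 0  1 -1  1  0 -1]
print(durations)  # [1 3 2 2 2 1]
print(counts)     # {-1: 2, 0: 2, 1: 2}
```

Returning the run values alongside the durations also makes it easy to select the durations of one sign only, for example `durations[values == 1]`.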

Source: https://habr.com/ru/post/1670339/

