How to calculate volatility (standard deviation) with a rolling window in Pandas

I have a time series Ser and I want to calculate volatility (the rolling standard deviation) over a sliding window. My current code does this correctly, in this form:

    w = 10
    length = len(Ser) - w + 1
    volList = []
    for timestep in range(length):
        subSer = Ser[timestep:timestep + w]
        mean_i = np.mean(subSer)
        vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
        volList.append(vol_i)

This seems very inefficient to me. Does Pandas have built-in functions to do something like this?

+7
3 answers

It looks like you are looking for Series.rolling. You can apply the std calculation to the resulting rolling-window object:

    roller = Ser.rolling(w)
    volList = roller.std(ddof=0)

If you do not plan to use the rolling window object again, you can write it as a one-liner:

 volList = Ser.rolling(w).std(ddof=0) 

Keep in mind that ddof=0 is necessary in this case, because the standard deviation is normalized by N - ddof (where N is the number of observations in the window), and ddof defaults to 1 in pandas.
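As a minimal sketch of that difference (the data here is made up), the population std from ddof=0 matches the manual loop, and it differs from the default sample std by a constant factor:

    import numpy as np
    import pandas as pd

    Ser = pd.Series(np.random.randn(100))  # made-up sample data
    w = 10

    pop_std = Ser.rolling(w).std(ddof=0)  # population std, matches the manual loop
    samp_std = Ser.rolling(w).std()       # default ddof=1, sample std

    # The two differ by the constant factor sqrt((w - 1) / w):
    assert np.allclose(pop_std.dropna(), samp_std.dropna() * ((w - 1) / w) ** 0.5)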

+12

Typically, finance people quote volatility as the annualized standard deviation of percentage changes in price.

Assuming you have daily prices in a DataFrame df, and there are 252 trading days in a year, you probably want something like the following:

df.pct_change().rolling(window_size).std()*(252**0.5)
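As a self-contained sketch of that one-liner (the column name 'close', the made-up prices, and the 21-day window are assumptions, not from the question):

    import numpy as np
    import pandas as pd

    # Hypothetical daily closing prices
    df = pd.DataFrame({'close': 100 + np.cumsum(np.random.randn(500))})

    window_size = 21  # assumed window of roughly one trading month
    annualized_vol = df['close'].pct_change().rolling(window_size).std() * (252 ** 0.5)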

+4

Here's one NumPy approach -

    # From http://stackoverflow.com/a/14314054/3293881 by @Jaime
    def moving_average(a, n=3):
        ret = np.cumsum(a, dtype=float)
        ret[n:] = ret[n:] - ret[:-n]
        return ret[n - 1:] / n

    # From http://stackoverflow.com/a/40085052/3293881
    def strided_app(a, L, S=1):  # Window len = L, Stride len/stepsize = S
        nrows = ((a.size - L) // S) + 1
        n = a.strides[0]
        return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

    def rolling_meansqdiff_numpy(a, w):
        A = strided_app(a, w)
        B = moving_average(a, w)
        subs = A - B[:, None]
        sums = np.einsum('ij,ij->i', subs, subs)
        return (sums / w)**0.5
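To see what strided_app produces, here is a small illustrative run with arbitrary input (it uses the strided_app defined above): each row is one overlapping length-L window of the input.

    import numpy as np

    a = np.arange(6)     # array([0, 1, 2, 3, 4, 5])
    strided_app(a, L=3)  # stride S=1 by default
    # array([[0, 1, 2],
    #        [1, 2, 3],
    #        [2, 3, 4],
    #        [3, 4, 5]])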

Sample run -

    In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))

    In [203]: rolling_meansqdiff_loopy(Ser, w=10)
    Out[203]:
    [2.6095976701399777,
     2.3000000000000003,
     2.118962010041709,
     2.022374841615669,
     1.746424919657298,
     1.7916472867168918,
     1.3000000000000003,
     1.7776388834631178,
     1.6852299546352716,
     1.6881943016134133,
     1.7578395831246945]

    In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
    Out[204]:
    array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
            1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
            1.75783958])

Runtime test

Loopy approach -

    def rolling_meansqdiff_loopy(Ser, w):
        length = Ser.shape[0] - w + 1
        volList = []
        for timestep in range(length):
            subSer = Ser[timestep:timestep + w]
            mean_i = np.mean(subSer)
            vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
            volList.append(vol_i)
        return volList

Timings -

    In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))

    In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
    1 loops, best of 3: 2.63 s per loop

    # @Mad Physicist vectorized soln
    In [225]: %timeit Ser.rolling(10).std(ddof=0)
    1000 loops, best of 3: 380 µs per loop

    In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
    1000 loops, best of 3: 393 µs per loop

That's close to a 7000x speedup with the two vectorized approaches over the loopy one!

+3

Source: https://habr.com/ru/post/1266479/

