Here's one NumPy approach -
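The NumPy code itself is not reproduced above, so here is a minimal sketch of what a vectorized version could look like. It is not necessarily the implementation behind `rolling_meansqdiff_numpy`. The function name `rolling_std_windows` is hypothetical, and the sketch assumes a 1D numeric input and that the goal is the same population standard deviation per window that the loopy version computes.

```python
import numpy as np

def rolling_std_windows(a, w):
    """Hypothetical vectorized sketch: population standard deviation
    over every length-w window of a 1D array."""
    a = np.asarray(a, dtype=float)
    n_windows = a.shape[0] - w + 1
    # Row i holds the positions i, i+1, ..., i+w-1 of the i-th window.
    idx = np.arange(n_windows)[:, None] + np.arange(w)
    windows = a[idx]  # 2D array of shape (n_windows, w)
    # ddof=0 (the default) divides by w, like /len(subSer) in the loop.
    return windows.std(axis=1)

a = np.array([1, 2, 3, 4, 5])
print(rolling_std_windows(a, w=3))
```

Building the index grid by broadcasting copies each window, which costs `n_windows * w` elements of memory. For long series with wide windows, a strided view would avoid that copy at the cost of slightly more delicate code.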
Sample run -

    In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))

    In [203]: rolling_meansqdiff_loopy(Ser, w=10)
    Out[203]:
    [2.6095976701399777,
     2.3000000000000003,
     2.118962010041709,
     2.022374841615669,
     1.746424919657298,
     1.7916472867168918,
     1.3000000000000003,
     1.7776388834631178,
     1.6852299546352716,
     1.6881943016134133,
     1.7578395831246945]

    In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
    Out[204]:
    array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
            1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
            1.75783958])
Runtime test
Loopy approach -

    import numpy as np

    def rolling_meansqdiff_loopy(Ser, w):
        # Number of complete windows of width w
        length = Ser.shape[0] - w + 1
        volList = []
        for timestep in range(length):
            subSer = Ser[timestep:timestep+w]
            mean_i = np.mean(subSer)
            # Population standard deviation of the window (divides by w)
            vol_i = (np.sum((subSer - mean_i)**2) / len(subSer))**0.5
            volList.append(vol_i)
        return volList
Timings -

    In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))

    In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
    1 loops, best of 3: 2.63 s per loop
Close to a 7000x speedup there with the two vectorized approaches!