Exponential Decay in Python Pandas DataFrame

I am trying to efficiently calculate a running sum with exponential decay over each column of a Pandas DataFrame. The DataFrame contains a daily rating for each country in the world, and looks like this:

                   AF        UK        US
 2014-07-01  0.998042  0.595720  0.524698
 2014-07-02  0.380649  0.838436  0.355149
 2014-07-03  0.306240  0.274755  0.964524
 2014-07-04  0.396721  0.836027  0.225848
 2014-07-05  0.151291  0.677794  0.603548
 2014-07-06  0.558846  0.050535  0.551785
 2014-07-07  0.463514  0.552748  0.265537
 2014-07-08  0.240282  0.278825  0.116432
 2014-07-09  0.309446  0.096573  0.246021
 2014-07-10  0.800977  0.583496  0.713893

I am not sure how to calculate the running sum (with decay) without iterating through the DataFrame, since I need yesterday's decayed sum to calculate today's, and to calculate yesterday's I need the day before yesterday's, and so on. This is the code I used, but I would like a more efficient way of doing this:

 for j, val in df.iteritems():
     for i, row in enumerate(val):
         df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)
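For reference, the recurrence itself can be written as a correct (if still iterative) baseline. This is a sketch of my own with a hypothetical helper name, using a plain Python list rather than the DataFrame:

```python
import math

def decayed_running_sum(values, rate=0.05):
    # s[i] = values[i] + s[i-1] * exp(-rate), with s[-1] treated as 0
    decay = math.exp(-rate)
    out = []
    prev = 0.0
    for x in values:
        prev = x + prev * decay
        out.append(prev)
    return out
```

Each output element is therefore the sum of values[k] * exp(-rate * (i - k)) over all k up to i, which is the closed form the answer vectorizes.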
1 answer

You can use the fact that when exponentials are multiplied, their exponents add:

e.g.:

 N(2) = N(2) + N(1)*exp(-0.05)
 N(3) = N(3) + (N(2) + N(1)*exp(-0.05))*exp(-0.05)
 N(3) = N(3) + N(2)*exp(-0.05) + N(1)*exp(-0.1)
 N(4) = ...and so on
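As a quick numeric sanity check (my addition, not part of the answer), the recursive form and the expanded weighted sum agree:

```python
import math

d = math.exp(-0.05)
x = [0.3, 0.7, 0.2, 0.9]

# Recursive form: N(i) = x[i] + N(i-1) * d
n = 0.0
for v in x:
    n = v + n * d

# Expanded form: each x[k] is discounted once per elapsed step
expanded = sum(v * d ** (len(x) - 1 - k) for k, v in enumerate(x))

assert abs(n - expanded) < 1e-12
```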

This can then be vectorized using numpy:

 dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])
 weightspace = np.exp(np.linspace(len(dataset), 0, num=len(dataset)) * -0.05)

 def rollingsum(array):
     # Take the most recent len(array) weights
     weights = weightspace[-len(array):]
     # Dot the values with the decay weights to obtain the result
     return np.dot(array, weights)

 a = pd.expanding_apply(dataset, rollingsum)

pd.expanding_apply applies the rollingsum function to each expanding window of rows, calling it len(dataset) times. np.linspace generates a len(dataset)-sized array of exponents, i.e. how many times each earlier row must be multiplied by exp(-0.05) to discount it relative to the current row.

Since it is vectorized, it should be fast:

 %timeit a = pd.expanding_apply(dataset, rollingsum)
 10 loops, best of 3: 25.5 ms per loop
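An aside, not part of the original answer: pd.expanding_apply has since been removed from pandas. Assuming a recent pandas (with expanding().apply and raw=True), the same idea looks like this; the linspace endpoint is len(dataset) - 1 here so that the current row's weight is exactly exp(0) = 1:

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])

# Decay weights, oldest first: exp(-0.05*(n-1)), ..., exp(-0.05), exp(0)
weightspace = np.exp(np.linspace(len(dataset) - 1, 0, num=len(dataset)) * -0.05)

def rollingsum(array):
    # Use the newest len(array) weights so the current row gets weight 1
    return np.dot(array, weightspace[-len(array):])

a = dataset.expanding().apply(rollingsum, raw=True)
```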

This compares to the original loop (note that I am using Python 3 and had to skip the first row, since the original code reads val[i-1] on the first iteration):

 def multipleApply(df):
     for j, val in df.iteritems():
         for i, row in enumerate(val):
             if i == 0:
                 continue
             df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)

Timing that version:

 In [68]: %timeit multipleApply(dataset)
 1 loops, best of 3: 414 ms per loop
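Both timed versions still do quadratic total work, since the dot product grows with each row. For this recurrence there is also a linear-time closed form, sketched here as my own addition (not from the answer); note the exp(rate*k) factor overflows float64 once rate * len exceeds roughly 700, so it only suits moderate series lengths:

```python
import numpy as np

def decayed_cumsum(values, rate=0.05):
    # s[t] = sum over k <= t of values[k] * exp(-rate*(t-k))
    #      = exp(-rate*t) * cumsum(values[k] * exp(rate*k))
    values = np.asarray(values, dtype=float)
    k = np.arange(len(values))
    return np.exp(-rate * k) * np.cumsum(values * np.exp(rate * k))
```

Applied column-wise (e.g. df.apply(decayed_cumsum)), this matches the expanding version up to floating-point error while touching each element only a constant number of times.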

Source: https://habr.com/ru/post/974760/

