You can use the fact that when exponentials are multiplied, their exponents add:
e.g.:
N(2) = N(2) + N(1)*exp(-0.05)
N(3) = N(3) + (N(2) + N(1)*exp(-0.05))*exp(-0.05)
N(3) = N(3) + N(2)*exp(-0.05) + N(1)*exp(-0.1)
N(4) = ... and so on
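To see that this recurrence really computes the direct definition N(i) = Σ N(j)·exp(-0.05·(i−j)), here is a small check (illustrative variable names, not part of the original answer):

```python
import numpy as np

x = np.random.rand(6)          # raw values N(1)..N(6)
decay = np.exp(-0.05)

# Recurrence: decay the running total once per step, then add the new value
out = np.empty_like(x)
out[0] = x[0]
for i in range(1, len(x)):
    out[i] = x[i] + out[i - 1] * decay

# Direct definition: each older value is decayed once per elapsed step
direct = np.array([sum(x[j] * decay ** (i - j) for j in range(i + 1))
                   for i in range(len(x))])

assert np.allclose(out, direct)
```

Unrolling the recurrence one step at a time reproduces exactly the exp(-0.05), exp(-0.1), ... coefficients shown above.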
This can then be vectorized using numpy:
import numpy as np
import pandas as pd

dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])
weightspace = np.exp(np.linspace(len(dataset), 0, num=len(dataset)) * -0.05)

def rollingsum(array):
    # weights for the rows seen so far; the newest row gets weight exp(0) = 1
    weights = weightspace[-len(array):]
    # dot product gives the exponentially weighted sum for the current row
    return np.dot(array, weights)
pd.expanding_apply applies the rollingsum function cumulatively to each row, calling it len(dataset) times. np.linspace builds a vector of len(dataset) weights, encoding how many times each row has been multiplied by exp(-0.05) by the time the current row is reached.
Since it is vectorized, it should be fast:
%timeit a = pd.expanding_apply(dataset, rollingsum)
10 loops, best of 3: 25.5 ms per loop
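Note that pd.expanding_apply has since been removed from pandas; on current versions the same computation can be written with the expanding() accessor. A sketch, keeping the answer's weightspace trick:

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])
weightspace = np.exp(np.linspace(len(dataset), 0, num=len(dataset)) * -0.05)

def rollingsum(array):
    # most recent len(array) weights; the newest row gets weight 1
    weights = weightspace[-len(array):]
    return np.dot(array, weights)

# expanding().apply is the current equivalent of the removed
# pd.expanding_apply; raw=True passes plain numpy arrays to rollingsum
result = dataset.expanding().apply(rollingsum, raw=True)
```

The first row of result equals the first row of dataset, since a single observation is weighted by exp(0) = 1.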
Compare this with a plain Python loop (note that I am using Python 3 and had to change the behaviour of the first line ...):
def multipleApply(df):
    for j, val in df.iteritems():
        for i, row in enumerate(val):
            if i == 0:
                continue
            df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)
Timing it:
In [68]: %timeit multipleApply(dataset)
1 loops, best of 3: 414 ms per loop
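If SciPy is available, the recurrence itself can be evaluated directly as a first-order IIR filter, which avoids the O(n²) work of recomputing a dot product for every row. This is a sketch of an alternative, not part of the original answer:

```python
import numpy as np
import pandas as pd
from scipy.signal import lfilter

dataset = pd.DataFrame(np.random.rand(1000, 3), columns=["A", "B", "C"])
decay = np.exp(-0.05)

# The recurrence y[i] = x[i] + decay * y[i-1] is a first-order IIR filter,
# so lfilter evaluates it in O(n) per column with no Python-level loop
result = pd.DataFrame(
    lfilter([1.0], [1.0, -decay], dataset.values, axis=0),
    index=dataset.index, columns=dataset.columns)
```

Here b = [1] and a = [1, -decay] encode exactly y[i] = x[i] + decay·y[i-1], so the output matches the hand-written recurrence to floating-point precision.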