Does anyone in the community have an idea for speeding up this MSD (mean square displacement) calculation? It is largely based on the implementation from this blog post: http://damcb.com/mean-square-disp.html
The current implementation takes about 9 seconds for a 2D trajectory of 5,000 points. That is far too slow if you need to process many trajectories.
I have not tried to parallelize it (using multiprocessing or joblib), but I suspect that creating new processes would be too costly for such an algorithm.
Here is the code:

    import os

    import matplotlib
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
The first rows of the trajectory look like this:

              t  x  y
    0  0.000000 -1 -1
    1  0.020004 -1  0
    2  0.040008 -1 -1
    3  0.060012 -2 -2
    4  0.080016 -2 -2

    def compute_msd(trajectory, t_step, coords=['x', 'y']):
        # Lag times, converted to shifts expressed in number of samples
        tau = trajectory['t'].copy()
        # Use the builtin int: the np.int alias was removed in NumPy 1.24
        shifts = np.floor(tau / t_step).astype(int)
        msds = np.zeros(shifts.size)
        msds_std = np.zeros(shifts.size)

        for i, shift in enumerate(shifts):
            # Displacement between each point and the point `shift` samples later
            diffs = trajectory[coords] - trajectory[coords].shift(-shift)
            sqdist = np.square(diffs).sum(axis=1)
            msds[i] = sqdist.mean()
            msds_std[i] = sqdist.std()

        msds = pd.DataFrame({'msds': msds, 'tau': tau, 'msds_std': msds_std})
        return msds
And the output:

           msds  msds_std       tau
    0  0.000000  0.000000  0.000000
    1  1.316463  0.668169  0.020004
    2  2.607243  2.078604  0.040008
    3  3.891935  3.368651  0.060012
    4  5.200761  4.685497  0.080016
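
For reference, here is a minimal sketch of the same per-lag loop written against plain NumPy arrays instead of pandas objects. It is not the code from the blog post: `compute_msd_numpy` and its `max_lag` parameter are hypothetical names, and it assumes an evenly sampled trajectory so that lag `i` corresponds to a shift of `i` samples. It is still O(N²), but it should avoid the per-iteration pandas overhead (index alignment, shifted copies filled with NaN).

```python
import numpy as np


def compute_msd_numpy(xy, max_lag=None):
    """MSD and its standard deviation for each lag, in number of samples.

    xy: array-like of shape (n_points, n_dims), evenly sampled in time.
    Hypothetical helper for illustration, not part of the original post.
    """
    xy = np.asarray(xy, dtype=float)
    n_points = xy.shape[0]
    if max_lag is None:
        max_lag = n_points - 1

    msd = np.zeros(max_lag + 1)
    msd_std = np.zeros(max_lag + 1)

    for lag in range(1, max_lag + 1):
        # Only keep pairs where both points exist: k and k + lag
        diffs = xy[lag:] - xy[:-lag]
        sqdist = np.square(diffs).sum(axis=1)
        msd[lag] = sqdist.mean()
        # ddof=1 mirrors the pandas default; undefined for a single pair
        msd_std[lag] = sqdist.std(ddof=1) if sqdist.size > 1 else np.nan

    return msd, msd_std
```

Assuming `traj` is the DataFrame shown above, it could be called as `compute_msd_numpy(traj[['x', 'y']].to_numpy())`. Passing a smaller `max_lag` reduces the work further when only short lag times are of interest.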

And some profiling:

    %timeit msd = compute_msd(traj, t_step=dt, coords=['x', 'y'])

gives:

    1 loops, best of 3: 8.53 s per loop
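
`%timeit` is an IPython magic, so it only works in IPython or a notebook. As a rough sketch, a similar best-of-three measurement in a plain script could use the standard `timeit` module; `time_call` is a hypothetical helper name, not something from the original code.

```python
import timeit


def time_call(func, *args, repeat=3, number=1, **kwargs):
    """Best wall-clock time in seconds per call, similar in spirit to %timeit.

    Hypothetical helper for illustration.
    """
    # Run `number` calls per measurement and repeat the measurement `repeat` times
    timings = timeit.repeat(lambda: func(*args, **kwargs),
                            repeat=repeat, number=number)
    # The minimum is the measurement least disturbed by other system activity
    return min(timings) / number
```

With the names used above, the call would be `time_call(compute_msd, traj, t_step=dt, coords=['x', 'y'])`.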
Any ideas?