I have a large dataset of 1.2M points in a 220-dimensional periodic space, where every coordinate lies in (-pi, pi), i.e. a 1.2M x 220 matrix.
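For reference, the distance the code below computes between two rows x and y (notation introduced here) is the minimum-image Euclidean distance for period 2π. This holds because π − | |x_k − y_k| − π | = min(|x_k − y_k|, 2π − |x_k − y_k|) whenever |x_k − y_k| ≤ 2π, which is always the case for coordinates in (-pi, pi):

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{k=1}^{220} \min\bigl(|x_k - y_k|,\; 2\pi - |x_k - y_k|\bigr)^2},
\qquad 0 \le d(\mathbf{x}, \mathbf{y}) \le \sqrt{220}\,\pi \approx 46.6
```

The upper bound is where the code's histogram limit `mm = floor(sqrt(220) * pi) = 46` comes from.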
I would like to calculate a histogram of the distances between all pairs of these points, taking the periodicity into account. I wrote the code in Python, but it is still quite slow even for my test case (I haven't even tried running it on the whole set...).
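Part of the slowness is simply scale: the loop visits each unordered pair once, so the work grows as n(n-1)/2. A back-of-the-envelope count using only the sizes given above (plain Python, no timing involved):

```python
# Rough size of the all-pairs problem, using only the sizes from the question
n_points = 1_200_000
n_dims = 220

n_pairs = n_points * (n_points - 1) // 2  # unordered pairs, each visited once by the loop
n_coord_diffs = n_pairs * n_dims          # one periodic difference per coordinate per pair

test_pairs = 1000 * 999 // 2              # pairs in the 1000-point test case
ratio = n_pairs / test_pairs

print(f"{n_pairs:,}")        # 719,999,400,000
print(f"{n_coord_diffs:,}")  # 158,399,868,000,000
print(f"{ratio:,.0f}")       # 1,441,440
```

So, assuming the time per pair stays roughly constant, the full set would take about 1.44 million times as long as the 1000-point test case; speeding up the loop body by a constant factor does not change that quadratic scaling.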
Can you take a look and help me optimize it?
Any suggestions or comments would be highly appreciated.
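import time

import numpy as np

# Test data: 1000 points in a 220-dimensional periodic box, coordinates in [-pi, pi)
d = np.random.random((1000, 220)) * 2 * np.pi - np.pi

# Largest possible periodic distance: every coordinate differs by pi
m = np.zeros(np.shape(d)[1]) + np.pi
m_ = np.sqrt(np.sum(m**2))
mm = np.floor(m_)             # histogram upper limit; distances above mm are not counted
bins = int(round(mm / 0.01))  # bin width 0.01; np.zeros and np.histogram need an integer
hist = np.zeros(bins)

start_time = time.time()
# For each offset k = i + 1, compare row j with row j + k, so every unordered pair is visited once
for i in range(np.shape(d)[0] - 1):
    diff = np.absolute(d[:-(i + 1), :] - d[i + 1:, :])
    # Minimum-image difference for period 2*pi: equals min(diff, 2*pi - diff)
    diff = np.pi - np.absolute(diff - np.pi)
    s = np.sqrt(np.einsum('ij,ij->i', diff, diff))
    hist += np.histogram(s, range=(0, mm), bins=bins)[0]
print(time.time() - start_time)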
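One direction that may help, sketched under assumptions: at 1.2M points, each slice difference in the loop above is itself a 1.2M x 220 float64 array (about 2.1 GB), so the full set needs the pairs split into tiles. The function below (`periodic_pair_histogram` is a hypothetical helper, not a NumPy API) compares blocks of `block` rows against each other, counts each unordered pair once, and uses the same bin width and upper limit as the code above. It has not been benchmarked; it mainly bounds memory, and the total work is still n(n-1)/2 distances.

```python
import numpy as np


def periodic_pair_histogram(d, bin_width=0.01, upper=None, block=128):
    """Histogram of pairwise distances in a box with period 2*pi per coordinate.

    Hypothetical helper: rows are compared in (block x block) tiles, so peak
    memory depends on `block` rather than on the number of points.
    """
    n, n_dims = d.shape
    if upper is None:
        # Same limit as `mm` in the question; larger distances are not counted
        upper = np.floor(np.sqrt(n_dims) * np.pi)
    n_bins = int(round(upper / bin_width))
    hist = np.zeros(n_bins, dtype=np.int64)

    for i0 in range(0, n, block):
        a = d[i0:i0 + block]
        # Only tiles with j0 >= i0, so each unordered pair is visited once
        for j0 in range(i0, n, block):
            b = d[j0:j0 + block]
            diff = np.abs(a[:, None, :] - b[None, :, :])  # shape (len(a), len(b), n_dims)
            np.minimum(diff, 2 * np.pi - diff, out=diff)  # minimum-image difference
            s = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))
            if j0 == i0:
                # Diagonal tile: keep the strict upper triangle (no self-pairs, no duplicates)
                s = s[np.triu_indices(len(a), k=1)]
            else:
                s = s.ravel()
            hist += np.histogram(s, bins=n_bins, range=(0, upper))[0]
    return hist


if __name__ == "__main__":
    d = np.random.random((1000, 220)) * 2 * np.pi - np.pi
    hist = periodic_pair_histogram(d)
    print(hist.sum())  # number of pairs whose distance falls inside the histogram range
```

The tile size trades memory for Python-loop overhead: a 128 x 128 tile with 220 coordinates is about 29 MB of float64 per temporary array.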