We can be a little smarter about the indexing and gain roughly a 4x speedup.
First, create some data of the correct form: column 0 holds an integer group id and the remaining 205 columns hold the coordinates:
import numpy as np
import scipy.spatial.distance as spd

seed = np.random.randint(0, 100, (200, 206))
data = np.random.randint(0, 100, (400000, 206))
seed[:, 0] = np.arange(200)
data[:, 0] = np.random.randint(0, 200, 400000)
diam = np.empty(200)
Original answer:
%%timeit
for i in range(200):
    diam[i] = spd.cdist(seed[np.newaxis, i, 1:], data[data[:, 0] == i][:, 1:]).max()
1 loops, best of 3: 1.35 s per loop
moarningsun's answer:
%%timeit
seed_repeated = seed[data[:,0]]
dist_to_center = np.sqrt(np.sum((data[:,1:]-seed_repeated[:,1:])**2, axis=1))
diam = np.zeros(len(seed))
np.maximum.at(diam, data[:,0], dist_to_center)
1 loops, best of 3: 1.33 s per loop
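The trick here is np.maximum.at, an unbuffered ufunc method that applies the maximum once per index even when indices repeat, which is exactly what a grouped reduction needs. A toy illustration (my own, not part of the answer):

acc = np.zeros(3)
np.maximum.at(acc, [0, 0, 2, 2], [5.0, 1.0, 7.0, 2.0])
print(acc)   # [5. 0. 7.]
# plain fancy-indexed assignment is buffered, so the last write wins instead:
acc2 = np.zeros(3)
acc2[[0, 0, 2, 2]] = np.maximum(acc2[[0, 0, 2, 2]], [5.0, 1.0, 7.0, 2.0])
print(acc2)  # [1. 0. 2.]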
Divakar's answer:
%%timeit
data_sorted = data[data[:, 0].argsort()]
seed_ext = np.repeat(seed,np.bincount(data_sorted[:,0]),axis=0)
dists = np.sqrt(((data_sorted[:,1:] - seed_ext[:,1:])**2).sum(1))
shift_idx = np.append(0,np.nonzero(np.diff(data_sorted[:,0]))[0]+1)
diam_out = np.maximum.reduceat(dists,shift_idx)
1 loops, best of 3: 1.65 s per loop
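Here np.maximum.reduceat takes the maximum over each segment between consecutive boundary indices, so with the group starts in shift_idx every output element is the diameter of one sorted group. A toy illustration (mine):

vals = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
starts = np.array([0, 2, 5])              # segments vals[0:2], vals[2:5], vals[5:]
print(np.maximum.reduceat(vals, starts))  # [3. 5. 9.]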
As we can see, the vectorized solutions buy us essentially nothing except a larger memory footprint. To avoid that, we go back to the original answer, which really is the right way to do this, and instead try to reduce the amount of indexing:
%%timeit
idx = data[:, 0].argsort()
bins = np.bincount(data[:, 0])
counter = 0
for i in range(200):
    data_slice = idx[counter: counter + bins[i]]
    diam[i] = spd.cdist(seed[None, i, 1:], data[data_slice, 1:]).max()
    counter += bins[i]
1 loops, best of 3: 281 ms per loop
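The win comes from the indexing itself: the boolean mask data[:, 0] == i rescans all 400000 rows on every one of the 200 iterations, while a single upfront argsort lets each iteration grab its group as a cheap contiguous slice of idx. A quick sanity check that both schemes select the same rows (variable names reuse the block above; re-run those setup lines outside %%timeit first):

i = 3
start = bins[:i].sum()
mask_idx = np.nonzero(data[:, 0] == i)[0]
slice_idx = np.sort(idx[start: start + bins[i]])
print(np.array_equal(mask_idx, slice_idx))  # True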
Double-check the answer:
np.allclose(diam, diam_out)
True
This is the problem with assuming that Python loops are always bad: they often are, but not in every situation.