Compound Assignment Operators in the Python Numpy Library

The "vectorization" of fancy indexing in the Python numpy library sometimes gives unexpected results. For example:

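    import numpy

    a = numpy.zeros((1000, 4), dtype='uint32')
    b = numpy.zeros((1000, 4), dtype='uint32')
    # Random row indices in 0..999 and column indices in 0..3.
    # randint's upper bound is exclusive, unlike the older random_integers.
    i = numpy.random.randint(0, 1000, 1000)
    j = numpy.random.randint(0, 4, 1000)

    a[i, j] += 1             # vectorized update through fancy indexing
    for k in range(1000):    # explicit loop over the same index pairs
        b[i[k], j[k]] += 1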

This gives different results in arrays "a" and "b": in "a", each (i, j) pair is incremented only once no matter how many times it is repeated, while in "b" every repetition is counted. This is easy to verify as follows:

    >>> numpy.sum(a)
    883
    >>> numpy.sum(b)
    1000
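To make the difference easier to see, here is a minimal deterministic sketch of the same effect on a three-element array. The array size and the index list are made up for illustration; only the behavior of `+=` with a repeated index is the point.

```python
import numpy as np

idx = np.array([0, 0, 1])  # index 0 appears twice

# Fancy-indexed compound assignment: equivalent to a[idx] = a[idx] + 1,
# so the repeated index 0 is written twice with the same value.
a = np.zeros(3, dtype='uint32')
a[idx] += 1

# Explicit loop: every occurrence of an index is applied in turn.
b = np.zeros(3, dtype='uint32')
for k in idx:
    b[k] += 1

print(a.tolist())  # [1, 1, 0]
print(b.tolist())  # [2, 1, 0]
```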

It is also worth noting that the fancy-indexing version is almost two orders of magnitude faster than the for loop. My question is: is there an efficient numpy way to compute the repeat counts that the for loop produces in the example above?

1 answer

This should do what you want:

 np.bincount(np.ravel_multi_index((i, j), (1000, 4)), minlength=4000).reshape(1000, 4) 

As a breakdown: ravel_multi_index converts the index pairs given by i and j into integer indices into a C-flattened (row-major) array; bincount counts how many times each value in 0..3999 appears in that index list; and reshape converts the flattened counts back into a 2d array.
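The three steps can be followed on a small example. This is only a sketch: the 3x4 shape and the index values are chosen for illustration so the intermediate results fit on one line.

```python
import numpy as np

shape = (3, 4)
i = np.array([0, 0, 2])  # row indices; the pair (0, 1) is repeated
j = np.array([1, 1, 3])  # column indices

# Step 1: (row, col) pairs become flat row-major indices, row * 4 + col.
flat = np.ravel_multi_index((i, j), shape)

# Step 2: count occurrences of each flat index; minlength pads the
# result so cells that were never hit still get a zero.
counts = np.bincount(flat, minlength=shape[0] * shape[1])

# Step 3: fold the flat counts back into the 2d shape.
result = counts.reshape(shape)

print(flat.tolist())    # [1, 1, 11]
print(result.tolist())  # [[0, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
```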

As for performance, I measure it as 200 times faster than "b" and 5 times faster than "a"; your mileage may vary.
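If you want to check the ratios on your own machine, a rough timing harness might look like the following. The function names, the seed, and the repeat counts are my own choices for the sketch, and the numbers will depend on hardware and numpy version, so no expected timings are shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
i = rng.integers(0, 1000, 1000)
j = rng.integers(0, 4, 1000)

def fancy_version():
    # "a" from the question: fast, but repeated pairs are counted once
    a = np.zeros((1000, 4), dtype='uint32')
    a[i, j] += 1
    return a

def loop_version():
    # "b" from the question: slow, counts every repetition
    b = np.zeros((1000, 4), dtype='uint32')
    for k in range(1000):
        b[i[k], j[k]] += 1
    return b

def bincount_version():
    flat = np.ravel_multi_index((i, j), (1000, 4))
    return np.bincount(flat, minlength=4000).reshape(1000, 4)

# Best of 3 runs, 10 calls per run, reported per call.
timings = {
    f.__name__: min(timeit.repeat(f, number=10, repeat=3)) / 10
    for f in (fancy_version, loop_version, bincount_version)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```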

If you need to add the counts to an existing array a, try the following:

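    u, inv = np.unique(np.ravel_multi_index((i, j), (1000, 4)), return_inverse=True)
    # bincount returns signed integers; casting to a's dtype lets the
    # in-place add also work when a is unsigned, such as uint32
    a.flat[u] += np.bincount(inv).astype(a.dtype)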

I measure this second method as a little slower (about 2x) than "a", which is not too surprising since the unique step is slow.
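As a sanity check, the in-place update can be compared against the explicit loop on an array that is not all zeros. The sketch below also tries `np.add.at`, which is not part of the answer above; it is an alternative I am adding for comparison, since it performs an unbuffered in-place add that accumulates repeated indices. The seed and the starting values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for a repeatable check
i = rng.integers(0, 1000, 1000)
j = rng.integers(0, 4, 1000)
start = np.ones((1000, 4), dtype='uint32')  # non-zero on purpose

# Reference result: the explicit loop counts every repetition.
expected = start.copy()
for k in range(1000):
    expected[i[k], j[k]] += 1

# unique + bincount: one write per distinct cell, adding its count.
a = start.copy()
u, inv = np.unique(np.ravel_multi_index((i, j), (1000, 4)), return_inverse=True)
a.flat[u] += np.bincount(inv).astype(a.dtype)

# Alternative (my addition): unbuffered add via ufunc.at, with the
# increment given in the array's own dtype to avoid any casting.
c = start.copy()
np.add.at(c, (i, j), c.dtype.type(1))

print(np.array_equal(a, expected), np.array_equal(c, expected))  # True True
```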


Source: https://habr.com/ru/post/917902/