Compound Assignment Operators in the Python Numpy Library

Question


The "vectorization" of fancy indexing by Python's NumPy library sometimes gives unexpected results. For instance:

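    import numpy

    a = numpy.zeros((1000, 4), dtype='uint32')
    b = numpy.zeros((1000, 4), dtype='uint32')
    # random row indices in 0..999 and column indices in 0..3
    i = numpy.random.randint(0, 1000, 1000)
    j = numpy.random.randint(0, 4, 1000)

    a[i, j] += 1               # fancy-indexed compound assignment
    for k in range(1000):      # explicit loop over the same index pairs
        b[i[k], j[k]] += 1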

This gives different results in arrays "a" and "b": each distinct pair (i, j) ends up as 1 in "a" no matter how often it repeats, while "b" counts every repetition. This is easy to verify:

    >>> numpy.sum(a)
    883
    >>> numpy.sum(b)
    1000
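The effect can be reproduced deterministically on a tiny array. The sketch below uses made-up indices (not the random ones from the question) in which the pair (0, 1) occurs three times; the array shape and values are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical small example: the index pair (0, 1) appears three times.
i = np.array([0, 0, 0, 2])
j = np.array([1, 1, 1, 3])

a = np.zeros((3, 4), dtype=np.int64)
b = np.zeros((3, 4), dtype=np.int64)

# Fancy-indexed compound assignment: evaluated as a[i, j] = a[i, j] + 1,
# so each distinct position is written with the value 0 + 1.
a[i, j] += 1

# Explicit loop: every occurrence increments the element.
for k in range(len(i)):
    b[i[k], j[k]] += 1

print(a[0, 1], b[0, 1])  # 1 3
print(a.sum(), b.sum())  # 2 4
```

The right-hand side is computed once from the old values before anything is written back, which is why repeated positions do not accumulate.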

It is also worth noting that the fancy indexing version is almost two orders of magnitude faster than the for loop. My question is: "Is there an efficient NumPy way to compute the repetition counts that the for loop produces in the example above?"


python numpy

user1451766 Jun 12 '12 at 16:41


1 answer

ecatmur · Answer 1 · 2012-06-12T17:03:32+0000

This should do what you want:

    import numpy as np

    np.bincount(np.ravel_multi_index((i, j), (1000, 4)), minlength=4000).reshape(1000, 4)

As a breakdown: ravel_multi_index converts the index pairs given by i and j into integer indices into the C-order (row-major) flattened array; bincount counts how many times each value in 0..3999 appears in that index list; and reshape converts the flattened counts back into a 2-D array.
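The three steps can be followed on a small case. This is a minimal sketch on a hypothetical 3x4 grid with made-up indices, not the 1000x4 arrays from the question:

```python
import numpy as np

# Hypothetical small example on a 3x4 grid.
i = np.array([0, 0, 0, 2])
j = np.array([1, 1, 1, 3])
shape = (3, 4)

# Step 1: (row, col) pairs -> positions in the row-major flattened array,
# i.e. row * 4 + col for this shape.
flat = np.ravel_multi_index((i, j), shape)

# Step 2: count how often each flat position 0..11 occurs.
# minlength pads the result so positions that never occur still get a 0.
counts = np.bincount(flat, minlength=shape[0] * shape[1])

# Step 3: fold the flat counts back into the 2-D shape.
result = counts.reshape(shape)

print(flat.tolist())  # [1, 1, 1, 11]
print(result[0, 1])   # 3
```

Without minlength, bincount would return an array only as long as the largest index plus one, and the reshape could fail whenever the last positions are never hit.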

As for performance, I measure it at about 200 times faster than "b" (the for loop) and 5 times faster than "a" (the fancy-indexed assignment); your mileage may vary.

Since you need to add the counts to an existing array a, try the following:

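    u, inv = np.unique(np.ravel_multi_index((i, j), (1000, 4)), return_inverse=True)
    # u holds each flat index once; bincount(inv) gives its number of occurrences.
    # The cast keeps the in-place add valid for a uint32 array on newer NumPy versions.
    a.flat[u] += np.bincount(inv).astype(a.dtype)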

I measure this second method as a little slower (about 2x) than "a", which is not too surprising since the unique step is slow.
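Another option, which is not part of the answer above and requires NumPy 1.8 or later, is the unbuffered ufunc method np.add.at, which accumulates repeated indices in place. The sketch below compares it with the unique/bincount route on made-up indices and a small hypothetical array that already holds some counts; relative speed is not measured here.

```python
import numpy as np

# Hypothetical small example: the index pair (0, 1) appears three times.
i = np.array([0, 0, 0, 2])
j = np.array([1, 1, 1, 3])

# Two copies of an existing array that already holds some counts.
a1 = np.ones((3, 4), dtype=np.uint32)
a2 = a1.copy()

# unique/bincount route, with the counts cast to the array's dtype.
u, inv = np.unique(np.ravel_multi_index((i, j), a1.shape), return_inverse=True)
a1.flat[u] += np.bincount(inv).astype(a1.dtype)

# Unbuffered in-place accumulation: every occurrence of an index is applied.
np.add.at(a2, (i, j), 1)

print(a1[0, 1], a2[0, 1])  # 4 4
```

Both routes add 3 to position (0, 1) and 1 to position (2, 3), leaving the other elements unchanged.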
