How to get integer array from numpy.bincount when weight parameter is integer

Question

Consider a numpy array a

 a = np.array([1, 0, 2, 1, 1])

If I do a bincount, I get integers

 np.bincount(a)
 array([1, 3, 1])

But if I add weights to perform an equivalent bincount

 np.bincount(a, np.ones_like(a))
 array([ 1., 3., 1.])

Same values, but float. What is the smartest way to convert them to int? Why doesn't numpy assume the same dtype as the weights?

+5

python numpy

piRSquared Jun 13 '17 at 22:24

2 answers

You can use the built-in astype method:

 np.bincount(a, np.ones_like(a)).astype(int)
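
As a small sketch of that conversion (the variable names counts and as_int are only illustrative), the following checks the dtype before and after the cast. It also shows that astype(int) truncates toward zero, so rounding first with np.rint may be safer if the weighted sums are not guaranteed to be whole numbers:

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])

# Weighted bincount always returns float64, even for integer weights.
counts = np.bincount(a, weights=np.ones_like(a))
print(counts.dtype)

# Cast back to an integer dtype.
as_int = counts.astype(int)
print(as_int)

# astype(int) truncates toward zero; np.rint rounds to the nearest integer first.
truncated = np.array([2.9]).astype(int)
rounded = np.rint(np.array([2.9])).astype(int)
print(truncated, rounded)
```

With integer weights and moderately sized sums the float results are exact, so the plain cast should be enough in the case from the question.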

+2

Lester T. Jun 13 '17 at 22:45

Mseifert · Accepted Answer · 2017-06-13T23:12:43+0000

Why doesn't numpy assume the same dtype as the weights?

There are two reasons:

1. There are several ways to weight the counts: either by multiplying the values by the weights, or by multiplying the values by the weights divided by the sum of the weights. In the latter case the result will always be a double (simply because otherwise the division would be inaccurate). In my experience, weighting with normalized weights (the second case) is more common. So it is actually reasonable (and definitely faster) to assume that they are floats.
2. Overflow. The counts cannot exceed the integer limit because the array cannot have more values than this limit (this stands to reason, otherwise you could not index the array). But if you multiply by weights, it is not hard to make the counts "overflow".

I guess in this case it is probably the latter reason.
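
To illustrate the first reason, here is a minimal sketch using the array from the question; the names w, plain and normalized are only illustrative. Dividing integer weights by their sum already produces fractional weights, so an integer result could not hold the sums:

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])
w = np.ones_like(a)  # integer weights

# Un-normalized weights: the sums happen to be whole numbers.
plain = np.bincount(a, weights=w)

# Normalized weights: w / w.sum() is fractional, so the sums are too
# (roughly 0.2, 0.6 and 0.2 for this input).
normalized = np.bincount(a, weights=w / w.sum())

print(plain)
print(normalized)
```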

It is unlikely that anyone will use really large integer weights together with many repeated values, but just consider what would happen if the sums were kept as integers:

 import numpy as np
 i = 10000000
 np.bincount(np.ones(i, dtype=int), weights=np.ones(i, dtype=int) * 1000000000000)

this would return:

 array([0, -8446744073709551616])

instead of the actual result:

 array([ 0.00000000e+00, 1.00000000e+19])
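
The wrapped value can be reproduced on a much smaller input. This is only a sketch: it uses np.add.at on an int64 array as a stand-in for a hypothetical integer-accumulating bincount, which is not how numpy implements bincount. Ten weights of 10**18 sum to 10**19, which does not fit in a signed 64-bit integer:

```python
import numpy as np

idx = np.ones(10, dtype=np.intp)         # ten items, all falling in bin 1
w = np.full(10, 10**18, dtype=np.int64)  # large integer weights

# Stand-in for integer accumulation: int64 addition wraps around silently.
int_sums = np.zeros(2, dtype=np.int64)
np.add.at(int_sums, idx, w)

# The real bincount promotes the weights to double and sums them as floats.
float_sums = np.bincount(idx, weights=w)

print(int_sums)
print(float_sums)
```

The integer sum in bin 1 comes out as 10**19 - 2**64, the same negative number shown above, while the float sum is 1e19.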

This, combined with the first reason and the fact that it is very easy (personally, I consider it trivial) to convert floating point arrays to integer arrays:

 np.asarray(np.bincount(...), dtype=int)

probably made float the "actual" returned dtype of weighted bincount.
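
If the conversion back to integers needs a safety net, one possible sketch is a small wrapper. The name int_bincount is hypothetical and not part of numpy. It assumes non-negative integer weights and refuses to convert sums above 2**53, because float64 cannot represent every integer beyond that point:

```python
import numpy as np

def int_bincount(x, weights):
    """Hypothetical helper: weighted bincount cast back to int64."""
    weights = np.asarray(weights)
    if not np.issubdtype(weights.dtype, np.integer):
        raise TypeError("expected integer weights")
    sums = np.bincount(x, weights=weights)  # float64 result
    # Heuristic check on the final sums: above 2**53 a float64 value
    # may no longer be the exact integer sum.
    if sums.size and np.abs(sums).max() > 2.0**53:
        raise OverflowError("sums too large to convert exactly")
    return sums.astype(np.int64)

result = int_bincount(np.array([1, 0, 2, 1, 1]), np.array([1, 1, 1, 1, 1]))
print(result, result.dtype)
```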

"literal" reason:

The numpy source actually mentions that the weights need to be converted to double (float64):

 /*
  * arr_bincount is registered as bincount.
  *
  * bincount accepts one, two or three arguments. The first is an array of
  * non-negative integers The second, if present, is an array of weights,
  * which must be promotable to double. Call these arguments list and
  * weight. Both must be one-dimensional with len(weight) == len(list). If
  * weight is not present then bincount(list)[i] is the number of occurrences
  * of i in list. If weight is present then bincount(self,list, weight)[i]
  * is the sum of all weight[j] where list [j] == i. Self is not used.
  * The third argument, if present, is a minimum length desired for the
  * output array.
  */

And they then simply cast the weights to double in the function. This is the "literal" reason why you get a result with a floating point dtype.
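
The behaviour described in that comment can be mimicked in Python. This is a rough sketch, not the actual C implementation: bincount_sketch is a hypothetical name, and np.add.at stands in for the C accumulation loop.

```python
import numpy as np

def bincount_sketch(x, weights, minlength=0):
    """Hypothetical Python imitation of weighted bincount."""
    x = np.asarray(x)
    # Promote the weights to double, as the source comment describes.
    w = np.asarray(weights).astype(np.float64)
    n_bins = max(int(x.max()) + 1 if x.size else 0, minlength)
    out = np.zeros(n_bins, dtype=np.float64)
    # out[i] becomes the sum of all w[j] where x[j] == i.
    np.add.at(out, x, w)
    return out

a = np.array([1, 0, 2, 1, 1])
w = np.array([2, 3, 4, 5, 6])
sketch = bincount_sketch(a, w)
reference = np.bincount(a, weights=w)
print(sketch, reference)
```

Because the output array is float64 from the start, the result dtype does not depend on the dtype of the weights.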
