Why doesn't numpy keep the same dtype as the weights?
There are two reasons:
There are several ways to weight a count: either by multiplying each value by its weight, or by multiplying each value by its weight divided by the sum of the weights. In the latter case the result will always be a double (because otherwise the division would be inaccurate).
In my experience, weighting with normalized weights (the second case) is more common. So it is actually reasonable (and certainly faster) to assume that the weights are floating point.
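The two weighting schemes can be sketched as follows. The sample values and weights are made up purely for illustration.

```python
import numpy as np

# Made-up sample data, purely for illustration.
values = np.array([0, 1, 1, 2, 2, 2])
weights = np.array([1, 2, 3, 4, 5, 6])  # integer weights

# Case 1: each occurrence contributes its raw weight.
raw = np.bincount(values, weights=weights)

# Case 2: each occurrence contributes weight / sum(weights),
# so the weighted counts add up to 1 and are necessarily fractional.
normalized = np.bincount(values, weights=weights / weights.sum())

print(raw)         # weighted count per bin
print(normalized)  # fraction of the total weight per bin
```

In the second case an integer result could not hold the values at all, which is why a floating-point result is the natural choice there.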
Overflow. The counts themselves cannot exceed the integer limit, because an array cannot hold more values than that limit (for good reason: otherwise you could not index the array). But once you multiply by weights, it is not hard to make the counts overflow.
I think the latter reason is probably the decisive one in this case.
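It is unlikely that anyone will use really large integer weights together with many duplicate values, but just consider what would happen if:
import numpy as np
i = 10000000
# 10**7 occurrences of the value 1, each with weight 10**12: the true sum is 10**19
np.bincount(np.ones(i, dtype=int), weights=np.ones(i, dtype=int) * 1000000000000)
would return (if bincount kept the integer dtype of the weights):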
array([0, -8446744073709551616])
instead of the actual result:
array([ 0.00000000e+00, 1.00000000e+19])
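The same wraparound can be reproduced on a much smaller scale. This is only a sketch: accumulating with np.add.at into an int64 array stands in for a hypothetical integer-typed bincount, which numpy does not provide.

```python
import numpy as np

values = np.ones(10, dtype=np.int64)           # ten occurrences of the value 1
weights = np.full(10, 10**18, dtype=np.int64)  # each weight still fits into int64

# Stand-in for a hypothetical integer bincount: accumulate in int64.
# The true sum, 10**19, exceeds the int64 maximum, so it silently wraps around.
int_counts = np.zeros(2, dtype=np.int64)
np.add.at(int_counts, values, weights)

# What bincount really does: accumulate in double.
float_counts = np.bincount(values, weights=weights)

print(int_counts)    # the second entry is negative because of the wraparound
print(float_counts)  # the second entry is 1e19
```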
Combine this with the first reason and the fact that it is very easy (personally, I consider it trivial) to convert a floating-point array to an integer array:
np.asarray(np.bincount(...), dtype=int)
That is probably what made float the "actual" return type of weighted bincount.
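A slightly more defensive variant of that conversion, as a sketch with made-up data. The rounding step and the 2**53 check are assumptions added here, not something numpy requires: a double represents integers exactly only up to 2**53, so larger sums may already have lost precision before the cast.

```python
import numpy as np

# Made-up sample data, purely for illustration.
values = np.array([0, 1, 1, 3])
weights = np.array([10, 20, 30, 40])  # integer weights

counts = np.bincount(values, weights=weights)  # float64 result

# Only convert when every sum is small enough to be exact in a double.
if np.all(np.abs(counts) < 2**53):
    int_counts = np.rint(counts).astype(np.int64)
else:
    raise OverflowError("weighted counts too large to convert exactly")

print(int_counts)  # integer weighted counts, with 0 for the empty bin 2
```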
"literal" reason:
The numpy source actually states that the weights must be promotable to double (float64):
/*
 * arr_bincount is registered as bincount.
 *
 * bincount accepts one, two or three arguments. The first is an array of
 * non-negative integers The second, if present, is an array of weights,
 * which must be promotable to double. Call these arguments list and
 * weight. Both must be one-dimensional with len(weight) == len(list). If
 * weight is not present then bincount(list)[i] is the number of occurrences
 * of i in list. If weight is present then bincount(self,list, weight)[i]
 * is the sum of all weight[j] where list [j] == i. Self is not used.
 * The third argument, if present, is a minimum length desired for the
 * output array.
 */
And, well, the weights are then simply cast to double inside the function. That is the "literal" reason why you get a floating-point result.
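This is easy to check from Python. A minimal sketch with made-up data, showing that the result is float64 whatever the dtype of the weights, while the unweighted counts stay integers:

```python
import numpy as np

# Made-up sample data, purely for illustration.
values = np.array([0, 1, 1, 2])

# Try a few weight dtypes; all of them are promotable to double.
for weight_dtype in (np.int8, np.int64, np.float32, np.float64):
    weights = np.ones(len(values), dtype=weight_dtype)
    result = np.bincount(values, weights=weights)
    print(np.dtype(weight_dtype).name, "->", result.dtype)

# Without weights the counts are plain integers.
plain = np.bincount(values)
print("no weights ->", plain.dtype)
```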