numpy.sum() gives strange results on large arrays

I seem to have found a pitfall with using .sum() on numpy arrays, but I cannot find an explanation. Essentially, if I try to sum a large array, I start getting nonsensical answers. This happens silently, and I cannot make sense of the output well enough to Google the cause.

For example, this works exactly as expected:

import numpy as np

a = sum(xrange(2000))
print('a is {}'.format(a))

b = np.arange(2000).sum()
print('b is {}'.format(b))

Providing the same output for both:

a is 1999000
b is 1999000

However, this does not work:

c = sum(xrange(200000)) 
print('c is {}'.format(c))

d = np.arange(200000).sum()
print('d is {}'.format(d))

It gives the following output:

c is 19999900000
d is -1474936480

And on an even larger array, it is possible to get back a positive result. This is more insidious, because I might not notice that anything unusual was happening at all. For instance:

e = sum(xrange(100000000))
print('e is {}'.format(e))

f = np.arange(100000000).sum()
print('f is {}'.format(f))

Gives the following:

e is 4999999950000000
f is 887459712

I suspected that this was to do with data types, and indeed using the Python float seems to fix the problem:

e = sum(xrange(100000000))
print('e is {}'.format(e))

f = np.arange(100000000, dtype=float).sum()
print('f is {}'.format(f))

Giving:

e is 4999999950000000
f is 4.99999995e+15

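I have no background in Comp. Sci. and found myself stuck (perhaps this is a duplicate). Things I have tried:

  • numpy arrays have a fixed size. Nope; as far as I can tell, I should hit a MemoryError first.
  • I might somehow have a 32-bit installation (probably not relevant); nope, I checked and confirmed that I have 64-bit.
  • Other examples of odd sum behaviour; nope (?), I found one, but I cannot see how it applies.

Can someone please briefly explain what I am missing and tell me what I need to read up on? Also, other than remembering to specify a dtype every time, is there a way to stop this from happening or to get a warning?

Possibly relevant: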

Windows 7

numpy 1.11.3

Enthought Canopy Python 2.7.9

+4
4

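On Windows (even on a 64-bit system), the default integer type NumPy uses when converting from Python ints is 32-bit. On Linux and Mac it is 64-bit.

Specify a 64-bit integer and it will work: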

d = np.arange(200000, dtype=np.int64).sum()
print('d is {}'.format(d))

Output:

c is 19999900000
d is 19999900000

If you do not want to change all of your code, you can use functools.partial:

from functools import partial

np.arange = partial(np.arange, dtype=np.int64)

From now on, np.arange works with 64-bit integers by default.
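To see the difference without a Windows machine, here is a minimal sketch that forces the accumulator type explicitly rather than relying on the platform default. It assumes Python 3 and a reasonably recent NumPy; the variable names are only for illustration.

```python
import numpy as np

# Build the array as 32-bit integers so the example does not depend
# on the platform's default integer type.
values = np.arange(200000, dtype=np.int32)

# Accumulate in 32 bits: the running total wraps around silently.
wrapped = values.sum(dtype=np.int32)

# Accumulate in 64 bits: the total fits, so the result is exact.
exact = values.sum(dtype=np.int64)

print('32-bit accumulator: {}'.format(wrapped))  # -1474936480
print('64-bit accumulator: {}'.format(exact))    # 19999900000
```

Passing dtype to sum() only changes the accumulator, so it also works on arrays that were already created with the 32-bit default.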

+5

I am not a numpy expert, but I can reproduce your arange(200000) result in pure Python:

>>> s = 0
>>> for i in range(200000):
...     s += i
...     s &= 0xffffffff
>>> s
2820030816
>>> s.bit_length()
32
>>> s - 2**32  # adjust for the sign bit being set
-1474936480

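In other words, the result you are seeing is what I would expect if numpy is doing arithmetic on signed two's-complement 32-bit integers.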
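The same check can be done without the loop, since the sum of 0..n-1 is n*(n-1)/2. The sketch below (wrap_int32 is a hypothetical helper name, not a numpy function) reinterprets the exact total as a signed 32-bit value and reproduces both of the wrong numbers from the question:

```python
def wrap_int32(value):
    """Reinterpret an exact integer as a signed 32-bit two's-complement value."""
    low = value & 0xFFFFFFFF          # keep only the low 32 bits
    # If the sign bit (bit 31) is set, the value represents a negative number.
    return low - 2**32 if low >= 2**31 else low

for n in (2000, 200000, 100000000):
    exact = n * (n - 1) // 2          # closed form for sum(range(n))
    print(n, exact, wrap_int32(exact))

# 2000 1999000 1999000
# 200000 19999900000 -1474936480
# 100000000 4999999950000000 887459712
```

The last case comes out positive simply because the low 32 bits happen to have the sign bit clear, which is why that result looks plausible at first glance.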

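Since I am not a numpy expert, I cannot suggest a good way to avoid ever being surprised by this (I would have left this as a comment, but I could not show nicely formatted code that way).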

+3

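This is clearly numpy's integer type overflowing 32 bits. Normally you can configure numpy to fail in such situations using np.seterr: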

>>> import numpy as np
>>> np.seterr(over='raise')
{'divide': 'warn', 'invalid': 'warn', 'over': 'warn', 'under': 'ignore'}
>>> np.int8(127) + np.int8(2)
FloatingPointError: overflow encountered in byte_scalars

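However, sum is explicitly documented with the behaviour "no error is raised on overflow", so you may be out of luck here. Using numpy is often a trade-off between performance and convenience!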
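A small sketch of that difference, assuming Python 3 and a recent NumPy: np.errstate is the context-manager form of np.seterr, and the 32-bit dtype is forced explicitly so the result does not depend on the platform. As far as I know, the scalar addition raises under over='raise', while the array reduction does not.

```python
import numpy as np

with np.errstate(over='raise'):
    # Scalar arithmetic honours the error state and raises on overflow.
    try:
        np.int8(127) + np.int8(2)
        scalar_raised = False
    except FloatingPointError:
        scalar_raised = True

    # The array reduction is not checked: it wraps around silently,
    # even with over='raise' in effect.
    total = np.arange(200000, dtype=np.int32).sum(dtype=np.int32)

print('scalar add raised: {}'.format(scalar_raised))
print('array sum: {}'.format(total))
```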

You can, however, specify the dtype of the accumulator manually, like this:

>>> a = np.ones(129)
>>> a.sum(dtype=np.int8)  # will overflow
-127
>>> a.sum(dtype=np.int64)  # no overflow
129
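If you would rather get an error than a wrapped value, one option is to accumulate in 64 bits and then check the range yourself. This is only a sketch: checked_sum is a hypothetical helper, not part of numpy, and it assumes the true total fits in int64.

```python
import numpy as np

def checked_sum(arr, dtype):
    """Sum an integer array and raise if the total does not fit in `dtype`."""
    # Accumulate in 64 bits (assumption: the true total fits in int64).
    total = int(arr.sum(dtype=np.int64))
    info = np.iinfo(dtype)
    if not info.min <= total <= info.max:
        raise OverflowError(
            'sum {} does not fit in {}'.format(total, np.dtype(dtype).name))
    return total

small = np.arange(2000, dtype=np.int32)
large = np.arange(200000, dtype=np.int32)

print(checked_sum(small, np.int32))   # 1999000

try:
    checked_sum(large, np.int32)
    overflow_detected = False
except OverflowError as exc:
    overflow_detected = True
    print(exc)
```

The cost is one extra range check per call, which is negligible next to the reduction itself.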

Watch ticket #593: this is an open issue, and the numpy devs may fix it at some point.

+3

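Numpy's default integer type is the same as the C long type. That is not guaranteed to be 64-bit on a 64-bit platform. In fact, on Windows, long is always 32-bit.

As a result, the numpy sum adds up the values and silently overflows.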

Unfortunately, as far as I know, there is no way to change the default dtype. You will need to specify it as np.int64 every time.

You can try to create your own arange:

def arange(*args, **kw):
    # Default to 64-bit integers, but let callers override dtype explicitly.
    kw.setdefault('dtype', np.int64)
    return np.arange(*args, **kw)

and then use this version instead of numpy's.

EDIT: If you want to flag this problem early, you can simply add something like this at the top of your code:

assert np.array(0).dtype.name != 'int32', 'This needs to be run with 64-bit integers!'
+2

Source: https://habr.com/ru/post/1667059/

