Why does a numpy array with dtype = object result in a significantly smaller file size than dtype = int?

Here is an example:

import numpy as np
randoms = np.random.randint(0, 20, 10000000)

a = randoms.astype(int)     # np.int was removed in NumPy 1.24; plain int is equivalent
b = randoms.astype(object)  # likewise for np.object

np.save('d:/dtype=int.npy', a)     # 39 MB
np.save('d:/dtype=object.npy', b)  # 19 MB!

You can see that the file with dtype=object is roughly half the size. How come? I was under the impression that properly defined numpy dtypes are strictly better than object dtypes.

+4
2 answers

With a non-object dtype, most of the npy file is a dump of the raw bytes of the array data: either 4 or 8 bytes per element here, depending on whether your NumPy defaults to 32-bit or 64-bit integers. From the file size, it looks like 4 bytes per element in your case.
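A quick way to verify the per-element cost of the raw format is to look at the array's `itemsize` and `nbytes` (a sketch with a smaller array than the question's, forcing int32 so the result is platform-independent):

```python
import numpy as np

a = np.random.randint(0, 20, 1_000_000).astype(np.int32)

# Raw npy storage is essentially a small header plus itemsize * size bytes.
print(a.dtype.itemsize)  # 4 bytes per element for int32
print(a.nbytes)          # 4_000_000 bytes of raw array data
```

For 10 million elements that works out to ~40 MB, matching the 39 MB file above.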

With an object dtype, the npy format has no fixed-size representation to dump, so np.save falls back to pickling the array. For each of these small integers, pickle emits the opcode with code `K`, BININT1, described in the `pickletools` source as:

I(name='BININT1',
  code='K',
  arg=uint1,
  stack_before=[],
  stack_after=[pyint],
  proto=1,
  doc="""Push a one-byte unsigned integer.

  This is a space optimization for pickling very small non-negative ints,
  in range(256).
  """),

Each element therefore costs two bytes in the pickle stream: one byte for the `K` opcode and one byte for the value itself. That matches the ~19 MB file for 10 million elements.
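You can confirm with `pickletools` that a small non-negative int pickles to the BININT1 form (a minimal sketch, using protocol 2 as np.save's pickle fallback does at minimum):

```python
import pickle
import pickletools

# Pickle a single small int and list the opcodes in the resulting stream.
data = pickle.dumps(7, protocol=2)
ops = [op.name for op, arg, pos in pickletools.genops(data)]
print(ops)  # ['PROTO', 'BININT1', 'STOP'] -- the value rides in BININT1
```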

If you actually want small files, pick a dtype that fits your data, such as numpy.int8 or numpy.uint8, which stores each element in a single byte.
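A sketch comparing on-disk sizes for int32 versus int8 (smaller array than the question's, written to a temporary directory rather than `d:/`):

```python
import os
import tempfile

import numpy as np

randoms = np.random.randint(0, 20, 1_000_000)

with tempfile.TemporaryDirectory() as tmp:
    path32 = os.path.join(tmp, 'int32.npy')
    path8 = os.path.join(tmp, 'int8.npy')
    np.save(path32, randoms.astype(np.int32))
    np.save(path8, randoms.astype(np.int8))
    # int8 stores one byte per element: roughly a quarter of the int32 file.
    print(os.path.getsize(path32))  # ~4 MB
    print(os.path.getsize(path8))   # ~1 MB
```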

+6

EDIT: this guess turned out to be wrong; see user2357112's answer above for the actual explanation. I'm leaving my original reasoning below.

With dtype=object the array cannot be written in the raw NPY layout, so it is pickled instead. My assumption was that pickle notices shared objects: whenever b[i] is b[j], it could serialize the object once and store a cheap back-reference for every later occurrence, so repeated values would cost almost nothing.

CPython caches the small integers from -5 to 256, so every value drawn from range(0, 20) is a reference to one of the same 20 cached int objects, and numpy keeps those references when you call .astype(object).
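The sharing itself is easy to demonstrate: after `.astype(object)`, equal small-int elements are the very same Python object, not merely equal values (a sketch with a small array):

```python
import numpy as np

b = np.random.randint(0, 20, 10).astype(object)

# All equal elements point at the same cached small-int object.
same = all(x is y for x in b for y in b if x == y)
print(same)  # True

# At most 20 distinct objects back the whole array.
print(len({id(x) for x in b}))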

Note that the saving disappears for values that are not cached: with, say, uniform(0.0, 1.0, 10000000), every element is a distinct float object, and the pickled object file ends up much larger than the raw one.
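By contrast, floats produced by `.astype(object)` are not cached by CPython, so nothing is shared (a sketch):

```python
import numpy as np

f = np.random.uniform(0.0, 1.0, 5).astype(object)

# Each element becomes its own Python float object; no identity sharing,
# so pickling an object array of floats gains nothing from caching.
print(len({id(x) for x in f}))  # 5 distinct objects
```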

+2

Source: https://habr.com/ru/post/1665789/