Packing a boolean array has to go through int (numpy 1.8.2)

I am looking for a more compact way to store boolean data. numpy internally uses 8 bits (one byte) to store each boolean, so np.packbits, which packs them down to one bit each, is pretty cool.
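For reference, np.packbits turns every eight 0/1 values into one uint8, with the first element as the most significant bit, and zero-pads a trailing partial byte. A small sketch using uint8 input, which numpy 1.8.2 already accepts:

```python
import numpy as np

# eight bits -> one byte; the first element becomes the most significant bit
bits = np.array([1, 0, 1, 1, 0, 0, 0, 1], dtype=np.uint8)
packed = np.packbits(bits)
print(packed)                  # [177]  (0b10110001)

# a partial byte is padded with zeros on the right
print(np.packbits(np.array([1, 1, 1], dtype=np.uint8)))   # [224]  (0b11100000)

# unpackbits returns all 8 bits of every byte, padding included
print(np.unpackbits(packed))   # [1 0 1 1 0 0 0 1]
```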

The problem is that to pack a 32e6-byte boolean array into a 4e6-byte array, we must first spend 256e6 bytes converting the boolean array to an int array!

In [1]: db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)
In [2]: db_int = np.asarray(db_bool, dtype=int)
In [3]: db_packed = np.packbits(db_int, axis=0)
In [4]: db_bool.nbytes, db_int.nbytes, db_packed.nbytes
Out[4]: (32000000, 256000000, 4000000)
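These figures follow directly from the item sizes: bool and uint8 take 1 byte per element, a 64-bit int takes 8, and packed storage takes 1 bit. A scaled-down sketch (2000 × 16 instead of 2e6 × 16); np.int64 is spelled out because the default int dtype is platform-dependent:

```python
import numpy as np

rows, cols = 2000, 16                       # 1000x smaller than the question
db_bool = np.random.randint(2, size=(rows, cols)).astype(bool)

as_int64 = db_bool.astype(np.int64)         # the 8x blow-up from the question
as_uint8 = db_bool.astype(np.uint8)         # same 1-byte itemsize as bool
packed = np.packbits(as_uint8, axis=0)      # every 8 rows -> 1 row of bytes

sizes = (db_bool.nbytes, as_int64.nbytes, as_uint8.nbytes, packed.nbytes)
print(sizes)   # (32000, 256000, 32000, 4000)
```

An intermediate of 1-byte integers would therefore cost no more than the boolean array itself.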

There is a year-old issue about this in the numpy issue tracker (see https://github.com/numpy/numpy/issues/5377).

Does anyone have a solution / best workaround?

The traceback when we try to pack the boolean array directly:

In [28]: db_pb = np.packbits(db_bool)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-3715e167166b> in <module>()
----> 1 db_pb = np.packbits(db_bool)
TypeError: Expected an input array of integer data type
In [29]:

PS: bitarray , .


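There is no need to convert to int dtype (which is 64 bits on x86_64). Instead, view the boolean array as np.uint8, which also takes 1 byte per element, so no copy is made: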

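import numpy as np

# reinterpret the bool buffer as uint8 (same 1-byte itemsize, no copy)
packed = np.packbits(db_bool.view(np.uint8))

# unpack, drop the padding bits, restore the shape, and view back as bool
# (np.bool_ rather than the np.bool alias, which some numpy versions removed)
unpacked = np.unpackbits(packed)[:db_bool.size].reshape(db_bool.shape).view(np.bool_)

print(np.all(db_bool == unpacked))
# True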
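The snippet above packs the flattened array, while the question packs along axis=0. A sketch of the same view trick for that case, using a smaller random stand-in for db_bool whose row count is deliberately not a multiple of 8, so the padding rows have to be trimmed after unpacking:

```python
import numpy as np

db_bool = np.random.randint(2, size=(1001, 16)).astype(bool)   # 1001 rows: not a multiple of 8

# zero-copy reinterpretation: bool and uint8 are both 1 byte per element
packed = np.packbits(db_bool.view(np.uint8), axis=0)            # shape (126, 16)

# unpack along the same axis, drop the zero padding rows, view back as bool
unpacked = np.unpackbits(packed, axis=0)[:db_bool.shape[0]].view(np.bool_)

print(packed.shape, np.array_equal(db_bool, unpacked))   # (126, 16) True
```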

Also, np.packbits accepts boolean arrays directly in numpy v1.10.0 and newer.
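If the same code has to run on both old and new numpy, one option is to always hand packbits a uint8 view. A minimal sketch; pack_bool is a hypothetical helper name, not a numpy function:

```python
import numpy as np

def pack_bool(a, axis=None):
    """Hypothetical helper: pack a boolean array with np.packbits on any numpy.

    Older numpy (e.g. 1.8.2) rejects bool input, so the array is
    reinterpreted as uint8 without copying; newer versions accept either.
    """
    a = np.asarray(a)
    if a.dtype == np.bool_:
        a = a.view(np.uint8)
    return np.packbits(a, axis=axis)

flags = np.array([True, False, True, True, False, False, False, True])
print(pack_bool(flags))   # [177]
```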


Python has no built-in equivalent of C++'s std::bitset, but you can write a simple "bitarray" class on top of Python's bytearray.

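Since this is pure Python, it will be slow. For better performance you could port it to Cython, or back it with a numpy array of dtype=int8 instead of a bytearray: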

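class BitArray(object):
    """Fixed-length bit array, 8 bits per byte, most significant bit first."""

    def __init__(self, length):
        # round up to whole bytes; bytearray(n) is zero-filled
        self.values = bytearray(length // 8 + (1 if length % 8 else 0))
        self.length = length

    def _check(self, index):
        # also rejects indices that fall in the padding bits of the last byte
        if not 0 <= index < self.length:
            raise IndexError("BitArray index out of range")

    def __setitem__(self, index, value):
        self._check(index)
        shift = 7 - index % 8
        self.values[index // 8] &= 0xff ^ (1 << shift)       # clear the bit
        self.values[index // 8] |= int(bool(value)) << shift  # set it if value is truthy

    def __getitem__(self, index):
        self._check(index)
        mask = 1 << (7 - index % 8)
        return bool(self.values[index // 8] & mask)

    def __len__(self):
        return self.length

    def __iter__(self):
        return (self[i] for i in range(self.length))

    def __repr__(self):
        return "<{}>".format(", ".join("{:d}".format(value) for value in self))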
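Backing the class with a numpy array instead of a bytearray also lets whole-array conversions go through np.packbits/np.unpackbits rather than a Python loop. A minimal sketch under that assumption: the class name NpBitArray and its from_bool/to_bool methods are hypothetical, not part of any library, and the buffer is uint8 rather than int8 so that setting the top bit (0x80) cannot overflow:

```python
import numpy as np

class NpBitArray(object):
    """Hypothetical bit array stored MSB-first in a numpy uint8 buffer."""

    def __init__(self, length):
        self.values = np.zeros((length + 7) // 8, dtype=np.uint8)
        self.length = length

    @classmethod
    def from_bool(cls, flags):
        flags = np.asarray(flags, dtype=bool).ravel()
        obj = cls(flags.size)
        obj.values = np.packbits(flags.view(np.uint8))   # vectorized pack, no int copy
        return obj

    def to_bool(self):
        # unpack every byte, then drop the padding bits at the end
        return np.unpackbits(self.values)[:self.length].view(np.bool_)

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError("bit index out of range")

    def __setitem__(self, index, value):
        self._check(index)
        bit = np.uint8(1 << (7 - index % 8))
        if value:
            self.values[index // 8] |= bit     # set the bit
        else:
            self.values[index // 8] &= ~bit    # clear the bit

    def __getitem__(self, index):
        self._check(index)
        return bool(self.values[index // 8] & (1 << (7 - index % 8)))

    def __len__(self):
        return self.length

bits = NpBitArray.from_bool([True, False, True, True, False, False, False, True, True])
bits[1] = True
print(len(bits), bits.values.tolist(), bits[1], bits[8])   # 9 [241, 128] True True
```

Single-bit access is still a Python-level operation here; the gain is only in the bulk from_bool/to_bool conversions.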

Related question: a Python equivalent of C++'s std::bitset?


Source: https://habr.com/ru/post/1622080/

