Efficiently calculate the mean of each bit over many bit strings in Python / numpy?

I have several thousand bit strings stored as Python longs. Each bit string is 1024 bits long. For each bit position, I would like to compute the ratio of strings in which that bit is equal to 1.

For example (pseudo code):

bs = [
    1 0 0 0,
    0 1 1 0,
    1 1 0 0,
    0 0 0 0
]
ratios(bs) => [0.5, 0.5, 0.25, 0.0]
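Read concretely (the bit order here is my assumption: each row lists bit 0 first, which is the reading that makes the expected output match the code below), the example corresponds to:

bs = [0b0001, 0b0110, 0b0011, 0b0000]   # i.e. 1, 6, 3, 0
# ratios(bs) -> [0.5, 0.5, 0.25, 0.0]; e.g. bit 0 is set in 2 of the 4 strings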

My current slow code is:

def mean_signature(bitstrings, bit_count):
    means = []
    for b in range(bit_count):
        m = sum((x >> b) & 1 for x in bitstrings) / len(bitstrings)
        means.append(m)
    return means

I could restructure the code so that the outer loop runs over the bit strings instead (as sketched below), but I suspect I'm missing a better approach. Perhaps using numpy arrays of bits?
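For reference, a minimal sketch of that restructuring (my own illustration; the name is made up). It visits each bit string once, but it still loops in pure Python, so it is unlikely to be much faster:

def mean_signature_by_string(bitstrings, bit_count):
    # Accumulate, per bit position, how many bit strings have that bit set.
    counts = [0] * bit_count
    for x in bitstrings:
        for b in range(bit_count):
            counts[b] += (x >> b) & 1
    return [c / len(bitstrings) for c in counts]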

2 answers

Here is one way to do it, though it is probably not the most efficient method.

For demonstration, I will use 8-bit integers, but it will also work with your 1024-bit integers.

In [28]: bs = [0b11110000, 0b11111100, 0b11000000, 0b11111110, 0b00001100]

In [29]: bs
Out[29]: [240, 252, 192, 254, 12]

In [30]: nbits = 8

In [31]: bits = np.array([list(np.binary_repr(b, width=nbits)) for b in bs], dtype=np.uint8)

In [32]: bits
Out[32]: 
array([[1, 1, 1, 1, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 1, 1, 0, 0]], dtype=uint8)

bits is now a 2-D array holding the bits of each value in bs, one row per value, most significant bit first. The ratio of ones at each bit position is then just the mean of each column:

In [33]: bits.mean(axis=0)
Out[33]: array([ 0.8,  0.8,  0.6,  0.6,  0.6,  0.6,  0.2,  0. ])

Note that these means run from the most significant bit down to the least significant bit. If you want them indexed by bit position (bit 0 first, matching your mean_signature), reverse the result:

In [34]: bits.mean(axis=0)[::-1]
Out[34]: array([ 0. ,  0.2,  0.6,  0.6,  0.6,  0.6,  0.8,  0.8])
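Putting the pieces together as a function (my own wrapper around the steps above; ratios is just an illustrative name):

import numpy as np

def ratios(bitstrings, nbits):
    # One row of 0/1 digits per value, most significant bit first.
    bits = np.array([list(np.binary_repr(b, width=nbits)) for b in bitstrings],
                    dtype=np.uint8)
    # Column means, reversed so that index 0 is the least significant bit.
    return bits.mean(axis=0)[::-1]

Called on the example values above, this returns the reversed array shown in Out[34].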

Another option is to split each Python long into a numpy array of fixed-size words, and let numpy do the bit counting from there. A helper function that does the splitting:

import numpy as np

def long_to_multi_word(l, dtype=np.uint64, nwords=None):
    dtype = np.dtype(dtype)
    l = np.asarray(l, object)

    nbits = 8 * dtype.itemsize

    if nwords is None:
        lmax = l.max()
        nwords = 0
        while lmax != 0:
            lmax >>= nbits
            nwords += 1

    arr = np.zeros(l.shape + (nwords,), dtype)

    mask = (1 << nbits) - 1

    for i in range(0, nwords):
        arr[...,i] = l & mask
        l = l >> nbits

    return arr

Example:

>>> data = [1, 2, 3, 2**128 + 2**64 + 42]   # one of these is too big to fit in a uint64
>>> data_words = long_to_multi_word(data)
>>> data_words
array([[ 1,  0,  0],
       [ 2,  0,  0],
       [ 3,  0,  0],
       [42,  1,  1]], dtype=uint64)
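
A quick way to sanity-check the split (my own check, not part of the original answer) is to reassemble each row and compare with the input:

>>> [sum(int(w) << (64 * i) for i, w in enumerate(row)) for row in data_words] == data
True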

From there, view the words as bytes and use np.unpackbits:

# could have used long_to_multi_word(data, np.uint8), but would be slower
data_bytes = data_words.view(np.uint8)

data_bits = np.unpackbits(data_bytes, axis=-1)
n_bits = data_bits.sum(axis=0)   # how many inputs have a 1 in each unpacked column
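
To get the ratios the question asks for rather than counts, take the column mean instead of the sum. One caveat: np.unpackbits unpacks each byte most-significant-bit first, and the byte layout of the uint64 view depends on the machine's endianness, so the columns above are not simply indexed by bit position. A sketch that pins both down explicitly (assuming numpy >= 1.17 for the bitorder argument; bit_ratios is my own name):

import numpy as np

def bit_ratios(longs):
    # Split each long into 64-bit words (long_to_multi_word from above),
    # forcing little-endian byte order so that byte j holds bits 8*j .. 8*j + 7.
    words = long_to_multi_word(longs).astype('<u8')
    # Unpack each byte least-significant bit first, so column k is bit k.
    bits = np.unpackbits(words.view(np.uint8), axis=-1, bitorder='little')
    # Fraction of the input strings that have each bit set.
    return bits.mean(axis=0)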


Source: https://habr.com/ru/post/1675033/

