Applying a bit-removal mask without a loop

We have a large binary number N (large meaning millions of digits). We also have a binary mask M, where a 1 means that the digit at that position in N must be removed, with all higher bits shifting one position to the right.

Example:

```
N   = 100011101110
M   = 000010001000
Res = 1000110110
```

Is it possible to solve this problem without a loop, using some combination of logical or arithmetic operations? We can assume that we have access to Python's arbitrary-precision integer arithmetic.

It seems like it should be something like Res = N - (N xor M), but that doesn't work.
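A quick check on the example above (my addition, not part of the question) shows how far off that formula is:

```python
N = 0b100011101110
M = 0b000010001000
expected = 0b1000110110

# M's set bits are also set in N here, so N ^ M just clears them,
# and N - (N ^ M) collapses to M itself instead of compacting the bits.
attempt = N - (N ^ M)
print(bin(attempt))  # 0b10001000, not the expected 0b1000110110
```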

UPD My current solution with a loop:

```python
def prepare_reduced_arrays(dict_of_N, mask):
    '''
    mask: string '0000011000'
    each element of dict_of_N - big python integer
    '''
    capacity = len(mask)
    answer = dict()
    for el in dict_of_N:
        answer[el] = 0
    new_capacity = 0
    for i in range(capacity - 1, -1, -1):
        if mask[i] == '1':
            continue
        cap2 = (1 << new_capacity)
        pos = (capacity - i - 1)
        for el in dict_of_N:
            current_bit = (dict_of_N[el] >> pos) & 1
            if current_bit:
                answer[el] |= cap2
        new_capacity += 1
    return answer, new_capacity
```
+5
3 answers

Although this may not be possible without a loop in pure Python, it can be done very quickly with numba, at the cost of a one-time compilation. I started from the assumption that your inputs can easily be represented as boolean arrays, which are very simple to build from a binary file using struct . The method I implemented does iterate over several different objects, but those iterations were carefully chosen so that the compiler can optimize them and no work is ever done twice. The first pass uses np.where to find the indices of all the bits to be deleted. This particular function (among many others) is optimized by the numba compiler. I then use this list of bit indices to build slice indices for the runs of bits to keep. The final loop copies those runs into an empty output array.

```python
import numpy as np
from numba import jit
from time import time

def binary_mask(num, mask):
    num_nbits = num.shape[0]        #how many bits are in our big num
    mask_bits = np.where(mask)[0]   #which bits are we deleting
    mask_n_bits = mask_bits.shape[0] #how many bits are we deleting

    start = np.empty(mask_n_bits + 1, dtype=int) #preallocate array of slice start indexes
    start[0] = 0                 #first slice starts at 0
    start[1:] = mask_bits + 1    #subsequent slices start 1 after each True bit in mask

    end = np.empty(mask_n_bits + 1, dtype=int) #preallocate array of slice end indexes
    end[:mask_n_bits] = mask_bits  #each slice ends on (but does not include) True bits in the mask
    end[mask_n_bits] = num_nbits + 1 #last slice goes all the way to the end

    out = np.empty(num_nbits - mask_n_bits, dtype=np.uint8) #preallocate return array

    for i in range(mask_n_bits + 1): #for each slice
        a = start[i]  #use local variables to reduce number of lookups
        b = end[i]
        c = a - i
        d = b - i
        out[c:d] = num[a:b]  #copy slices
    return out

jit_binary_mask = jit("b1[:](b1[:], b1[:])")(binary_mask) #decorator without syntax sugar

###################### Benchmark ########################

bignum = np.random.randint(0, 2, 1000000, dtype=bool)          # 1 million random bits
bigmask = np.random.randint(0, 10, 1000000, dtype=np.uint8)==9 #delete about 1 in 10 bits

t = time()
for _ in range(10):  #10 runs of just the numpy implementation
    out = binary_mask(bignum, bigmask)
print(f"non-jit: {time()-t} seconds")

t = time()
out = jit_binary_mask(bignum, bigmask)  #run once ahead of time to compile
compile_and_run = time() - t

t = time()
for _ in range(10):  #10 runs of the compiled implementation
    out = jit_binary_mask(bignum, bigmask)
jit_runtime = time() - t
print(f"jit: {jit_runtime} seconds")
print(f"estimated compile_time: {compile_and_run - jit_runtime/10}")
```

In this example, I benchmark on a 1,000,000-element boolean array, running each of the compiled and non-compiled versions 10 times. On my laptop, the output is:

  non-jit: 1.865583896636963 seconds
 jit: 0.06370806694030762 seconds
 estimated compile_time: 0.1652850866317749 

As you can see, even for such a simple algorithm, compilation gives a very significant performance gain (in my case, about 20-30 times faster).
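As a quick correctness reference (my addition, not part of the answer): all of the slice machinery above is equivalent to NumPy boolean indexing with the inverted mask, which is easy to verify on the question's example:

```python
import numpy as np

num = np.array([int(b) for b in "100011101110"], dtype=bool)
mask = np.array([int(b) for b in "000010001000"], dtype=bool)

kept = num[~mask]  # boolean indexing drops exactly the masked bits
result = int("".join("1" if b else "0" for b in kept), 2)
print(bin(result))  # 0b1000110110
```

`num[~mask]` gathers element by element; the slice-copy version in the answer exists because contiguous block copies are much faster under numba.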

+4

As far as I know, this can be done without loops if and only if M is a power of 2.

Take your example, but change M so that it is a power of 2:

```
N = 0b100011101110 = 2286
M = 0b000000001000 = 8
```

Removing the fourth least significant bit from N and shifting the higher bits down one position gives:

```
N = 0b10001110110 = 1142
```

We achieved this using the following algorithm:

  • Start with N = 0b100011101110 = 2286
  • Scan M from its most significant bit down to its least significant bit
  • If the current bit of M is set to 1 (here, 0b1000), store the bits of N below that position in some variable, x :
    • x = N & 0b111 = 0b110
  • Then subtract from N its bits at and below the current bit of M:
    • N - (0b1000 + x) = 0b100011101110 - (0b1000 + 0b110) = 0b100011101110 - 0b1110 = 0b100011100000
    • This step can also be achieved by clearing those bits with a bitwise AND (N & ~0b1111), which can be more efficient and works even when that bit of N is 0.
  • Then we shift the result once to the right:
    • 0b100011100000 >> 1 = 0b10001110000
  • Finally, we add x back to the shifted result:
    • 0b10001110000 + x = 0b10001110000 + 0b110 = 0b10001110110 = 1142
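The steps above can be sketched as a small helper (the name remove_one_bit is mine, not from the answer; bit must be a power of 2 marking the position to delete):

```python
def remove_one_bit(N, bit):
    x = N & (bit - 1)        # save the bits below the removed position
    N -= N & (2 * bit - 1)   # clear the removed bit and everything below it
    return (N >> 1) | x      # shift the high bits down, restore the low bits

print(bin(remove_one_bit(0b100011101110, 0b1000)))  # 0b10001110110
```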

It may still be possible to do this in some way without loops, but it is already quite efficient to simply walk through M (from the most significant bit to the least significant bit) and perform this process on each set bit, since the time complexity is O(M.bit_length()) .

I wrote code for this algorithm and find it reasonably efficient, but I don't have large binary numbers to test it with:

```python
def remove_bits(N, M):
    bit = 2 ** (M.bit_length() - 1)
    while bit != 0:
        if M & bit:
            ones = bit - 1
            # Store the bits below the removed bit.
            temp = N & ones
            # Clear the removed bit and all bits below it
            # (clearing only `ones` would let the removed bit collide
            # with the stored bits after the shift).
            N &= ~((bit << 1) - 1)
            # Shift once to the right.
            N >>= 1
            # Set the stored lower bits.
            N |= temp
        bit >>= 1
    return N

if __name__ == '__main__':
    N = 0b100011101110
    M = 0b000010001000
    print(bin(remove_bits(N, M)))
```

Using your example, this returns your result: 0b1000110110

+3

I don’t think there is a way to do this with a constant number of calls to the built-in bitwise operators. Python would have to provide something like PEXT to make this possible.
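For concreteness (my addition): the operation being asked for is exactly PEXT(N, ~M). A pure-Python emulation necessarily loops over the mask bits, which is precisely what the question hopes to avoid:

```python
def pext(value, mask):
    # software emulation of the x86 BMI2 PEXT instruction:
    # gather value's bits at mask's set positions, packed toward the LSB
    out = 0
    dst = 0
    while mask:
        low = mask & -mask   # isolate the lowest set bit of the mask
        if value & low:
            out |= 1 << dst
        dst += 1
        mask &= mask - 1     # clear that bit and continue
    return out

N = 0b100011101110
M = 0b000010001000
keep = ~M & ((1 << 12) - 1)   # keep-mask: complement of M over 12 bits
print(bin(pext(N, keep)))     # 0b1000110110
```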

For literally millions of digits, you may get better performance by working with bit sequences directly, sacrificing the space advantages of Python ints and the speed advantages of bitwise operations in exchange for more flexibility in the operations you can perform. I don't know where the break-even point is:

```python
import itertools

bits = bin(N)[2:]
maskbits = bin(M)[2:].zfill(len(bits))
bits = bits.zfill(len(maskbits))
chosenbits = itertools.compress(bits, map('0'.__eq__, maskbits))
result = int(''.join(chosenbits), 2)
```
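Filling in the question's example values (my addition) makes the snippet self-contained and confirms that it reproduces the expected result:

```python
import itertools

N = 0b100011101110
M = 0b000010001000

bits = bin(N)[2:]
maskbits = bin(M)[2:].zfill(len(bits))
bits = bits.zfill(len(maskbits))
# keep each character of `bits` wherever the mask character is '0'
chosenbits = itertools.compress(bits, map('0'.__eq__, maskbits))
result = int(''.join(chosenbits), 2)
print(bin(result))  # 0b1000110110
```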
+1

Source: https://habr.com/ru/post/1268110/

