Efficiently execute boolean expressions in bitmaps in C or C ++

What is the most efficient way to execute a boolean expression in a bitmap in C or C ++? For example, suppose I have a 4-bit bitmap (a, b, c, d) . Now let's say that I have a simple Boolean expression like (a AND b) OR (c AND d) . How should I represent a logical expression so that I can effectively apply it to my bitmap? I am looking for a universal solution that can be applied to any logical expression, and not just to an example. In other words, I'm looking for a way to “compile” a logical expression into another data structure that can be used to effectively reduce my bitmap to a logical one.

The structure of the bitmap is the result of filtering records in the database. Each record has its own bitmap, and each bit in the bitmap is the result of a separate filter rule. A Boolean expression is used to combine these filtering rules to decide whether a record should be included in the results of a database query. There can be up to 64 individual filtering rules that can be combined by a logical operation, so the bitmap can be represented as unsigned long long int , if necessary.

The solution should be efficient in terms of speed and should not change the structure of the bitmap image. Converting a Boolean expression to another structure does not have to be memory efficient or fast, as it can be cached (at least in my current use case). Shrinking a bitmap with a converted boolean expression must be both fast and memory efficient.

Notes:

  • A logical expression uses only nested AND and OR (no IF) operations.
  • The solution should assume the presence of a 64-bit processor.
  • The solution should not be CPU dependent (other than 64-bit addressing).
  • The solution should not imply any other specific equipment (such as a GPU).
  • All bitmap images are in memory.
  • There can be a very large number of raster images (billions).
  • Bitmaps are updated one at a time.
+5
source share
3 answers

The most efficient way to use AND or OR operations in bitmap images is to use hardware help. Many GPUs can perform operations on two raster images. There is no standard C ++ library library for this.

You need to perform an operation on each bit, byte, word or double word in bitmap images.

The next speed-efficient method is to deploy a loop. Instructions for working out instructions for working out instructions (which can be used for data instructions) and can clear it of time loss.

You can also get some efficiency through efficient use of the processor data cache. Download a bunch of variables, perform the operation, save the result, repeat.

You should also extract groups using the processor word size. A 32-bit processor loves collecting 32 bits at a time. Thus, this will give you 8 sets of 4-bit pixels that load with a single fetch. Otherwise, you will have to extract 8 bits at a time, which leads to 4 samples of 8 bits compared to 1 sample of 32 bits.

Here's the main algorithm:

 uint8_t * p_bitmap_a = &Bitmap_A[0]; uint8_t * p_bitmap_b = &Bitmap_B[0]; uint8_t * p_bitmap_c = &Bitmap_C[0]; // C = A AND B for (unsigned int i = 0; i < bitmap_size / 4; ++i) { uint32_t a = *((uint32_t*) p_bitmap_a); uinte2_t b = *((uint32_t*) p_bitmap_b); uint32_t c = a & b; *((uint32_t *) p_bitmap_c) = c; p_bitmap_a += sizeof(uint32_t); p_bitmap_b += sizeof(uint32_t); p_bitmap_c += sizeof(uint32_t); } 

Change 1:
Your processor may have instructions that can help with operations. For example, an ARM7 processor can load many registers from memory with a single instruction. Review the processor instruction set. You may need to use the built-in assembler language to use the instructions of a particular processor.

Edit 2: Threading and Parallel processing.

If your bitmaps are huge, the overhead of supporting multiple threads of execution or parallel execution may outweigh the benefits. For example, if the overhead of synchronizing with another CPU core is 200 ms, and processing the raster image without interruption is 1000 ms, you spent time using parallel processing on one raster image (1200 ms to have another main process - bitmap).

If you have many bitmaps, you can get some time using parallel processing or several threads:

  • One stream extracts raster images from the database into memory (buffers).
  • The other stream processes the bitmap images and stores it in the outgoing buffer.
  • The third process writes buffered bitmaps to the database.

If you are extracting bitmaps from an external source, such as a database, this I / O will be your bottleneck. This is the part you need to optimize or cheat.

+2
source

If the GUARANTEED bitmaps will always be 4 bits, then they will fit into the lower 4 char bits, and only 16 possible values ​​for any bitmap.

For a particular Boolean expression, you then evaluate it for each of the sixteen possible bit combinations, which gives you a set of sixteen bits of the result. Collect them in sixteen-bit ints: false , false , false , false in bits 0, false , false , false , true in bit 1, etc.

Now for an arbitrary bitmap compared to an arbitrary Boolean, your check becomes:

  • Process the bitmap as a 4-bit int, evaluate 1 << (4 bit int) .
  • Take the result of this shift and use the C ++ & operator to check for cache caching of the 16-bit int value.

This will return == 0 for false and != 0 for true.

Reducing it to two commands: a shift and and is the fastest I can see.

This assumes that there is a fairly small number of logical operations that you apply on top, setting up one logical test will be expensive, but since you are talking about billions of bitmaps, I’m assuming that you will use the same logical operation for many bitmaps .

+1
source

You can represent the expression as a binary tree, and perhaps also use two classes for two node types. You can also parameterize each node with an operation, but this is hardly worth it. You might also make a non-node with a single input. Entrances to nodes are either places in your bitmap or other nodes, so I am subclassing for the first case, which takes an index in a bitmap as a parameter. You complete this code by writing a value function for the parameter "A" node and terminating Or node.

 typedef unsigned long long Bitmap; Bitmap bitmap; struct Node { virtual bool value()=0; }; struct AbsNode : public Node { int bit; bool value() {return (bitmap>>bit)&1; } } struct AndNode : public Node { Node *operandA, *operandB; etc. } 
0
source

Source: https://habr.com/ru/post/1204008/


All Articles