C is the fastest way to compare two bitmaps

There are two arrays of raster images in the form of char arrays with millions of records. What could be the fastest way to compare them using C.

I can imagine how to use the bitwise operator xor 1 byte at a time in a for loop.

Important information about bitmaps:

  • The algorithm runs from 1% to 10% times, bitmap images may vary. Most of the time they will be the same. When they can differ, they can reach 100%. There is a high probability of bits changing in a continuous band.
  • Both bitmaps have the same length.

Goal:

  • Check if they are different, and yes, then where.
  • Be correct every time (the probability of detecting an error, if any, should be 1).
+4
source share
1 answer

This answer assumes that you mean “bitmap” as a sequence of 0/1 values, not “bitmap format”

If you just have two bitmaps of the same length and want to quickly compare them, memcmp() will be effective, as someone suggested in the comments. You can try using SSE type optimizations, but it's not as simple as memcmp() . memcmp() assumes that you just want to know that they are different, and nothing more.

If you want to know how many bits they differ, for example, 615 bits differ, then again you have few options, except for XOR each byte and counting the number of differences. As others have noted, you probably want to do this more 32/64 or even 256 bits at a time, depending on your platform. However, if arrays are millions of bytes, then the biggest delay (with current processors) will be the time it takes to transfer main memory to the processor, and that doesn't matter much what the processor does (there are a lot of caveats here)

If you ask a question, ask more about comparing A with B, but in fact you do it many times, for example, from A to B and C, D, E, etc., then you can do a couple of things

  • A. Store the checksum of each array and first compare the checksums, if they are the same, then there is a high probability that the arrays are the same. Obviously, there is a risk that the checksums may be equal, but the data may differ, so make sure that a false result in this case will not have dramatic side effects. And, if you cannot resist false results, do not use this technique.
  • C. if arrays have a structure, for example, they are image data, and then use certain tools for this, how to explain this answer.
  • C. If the image data can be effectively compressed, then compress each array and compare it using a compressed form. If you use the ZIP type of compression, you cannot directly specify from zip how many bits are different, but other methods, such as RLE, can be effective for quickly counting bit-bits (but they work hard to build and get the right and fast results)
  • D. If the risk with (a) is acceptable, then you can check every part, say, 262144 bits, and only take into account the differences when the checksums differ. This greatly reduces access to main memory and will be much faster.

All A..D options relate to reducing access to main memory, since this is zero of any performance gain (for the problem, as indicated)

+2
source

Source: https://habr.com/ru/post/1487521/


All Articles