I ran into a rather interesting problem. I have a (fairly large) number of blocks. A block is just something that starts with an offset and has length and color. Displacement and length are limited - the space in which these blocks are located is <0-N>, where N is from hundreds of thousands to several millions. An invalid block is any block with an offset greater than N or the sum of an offset and a length greater than N. There are about 16 different colors that a block can have (only one of them).
There may be thousands of blocks, and there are always such situations:
block_X: off: 100, len: 50, color: blue
block_Y: off: 148, len: 50, color: blue
block_Z: off: 200, len: 30, color: red
As you can see, blocks X and Y can be connected to one larger block, resulting in:
block_XY: off: 100, len 98, color: blue
block_Z: off 200, len 30, color: red
I would like to create an algorithm that will go through all the blocks and bind those that overlap and have the same color. In fact, if the gap between the blocks is small enough (a constant that I could choose, say about 16 or so, but could be any number ...), then I would like to connect these blocks anyway. Note that in the above example, there are only two blocks that are connected. In fact, the sequence of blocks that can be connected is much longer in most cases.
There is also an interesting twist, consider the following:
block_A: off: 100, len: 200, color: blue
block_B: off: 200, len: 200, color: blue
block_C: off: 300, len: 150, color: red
block_D: off: 400, len: 200, color: blue
block_E: off: 500, len: 200, color: blue
As you can see, we have a beautiful sequence of blue blocks that can be combined into one big blue block. However, there is a small red block in the middle. This should not deceive the algorithm, the correct result is:
block_ABDE: off: 100, len: 600, color: blue
block_C: off: 300, len: 150, color: red
The data is stored in std :: multimap, where the key is the offset, and the value is a structure containing the length and color of the block, something like: multimap<uint32_t,pair<uint32_t, uint8_t> > . Note that there may be blocks starting at the same offset. However, it is guaranteed that if this happens, blocks starting at the same offset have different colors and different lengths.
Can anyone come up with how to solve this problem effectively? The purpose of the algorithm is to reduce the number of blocks that we have, but it is NOT required to reduce it to the smallest possible number.