I would use a multi-scale grid approach (equivalent to quadtrees in one form or another).
I assume that you are using integer coordinates (i.e. pixels) and have enough space to store all the pixels.
Have an array of rectangle lists, one for each pixel. Then bin the pixels 2x2 and do it again. And again, and again, until you have one cell that covers everything.
Now the key is that you insert each rectangle at a level suited to its size. That level will be something like (cell size) ~= min(height, width) / 2. Then each rectangle requires only a few insertions into lists (you can bound the number above by a constant; for example, pick a level at which the rectangle covers about 4 to 16 cells).
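A minimal sketch of the level choice and insertion, under my own assumptions (the names `make_grid`, `level_for`, and `insert` are mine, not from the answer, and I assume a square world of 2^8 = 256 pixels on a side with sparse per-level dicts standing in for the per-pixel arrays):

```python
SIZE_LOG2 = 8           # assumed: world is 256x256 pixels
LEVELS = SIZE_LOG2 + 1  # levels 0..8; cells at level L are 2**L pixels wide

def make_grid():
    # One dict of cell -> list of rectangles per level
    # (a sparse stand-in for the "array of rectangle lists").
    return [dict() for _ in range(LEVELS)]

def level_for(rect):
    # rect = (x0, y0, x1, y1); pick the level whose cell size is
    # roughly min(height, width) / 2.
    x0, y0, x1, y1 = rect
    target = max(1, min(x1 - x0, y1 - y0) // 2)
    return min(target.bit_length() - 1, LEVELS - 1)  # floor(log2(target))

def insert(grid, rect):
    x0, y0, x1, y1 = rect
    lv = level_for(rect)
    # Append the rectangle to every cell of its level that it overlaps;
    # by the choice of level this is only a handful of cells (~4 to 16).
    for cy in range(y0 >> lv, ((y1 - 1) >> lv) + 1):
        for cx in range(x0 >> lv, ((x1 - 1) >> lv) + 1):
            grid[lv].setdefault((cx, cy), []).append(rect)
    return lv
```

For instance, a 10x10 rectangle lands at the level with 4-pixel cells, so it touches at most 4x4 = 16 of them.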
If you want to find all the rectangles at (x, y), you look in the list for the finest pixel containing it, then in the list for the 2x2 cell that contains it, then 4x4, and so on; that is log2(number of pixels) lookups in total. (For the coarser cells you have to check whether (x, y) is really inside each rectangle; you expect about half of those checks to succeed near a rectangle's border and all of them to succeed in its interior, so you expect no worse than about twice the work of looking only at exact hits.)
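The lookup above can be sketched as follows (hypothetical names again; this assumes the grid is a list of dicts, `grid[level][(cx, cy)]` holding a list of rectangles, as produced by the insertion step):

```python
LEVELS = 9  # assumed: world is 2**(LEVELS - 1) = 256 pixels on a side

def contains(rect, x, y):
    x0, y0, x1, y1 = rect
    return x0 <= x < x1 and y0 <= y < y1

def query(grid, x, y):
    hits = []
    # Walk from the finest pixel up through the 2x2, 4x4, ... cells that
    # contain (x, y): log2(number of pixels) list lookups in total.
    for lv in range(LEVELS):
        for rect in grid[lv].get((x >> lv, y >> lv), []):
            # A coarse cell may hold rectangles that merely overlap the
            # cell, so confirm the point is really inside the rectangle.
            if contains(rect, x, y):
                hits.append(rect)
    return hits
```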
Now, what about insertion? It is very cheap: O(1) to prepend to each list.
How about removal? That is more expensive: you have to walk and splice each list for every cell the rectangle was inserted into. This is roughly O(n) in the number of rectangles that overlap that region of space at a similar scale. If you have a really large number of such rectangles, you should use a different data structure for each cell's contents (a hash set, red-black tree, etc.).
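A sketch of the removal step, under the same assumptions as above (names are mine; `level` is the level the rectangle was inserted at, and each cell holds a plain Python list, whose `remove` is the linear-time splice described here):

```python
def remove(grid, rect, level):
    # Revisit every cell at `level` that the rectangle was inserted into
    # and splice it out of that cell's list: O(n) in the rectangles that
    # overlap those cells at a similar scale.
    x0, y0, x1, y1 = rect
    for cy in range(y0 >> level, ((y1 - 1) >> level) + 1):
        for cx in range(x0 >> level, ((x1 - 1) >> level) + 1):
            cell = grid[level].get((cx, cy))
            if cell is not None:
                cell.remove(rect)
                if not cell:                  # drop empty cells to keep
                    del grid[level][(cx, cy)]  # the sparse dicts small
```

Swapping the per-cell lists for hash sets would make each splice O(1) at the cost of losing insertion order.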
(Note: if your smallest rectangle is bigger than a pixel, you do not actually need to build the multiscale structure all the way down to the pixel level; just go down until the cells are about half the size of your smallest rectangle.)