How does GDI + function so fast?

I am trying to recreate very simple GDI + features such as scaling and image rotation. The reason is that some GDI functions cannot be executed on multiple threads (I found work using processes, but did not want to do this), and processing thousands of images on a single thread was not almost cutting. Also, my images have shades of gray, so the user function should only worry about one value instead of 4.

No matter what function I try to recreate, even if it is very optimized, it is always FREE times slower, despite the fact that it is significantly simplified compared to what GDI does (I work on a 1D byte array, one byte per pixel)

I thought that maybe the way I rotated each point could be a difference, so I took it out completely and basically had a function that goes through each pixel and just sets it to what it already is, and it was only roughly related to GDI speed, although GDI did the actual rotation and changed 4 different values ​​per pixel.

What makes this possible? Is there a way to map it using your own function?

+5
source share
2 answers

GDI + code is written in C / C ++ or, possibly, partially in the assembly. Some GDI + calls may use GDI, an old and well-optimized API. It will be difficult for you to match performance, even if you know all the tricks of manipulating pixels.

+3
source

I am adding my own answer along with my code to help someone else who can do this.

From a combination of pointers and using an approximation of sine and cosine instead of calling an external function to rotate, I came pretty close to achieving GDI speed. External functions are not called at all.

It's still 50% longer than GDI, but my earlier implementation took more than 10 times longer than GDI. And when you consider multithreading, this method can be 10 times faster than GDI. This function can rotate a 300x400 image in 3 milliseconds on my machine.

Keep in mind that this is for grayscale images, and each byte in the input array represents one pixel. If you have ideas to make it faster, please share!

private unsafe byte[] rotate(byte[] input, int inputWidth, int inputHeight, int cx, int cy, double angle) { byte[] result = new byte[input.Length]; int tx, ty, ix, iy, x1, y1; double px, py, fx, fy, sin, cos, v; byte a, b; //Approximate Sine and Cosine of the angle if (angle < 0) sin = 1.27323954 * angle + 0.405284735 * angle * angle; else sin = 1.27323954 * angle - 0.405284735 * angle * angle; angle += 1.57079632; if (angle > 3.14159265) angle -= 6.28318531; if (angle < 0) cos = 1.27323954 * angle + 0.405284735 * angle * angle; else cos = 1.27323954 * angle - 0.405284735 * angle * angle; angle -= 1.57079632; fixed (byte* pInput = input, pResult = result) { byte* pi = pInput; byte* pr = pResult; for (int x = 0; x < inputWidth; x++) for (int y = 0; y < inputHeight; y++) { tx = x - cx; ty = y - cy; px = tx * cos - ty * sin + cx; py = tx * sin + ty * cos + cy; ix = (int)px; iy = (int)py; fx = px - ix; fy = py - iy; if (ix < inputWidth && iy < inputHeight && ix >= 0 && iy >= 0) { //keep in array bounds x1 = ix + 1; y1 = iy + 1; if (x1 >= inputWidth) x1 = ix; if (y1 >= inputHeight) y1 = iy; //bilinear interpolation using pointers a = *(pInput + (iy * inputWidth + ix)); b = *(pInput + (y1 * inputWidth + ix)); v = a + ((*(pInput + (iy * inputWidth + x1)) - a) * fx); pr = (pResult + (y * inputWidth + x)); *pr = (byte)(v + (((b + ((*(pInput + (y1 * inputWidth + x1)) - b) * fx)) - v) * fy)); } } } return result; } 
+2
source

Source: https://habr.com/ru/post/1238106/


All Articles