I need to apply a convolution filter to each row of many images. The classic case is 360 images with a resolution of 1024x1024 pixels. In my use case, it is 720 images of 560x600 pixels.
The problem is that my code is much slower than the timings reported in articles.
I implemented a naive convolution, and it takes 2 min 30 s. Then I switched to FFT using fftw. I used a complex-to-complex transform, filtering two rows in each transform. I am now at about 20 seconds.
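To make the "two rows per complex transform" idea concrete, here is a minimal sketch in pure Python. It is not the fftw code from the question: the recursive `fft` helper and the names `naive_convolve`, `next_pow2` and `convolve_two_rows` are made up for illustration, and it assumes both rows have the same length and the kernel is real. One row goes into the real part and the other into the imaginary part. Because convolution is linear and the kernel is real, the two results come back separated in the real and imaginary parts of the inverse transform.

```python
import cmath

def naive_convolve(x, h):
    """Direct linear convolution, O(len(x) * len(h)); used as the reference."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

def convolve_two_rows(row_a, row_b, kernel):
    """Filter two equal-length real rows with a single complex FFT pair."""
    out_len = len(row_a) + len(kernel) - 1
    n = next_pow2(out_len)  # zero padding avoids circular wrap-around
    # Pack row_a into the real part and row_b into the imaginary part.
    z = [complex(a, b) for a, b in zip(row_a, row_b)] + [0j] * (n - len(row_a))
    h = [complex(v, 0.0) for v in kernel] + [0j] * (n - len(kernel))
    Z = fft(z)
    H = fft(h)  # depends only on the kernel, so it could be computed once
    y = fft([zk * hk for zk, hk in zip(Z, H)], inverse=True)
    conv_a = [v.real / n for v in y[:out_len]]
    conv_b = [v.imag / n for v in y[:out_len]]
    return conv_a, conv_b

conv_a, conv_b = convolve_two_rows([1, 2, 3, 4], [0, 1, 0, -1], [1, -1, 0.5])
```

Since the kernel is the same for every row, its spectrum `H` only needs to be computed once for the whole image set; the sketch recomputes it on each call only to stay short.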
The thing is that articles report about 10 seconds or even less for the classic case. So I would like to ask the experts here whether there is a faster way to compute the convolution.
Numerical Recipes suggests avoiding the reordering (bit reversal) performed in the DFT and adapting the frequency-domain filter function accordingly. But there is no code example showing how to do this.
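One common way to skip the reordering, sketched here in pure Python, is to pair a decimation-in-frequency forward transform (natural-order input, bit-reversed output) with a decimation-in-time inverse transform (bit-reversed input, natural-order output). If the filter spectrum is produced by the same forward transform, it is already stored in bit-reversed order, so the pointwise product lines up and no permutation is ever performed. This is my reading of the idea, not code taken from Numerical Recipes; the names `fft_dif`, `ifft_dit` and `convolve_no_reorder` are hypothetical.

```python
import cmath

def fft_dif(a):
    """In-place forward FFT, decimation in frequency.
    Input in natural order, output left in bit-reversed order."""
    n = len(a)
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-1j * cmath.pi * k / half)
                u = a[start + k]
                v = a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * w
        half //= 2

def ifft_dit(a):
    """In-place inverse FFT, decimation in time.
    Input in bit-reversed order, output in natural order, scaled by 1/n."""
    n = len(a)
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(1j * cmath.pi * k / half)
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        half *= 2
    for i in range(n):
        a[i] /= n

def convolve_no_reorder(x, h):
    """Linear convolution of real sequences without any bit-reversal pass."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:  # pad to a power of two
        n *= 2
    a = [complex(v) for v in x] + [0j] * (n - len(x))
    b = [complex(v) for v in h] + [0j] * (n - len(h))
    fft_dif(a)
    fft_dif(b)  # filter spectrum, bit-reversed like the signal spectrum
    for i in range(n):
        a[i] *= b[i]  # same scrambled index on both sides
    ifft_dit(a)
    return [v.real for v in a[:out_len]]

result = convolve_no_reorder([1, 2, 3], [1, 1])
```

In a real implementation the bit-reversed filter spectrum would be precomputed once and reused for every row.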
Perhaps I lose time when copying data. With a real-to-real transform, I would not have to copy the data into complex values. But in any case, I need to pad with zeros.
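The zero padding is needed because multiplying two n-point DFTs gives a circular convolution of length n: the tail of the result wraps around onto the first samples of the row. The small sketch below shows the effect directly, without any FFT, using a hypothetical helper `circular_convolve` that computes what the n-point transform product would produce. Padding to at least row length + kernel length - 1 removes the wrap-around.

```python
def circular_convolve(x, h, n):
    """Circular convolution of length n (what an n-point DFT product yields)."""
    out = [0.0] * n
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[(i + j) % n] += xi * hj
    return out

row = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]

# No padding: the last two output samples fold back onto the first two.
wrapped = circular_convolve(row, kernel, len(row))

# Padded to len(row) + len(kernel) - 1: identical to the linear convolution.
padded = circular_convolve(row, kernel, len(row) + len(kernel) - 1)

print(wrapped)  # [8.0, 7.0, 6.0, 9.0]
print(padded)   # [1.0, 3.0, 6.0, 9.0, 7.0, 4.0]
```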
EDIT: see my own answer below for feedback on progress and additional information on resolving this issue.
Question (exact reformulation):
I am looking for an algorithm or piece of code to apply a very fast convolution to a discrete, non-periodic function (512 to 2048 values). Apparently, the discrete Fourier transform is the way to go. However, I would like to avoid copying the data and converting it to complex values, and to avoid the reordering done by the butterflies.