Finite Difference Optimization with SSE

I am wondering whether SSE (1, 2, 3, 4, ...) can be used to optimize the following loop:

// u and v are allocated through new double[size*size]
for (int j = 1; j < size-1; ++j)
{
    for (int k = 1; k < size-1; ++k)
    {
        v[j*size + k] = (u[j*size + k-1] + u[j*size + k+1] 
                       + u[(j-1)*size + k]+ u[(j+1)*size + k]) / 4.0;
    }
}

The [j*size + k] idiom is used to treat a flat block of memory as if it were a multidimensional array.
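To make the indexing explicit, here is a minimal sketch of the same idiom wrapped in a helper. The names flat_index and neighbour_average are hypothetical and only illustrate the row-major layout; they are not part of the original code.

```cpp
#include <cstddef>

// Hypothetical helper (not in the original post): offset of element (row, col)
// in a size x size grid stored row by row in one contiguous buffer.
inline std::size_t flat_index(std::size_t row, std::size_t col, std::size_t size) {
    return row * size + col;
}

// Same four-neighbour average as in the loop above, for a single interior point.
inline double neighbour_average(const double* u, std::size_t j, std::size_t k,
                                std::size_t size) {
    return (u[flat_index(j, k - 1, size)] + u[flat_index(j, k + 1, size)]
          + u[flat_index(j - 1, k, size)] + u[flat_index(j + 1, k, size)]) / 4.0;
}
```

Elements that are adjacent in k are adjacent in memory, while elements adjacent in j are size elements apart.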

Unfortunately, GCC (4.5) with the -ftree-vectorize flag does not consider the loop suitable for SIMD optimization. (That said, I have never seen -ftree-vectorize optimize anything other than the most trivial loops.)

Although I know that there are many other ways to improve the loop's performance (OpenMP, loop unrolling, in-place algorithms, etc.), I am specifically interested in whether SIMD can be used. I am more interested in the general scheme of how (if at all) such a loop can be transformed than in a specific implementation.
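For reference, one possible general scheme is to compute two adjacent columns per iteration with SSE2 and finish each row with a scalar remainder loop. The sketch below assumes an x86 target with SSE2, uses unaligned loads and stores because u and v come from new double[], and assumes u and v do not overlap. The function name jacobi_sweep_sse2 is hypothetical, and this is only an illustration of the transformation, not a tuned implementation.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (__m128d holds two doubles)

// Hypothetical sketch: one sweep over the interior of a size x size grid.
// Assumes u and v point to non-overlapping buffers of size*size doubles.
void jacobi_sweep_sse2(const double* u, double* v, int size) {
    const __m128d quarter = _mm_set1_pd(0.25);
    for (int j = 1; j < size - 1; ++j) {
        const double* up   = u + (j - 1) * size;  // row j-1
        const double* mid  = u + j * size;        // row j
        const double* down = u + (j + 1) * size;  // row j+1
        double* out = v + j * size;

        int k = 1;
        // Vector part: columns k and k+1 are computed together.
        for (; k + 1 < size - 1; k += 2) {
            __m128d left  = _mm_loadu_pd(mid + k - 1);
            __m128d right = _mm_loadu_pd(mid + k + 1);
            __m128d above = _mm_loadu_pd(up + k);
            __m128d below = _mm_loadu_pd(down + k);
            // Same summation order as the scalar loop.
            __m128d sum = _mm_add_pd(_mm_add_pd(_mm_add_pd(left, right), above), below);
            _mm_storeu_pd(out + k, _mm_mul_pd(sum, quarter));
        }
        // Scalar remainder: at most one leftover interior column per row.
        for (; k < size - 1; ++k) {
            out[k] = (mid[k - 1] + mid[k + 1] + up[k] + down[k]) / 4.0;
        }
    }
}
```

The left and right neighbours are fetched with overlapping unaligned loads rather than shuffles, which keeps the sketch short. Whether this is actually faster than the scalar loop depends on the hardware and on memory bandwidth, so it should be measured.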




Source: https://habr.com/ru/post/1775759/

