I have code that I would like to run fast, so I was hoping I could convince gcc (g++) to vectorize some of my inner loops. My compiler flags include
-O3 -msse2 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5
but gcc fails to vectorize the most important loops, giving me the following not-very-verbose-at-all messages:
Not vectorized: complicated access pattern.
and
Not vectorized: unsupported use in stmt.
My questions are: (1) what exactly do these mean? (How complicated does an access pattern have to be before it's too complicated? Unsupported use of what, exactly?) And (2) is there any way to get the compiler to give me at least a little more information about what I am doing wrong?
An example of a loop that gives "complicated access pattern" is
for (int s = 0; s < N; ++s)
    a.grid[s][0][h-1] = D[s] * (b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]);
and one that gives "unsupported use in stmt" is an inner loop
for (int s = 0; s < N; ++s)
    for (int i = 1; i < w-1; ++i)
        for (int j = 1; j < h-1; ++j)
            a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1]
                                    + b.grid[s][i-1][j] + b.grid[s][i+1][j]
                                    - 4*b.grid[s][i][j]);
(This is the one that really needs optimizing.) Here a.grid and b.grid are three-dimensional arrays of floats, D is a 1D array of floats, and N, w and h are const ints.
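For anyone who wants to reproduce the messages, here is a minimal, self-contained sketch of the two loops. The `Field` struct, the fixed-size `float[N][w][h]` layout, the function names `boundary` and `interior`, and the concrete values of N, w and h are all assumptions made for illustration; they are not taken from the real code, and a different array layout may well change what the vectorizer reports.

```cpp
// Hypothetical sizes; in the real code N, w and h are const ints
// whose values are not given here.
const int N = 4;
const int w = 16;
const int h = 16;

// Assumed layout: a plain fixed-size 3D array of floats, so that
// the last index (j) is the unit-stride one.
struct Field {
    float grid[N][w][h];
};

// The loop reported as "complicated access pattern": it walks over s
// with the two inner indices fixed, so consecutive iterations touch
// elements w*h floats apart in this layout.
void boundary(Field& a, const Field& b, const float* D) {
    for (int s = 0; s < N; ++s)
        a.grid[s][0][h-1] = D[s] * (b.grid[s][0][h-2] + b.grid[s][1][h-1]
                                  - 2*b.grid[s][0][h-1]);
}

// The loop reported as "unsupported use in stmt": a 5-point stencil
// over the interior points of each slice s.
void interior(Field& a, const Field& b, const float* D) {
    for (int s = 0; s < N; ++s)
        for (int i = 1; i < w-1; ++i)
            for (int j = 1; j < h-1; ++j)
                a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1]
                                        + b.grid[s][i-1][j] + b.grid[s][i+1][j]
                                        - 4*b.grid[s][i][j]);
}
```

Saved to a file and compiled with the flags listed at the top (adding `-c`, since there is no `main`), this should be enough to see what the vectorizer says about each loop under this particular layout.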