What do gcc's auto-vectorization messages mean?

I have some code that I would like to run fast, so I was hoping I could persuade gcc (g++) to vectorize some of my inner loops. My compiler flags include

-O3 -msse2 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5 

but gcc fails to vectorize the most important loops, giving me the following not-very-verbose-at-all messages:

 Not vectorized: complicated access pattern. 

and

 Not vectorized: unsupported use in stmt. 

My questions are: (1) What exactly do these mean? (How complicated does it have to be before it's too complicated? Unsupported use of what, exactly?) And (2) is there any way to get the compiler to give me even a little more information about what I'm doing wrong?

An example of a loop that gives “complicated access pattern” is

 for (int s=0;s<N;++s) a.grid[s][0][h-1] = D[s] * (b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]); 

and one that gives "unsupported use in stmt" is an inner loop

 for (int s=0;s<N;++s)
   for (int i=1;i<w-1;++i)
     for (int j=1;j<h-1;++j)
       a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]);

(This is the one that really needs optimization). Here a.grid and b.grid are three-dimensional arrays of floats, D is a 1D array of floats, and N, w and h are const ints.

1 answer

Not vectorized: complicated access pattern.

The “non-complicated” access patterns are consecutive element access, or strided access with certain restrictions (only a single element of the group is accessed in the loop, the number of elements in the group is a power of 2, and the group size is a multiple of the vector type).

 b.grid[s][0][h-2] + b.grid[s][1][h-1] - 2*b.grid[s][0][h-1]

This is neither consecutive nor strided access.
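To make the two patterns concrete, here is a minimal sketch in plain C++. The function names and the flat `float*` layout are assumptions made for illustration, not code from the question. The first loop walks adjacent floats; the second jumps `stride` floats per iteration. In the question's first loop, `s` is the slowest-varying index of `grid`, so successive iterations touch elements that are far apart in memory rather than adjacent. Whether gcc actually vectorizes either loop depends on the gcc version, target, and flags.

```cpp
#include <cstddef>

// Consecutive access: iteration j touches in[j] and out[j], so successive
// iterations read and write adjacent floats.
void scale_consecutive(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t j = 0; j < n; ++j)
        out[j] = k * in[j];
}

// Strided access: iteration s touches element s * stride, so successive
// iterations are `stride` floats apart. `stride` is a hypothetical parameter;
// for a contiguous 3D array indexed by its first dimension it would be the
// size of one 2D slice.
void scale_strided(const float* in, float* out, std::size_t n,
                   std::size_t stride, float k) {
    for (std::size_t s = 0; s < n; ++s)
        out[s * stride] = k * in[s * stride];
}
```

Compiling a file like this with the flags from the question is one way to see which of the two loops the vectorizer accepts on a given gcc.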

Not vectorized: unsupported use in stmt.

Here “use” is meant in the data-flow sense: reading the value of a variable (a register or compiler temporary). In this case, the “supported uses” are variables defined in the current iteration of the loop, constants, and loop invariants.

 a.grid[s][i][j] = D[s] * (b.grid[s][i][j-1] + b.grid[s][i][j+1] + b.grid[s][i-1][j] + b.grid[s][i+1][j] - 4*b.grid[s][i][j]); 

In this example, I think the “unsupported use” arises because b.grid[s][i][j-1] and b.grid[s][i][j+1] are assigned (“defined”) by a previous iteration of the loop.
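As a rough illustration of the distinction, the sketch below contrasts a loop that reads a value written by the previous iteration with a one-row stencil that only reads from an array the loop never writes. The function names and the flat one-row layout are hypothetical, and `__restrict` is a GCC extension used here as an assumed way of promising that `a` and `b` do not overlap. Nothing in the question says this is how `a.grid` and `b.grid` are declared, and it is not guaranteed to remove the diagnostic.

```cpp
#include <cstddef>

// Loop-carried use: iteration j reads x[j - 1], which iteration j - 1 has
// just written, so the value is "defined" by a previous iteration.
void running_sum(float* x, std::size_t n) {
    for (std::size_t j = 1; j < n; ++j)
        x[j] = x[j] + x[j - 1];
}

// One-row stencil: every value read comes from b, which this loop never
// writes. __restrict (GCC extension) is the caller's promise that a and b
// do not alias each other.
void laplace_row(float* __restrict a, const float* __restrict b,
                 std::size_t h, float d) {
    for (std::size_t j = 1; j + 1 < h; ++j)
        a[j] = d * (b[j - 1] + b[j + 1] - 2.0f * b[j]);
}
```

If the compiler cannot prove that the output and input arrays are distinct, it has to assume a store to `a` might change a later load from `b`, which would make the second loop look like the first.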

Source: https://habr.com/ru/post/1447646/