Intel compiler cannot vectorize this simple loop?

So, I have the following code, which seems very simple to me:

    #define MODS_COUNT 5

    int start1 = <calc at runtime>;
    int start2 = <calc at runtime>;

    for (int j = 0; j < MODS_COUNT; j++)  // loop 5 times doing simple addition
        logModifiers[start1 + j] += logModsThis[start2 + j];

This loop is nested inside an outer loop (not sure if that matters).

The compiler says: message : loop was not vectorized: vectorization possible but seems inefficient.

Why can't this loop be vectorized? It seems very simple to me. How can I force vectorization so that I can test the performance?

I am using an update release of the Intel C++ Compiler 2013.

The full code is here if anyone is interested: http://pastebin.com/Z6H5ZejW

Edit: I understand that the compiler decided vectorization would be inefficient. What I'm asking is:

Why is it inefficient?

How can I force it anyway, so that I can compare the performance myself?

Edit 2: If I change MODS_COUNT to 4 instead of 5, the loop is vectorized. What makes 5 inefficient? I thought it could be done in 2 instructions (one vector instruction for the first 4 elements and one "normal" instruction for the remaining 1) instead of 5 instructions.

1 answer

According to vectorization in Intel compilers:

The SIMD (single instruction, multiple data) registers in question are 128 bits long. So if sizeof(int) is 4, then 4 ints fit in one register, and a single instruction can operate on all 4 of them. (This also depends on the same operation being applied to each of those ints. That holds here, since each element of the array on the LHS depends only on the corresponding element of the other array.)

If there are 8 ints, two instructions are required (instead of 8 without vectorization).

But if there are 5 (or 6 or 7) ints, two instructions are still required, one of them for a partly filled register or a scalar remainder, which may be no better than the non-vectorized code.
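To make the arithmetic above concrete, here is a minimal sketch of what a hand-vectorized version of the 5-element loop could look like with SSE2 intrinsics. It assumes the elements are int (as the answer does; the question does not state the type), that the two ranges do not overlap, and that the target is x86. The helper name add_mods_sse2 is hypothetical. This is not necessarily the code the Intel compiler would generate.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

#define MODS_COUNT 5

// Hypothetical helper: dst[j] += src[j] for j in [0, MODS_COUNT).
// Assumes int elements and non-overlapping ranges.
void add_mods_sse2(int* dst, const int* src) {
    // One 128-bit add covers the first 4 ints. Unaligned loads and stores
    // are used because start1/start2 are only known at run time.
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_add_epi32(a, b));

    // The leftover element(s) are handled by an ordinary scalar loop.
    for (int j = 4; j < MODS_COUNT; j++)
        dst[j] += src[j];
}
```

For the loop in the question this would be called as add_mods_sse2(logModifiers + start1, logModsThis + start2). Even in this form the vector part needs two unaligned loads, one add and one store, plus the scalar remainder, so the saving over five scalar additions is small.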

Further reading: LINK.
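As for forcing vectorization in order to compare timings, one possible approach with the Intel compiler is a loop pragma. The sketch below uses #pragma simd, which asks the compiler to vectorize the loop regardless of its cost model (#pragma vector always is a milder alternative that only overrides the efficiency heuristic). The wrapper function add_mods and the int element type are assumptions made for illustration. The pragma is guarded so that other compilers simply build the plain loop.

```cpp
#define MODS_COUNT 5

// Hypothetical wrapper around the loop from the question.
// The int element type is an assumption; the question does not state it.
void add_mods(int* logModifiers, const int* logModsThis,
              int start1, int start2) {
#if defined(__INTEL_COMPILER)
    // Intel-specific: request vectorization even if the compiler's
    // cost model considers it inefficient.
#pragma simd
#endif
    for (int j = 0; j < MODS_COUNT; j++)
        logModifiers[start1 + j] += logModsThis[start2 + j];
}
```

Building once with and once without the pragma and timing the outer loop would give the comparison asked for in the question. Whether the forced version is faster depends on the hardware and on the surrounding code.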


Source: https://habr.com/ru/post/943954/

