Why can't loops in an inline function be auto-vectorized properly?

Question

I am trying to vectorize some simple computations to take advantage of SIMD. However, I also want to keep them as inline functions, since both function calls and non-vectorized code cost time. I cannot always get both at once; in fact, most of my inline functions are not auto-vectorized. Here is a simple test that works:

inline void add1(double *v, int Length) {
    for(int i=0; i < Length; i++) v[i] += 1;
}

void call_add1(double v[], int L) {
    add1(v, L);
}

int main(){return 0;}

Compiling it on Mac OS X 10.12.3:

clang++ -O3 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -std=c++11 -ffast-math test.cpp

test.cpp:2:5: remark: vectorized loop (vectorization width: 2, interleaved count: 2) [-Rpass=loop-vectorize]
    for(int i=0; i < Length; i++) v[i] += 1;
    ^

However, something very similar (only moving the arguments into call_add1) does not work:

inline void add1(double *v, int Length) {
    for(int i=0; i < Length; i++) v[i] += 1;
}

void call_add1() {
    double v[20]={0,1,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9}; 
    int L=20;
    add1(v, L);
}

int main(){ return 0;}

This version produces no vectorization remark. Why?

+4

c++ simd clang++ inline

Jiang-Nan Yang Jan 15 '18 at 16:55

3 Answers

In the 2nd case (https://godbolt.org/g/CnojEi), clang 4.0.0 with -O3 generates:

call_add1():
  rep ret
main:
  xor eax, eax
  ret

The whole body of call_add1 has been optimized away.

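In the 1st case the array is passed in by the caller, so the stores are visible outside the function and the loop has to stay. When an inlined loop works only on local or const data and its result is never used, the compiler is free to remove it altogether.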

See, for example: https://godbolt.org/g/KF1kNt

+4

luk32 Jan 15 '18 at 17:14

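The compiler sees that v is never used after the call, so it removes the computation: it has no observable effect, and there is nothing left to vectorize.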

To check the optimization, you can try making some of the variables volatile (live example).
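The volatile suggestion can be sketched roughly as follows. This is only an illustration: `call_add1_observed`, `opaque_len` and `sink` are hypothetical names that do not appear in the question. The length is read through a volatile so the compiler cannot treat it as a known constant, and the result is written to a volatile so the work cannot be discarded. Whether clang then reports the loop as vectorized depends on the compiler version and flags.

```cpp
inline void add1(double *v, int Length) {
    for (int i = 0; i < Length; i++) v[i] += 1;
}

// Hypothetical variant of call_add1 (not from the question).
double call_add1_observed() {
    double v[20] = {0,1,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9};

    // Volatile read: the compiler may not assume the value is 20.
    volatile int opaque_len = 20;
    int L = opaque_len;
    if (L > 20) L = 20;  // keep accesses inside the array

    add1(v, L);

    // Use the result so the stores made by add1 are not dead.
    double sum = 0;
    for (int i = 0; i < L; i++) sum += v[i];

    // Volatile write: this store must be emitted.
    volatile double sink = sum;
    return sink;
}
```

To try it, add an empty main as in the question and compile with the same -Rpass=loop-vectorize flags, then compare the remarks with those of the original call_add1.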

+3

AMA Jan 15 '18 at 17:14


ivaigult · Accepted Answer · 2018-01-15T17:18:51+0000

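If you compile with -fsave-optimization-record, the optimization record shows what happened to the loop: it was completely unrolled, and the loads were then eliminated.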

--- !Passed
Pass:            loop-unroll
Name:            FullyUnrolled
DebugLoc:        { File: main.cpp, Line: 2, Column: 5 }
Function:        _Z9call_add1v
Args:            
  - String:          'completely unrolled loop with '
  - UnrollCount:     '20'
  - String:          ' iterations'
...
--- !Passed
Pass:            gvn
Name:            LoadElim
DebugLoc:        { File: main.cpp, Line: 2, Column: 40 }
Function:        _Z9call_add1v
Args:            
  - String:          'load of type '
  - Type:            double
  - String:          ' eliminated'
  - String:          ' in favor of '
  - InfavorOfValue:  '0.000000e+00'

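If you make the array 4000 elements long, the loop is too large to unroll completely, and clang vectorizes it instead.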
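The effect of the array size can be checked with a sketch along these lines. It assumes a 4000-element array and returns the sum so that the result is used; `call_add1_large` is a hypothetical name, not part of the question or the answer. Whether the loop is actually vectorized should be confirmed with -Rpass=loop-vectorize on your own clang version.

```cpp
inline void add1(double *v, int Length) {
    for (int i = 0; i < Length; i++) v[i] += 1;
}

// Hypothetical variant of call_add1 with a larger array (not from the question).
double call_add1_large() {
    const int L = 4000;   // size taken from the answer's remark about 4000 elements
    double v[L] = {0};    // all elements start at zero

    add1(v, L);           // every element becomes 1

    // Return the sum so the array contents are observable to the caller.
    double sum = 0;
    for (int i = 0; i < L; i++) sum += v[i];
    return sum;
}
```

With 20 elements the record above shows full unrolling followed by load elimination; with the larger size it is worth looking for a loop-vectorize entry in the record instead.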
