As part of a self-education project, I looked at how g++ handles the std::complex type and was puzzled by this simple function:
#include <complex>
std::complex<double> c;
void get(std::complex<double> &res){
res=c;
}
Compiled with g++-6.3 -O3 (or also -Os) for Linux64, I got this result:
movsd c(%rip), %xmm0
movsd %xmm0, (%rdi)
movsd c+8(%rip), %xmm0
movsd %xmm0, 8(%rdi)
ret
So it moves the real and imaginary parts separately, as 64-bit floats. However, I would expect the assembly to use two movups instead of four movsd, i.e. to move the real and imaginary parts together as one 128-bit packet:
movups c(%rip), %xmm0
movups %xmm0, (%rdi)
ret
On my machine (Intel Broadwell) this is not only twice as fast as the movsd version, it also needs only 16 bytes, whereas the movsd version takes 36 bytes.
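For reference, the 128-bit copy I have in mind can also be written by hand with SSE2 intrinsics. This is only a sketch under some assumptions: the names get_packed and check_get_packed are hypothetical, the target must be x86 with SSE2, and the code relies on std::complex<double> being accessible as an array of two doubles, which the standard guarantees. A compiler will typically turn the two intrinsics into one unaligned 128-bit load and one unaligned 128-bit store (movupd or movups), but the exact instructions depend on the compiler and flags.

```cpp
#include <cassert>
#include <complex>
#include <emmintrin.h>  // SSE2 intrinsics: _mm_loadu_pd, _mm_storeu_pd

std::complex<double> c;

// Hypothetical hand-written variant of get(): copies both parts of c
// with one unaligned 128-bit load and one unaligned 128-bit store.
void get_packed(std::complex<double> &res) {
    // std::complex<double> may be accessed as an array of two doubles
    // (real part first, imaginary part second).
    const double *src = reinterpret_cast<const double *>(&c);
    double *dst = reinterpret_cast<double *>(&res);
    _mm_storeu_pd(dst, _mm_loadu_pd(src));
}

// Small self-check: the packed copy must give the same value
// as a plain assignment would.
void check_get_packed() {
    c = std::complex<double>(1.5, -2.25);
    std::complex<double> res;
    get_packed(res);
    assert(res == c);
}
```

The unaligned variants are used because nothing here guarantees that res is 16-byte aligned.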
What is the reason g++ generates the assembly with movsd?
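- Is there a compiler flag I should use in addition to -O3 to get movups?
- Does movups have a downside I am not aware of?
- Is g++ simply not optimized for this case?
- Or is it something else?
For context, I am comparing two possible function signatures: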
std::complex<double> get(){
return c;
}
void get(std::complex<double> &res){
res=c;
}
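The first version has to return the real and imaginary parts in separate registers (xmm0 and xmm1) because of the SystemV ABI. The second version could take advantage of SSE operations that work on 128 bits, but that does not happen with g++.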
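Edit: As kennytm points out, g++ seems to generate suboptimal assembly here. It always uses 4 movsd to copy a std::complex, for example also for: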
void get(std::complex<double> *res){
res[1]=res[0];
}
This has been reported to gcc-bugzilla.