Virtual function returning a small structure: return value or output parameter?

I have a virtual function in hotspot code that should return a structure as a result. I have two options:

    virtual Vec4 generateVec() const = 0;             // return value
    virtual void generateVec(Vec4& output) const = 0; // output parameter

My question is: is there any difference in performance between these two? I guess the second one is faster because it does not involve copying data onto the stack. However, the first one is often much more convenient to use. If the first is still a little slower, would the difference generally be measurable? Maybe I'm obsessing too much :)

I should stress that this function will be called millions of times per second, but also that the Vec4 structure is small: 16 bytes.

5 answers

As already mentioned, try them both, but you will most likely find that Vec4 generateVec() is actually faster. Return value optimization (RVO) will elide the copy operation, whereas void generateVec(Vec4& output) may cause the output parameter to be initialized unnecessarily.

Is there any way to avoid making the function virtual? If you call it millions of times per second, the extra level of indirection is what you should be paying attention to.

Code that is called millions of times per second suggests that you really do need to optimize for speed.

Depending on how complex the bodies of the derived generateVec implementations are, the difference between the two may be imperceptible or massive.

It is best to try both, profile them, and see whether you need to worry about optimizing this particular aspect of the code at all.

Feeling a little bored, I came up with this:

    #include <iostream>
    #include <ctime>
    #include <cstdlib>
    using namespace std;

    // 16-byte struct, like Vec4; rand() keeps the compiler from constant-folding it
    struct A {
        int n[4];
        A() { n[0] = n[1] = n[2] = n[3] = rand(); }
    };

    // return by value
    A f1() { return A(); }

    // output parameter
    void f2( A & a ) { a = A(); }

    const unsigned long BIG = 100000000;

    int main() {
        unsigned int sum = 0;
        A a;
        clock_t t = clock();
        for ( unsigned int i = 0; i < BIG; i++ ) {
            a = f1();
            sum += a.n[0];
        }
        cout << clock() - t << endl;
        t = clock();
        for ( unsigned int i = 0; i < BIG; i++ ) {
            f2( a );
            sum += a.n[0];
        }
        cout << clock() - t << endl;
        return sum & 1;  // use sum so the loops are not optimized away
    }

With -O2 optimization, there is no significant difference between the two.

There is a chance that the first solution will be faster.

Very nice article:

http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

Just out of curiosity, I wrote two similar functions (using an 8-byte data type) to check their assembly code.

    long long int ret_val() {
        long long int tmp(1);
        return tmp;
    }

    // ret_val() assembly
        .globl _Z7ret_valv
        .type _Z7ret_valv, @function
    _Z7ret_valv:
    .LFB0:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl %ebp
        .cfi_def_cfa_offset 8
        movl %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        subl $16, %esp
        movl $1, -8(%ebp)
        movl $0, -4(%ebp)
        movl -8(%ebp), %eax
        movl -4(%ebp), %edx
        leave
        ret
        .cfi_endproc

Surprisingly, the output-parameter (pass-by-reference) version below required a few more instructions:

    void output_val(long long int& value) {
        long long int tmp(2);
        value = tmp;
    }

    // output_val() assembly
        .globl _Z10output_valRx
        .type _Z10output_valRx, @function
    _Z10output_valRx:
    .LFB1:
        .cfi_startproc
        .cfi_personality 0x0,__gxx_personality_v0
        pushl %ebp
        .cfi_def_cfa_offset 8
        movl %esp, %ebp
        .cfi_offset 5, -8
        .cfi_def_cfa_register 5
        subl $16, %esp
        movl $2, -8(%ebp)
        movl $0, -4(%ebp)
        movl 8(%ebp), %ecx
        movl -8(%ebp), %eax
        movl -4(%ebp), %edx
        movl %eax, (%ecx)
        movl %edx, 4(%ecx)
        leave
        ret
        .cfi_endproc

These functions were called in test code as:

    long long val = ret_val();
    long long val2;
    output_val(val2);

Compiled with gcc.

Source: https://habr.com/ru/post/889383/