I am going to make my code more generic by using std::tuple in many places, including for a single element: for example, tuple<double> instead of double. Before doing so, I decided to test the performance of this particular case.
Here is a simple performance test:
    #include <tuple>
    #include <iostream>

    using std::cout;
    using std::endl;
    using std::get;
    using std::tuple;

    int main(void)
    {
    #ifdef TUPLE
        using double_t = std::tuple<double>;
    #else
        using double_t = double;
    #endif

        constexpr int count = 1e9;
        auto array = new double_t[count];

        long long sum = 0;
        for (int idx = 0; idx < count; ++idx) {
    #ifdef TUPLE
            sum += get<0>(array[idx]);
    #else
            sum += array[idx];
    #endif
        }
        delete[] array;
        cout << sum << endl; // just "external" side effect for variable sum.
    }

And here are the results:

    $ g++ -DTUPLE -O2 -std=c++11 test.cpp && time ./a.out
    0

    real 0m3.347s
    user 0m2.839s
    sys 0m0.485s

    $ g++ -O2 -std=c++11 test.cpp && time ./a.out
    0

    real 0m2.963s
    user 0m2.424s
    sys 0m0.519s
I thought that a tuple is a purely compile-time template construct, so every get<> call should reduce to ordinary variable access in this case. BTW, the sizes of the memory allocations are the same in both variants of this test. Why does this run-time difference occur?
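The two assumptions above (same size, plain access through get<>) can be checked directly. This is a minimal sketch, not part of the original test: the helper names tuple_adds_no_storage, read_first and check_single_element_tuple are made up for illustration, and the size equality is a property I expect from GCC's libstdc++, not something the standard guarantees.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical helper: true when wrapping T in a one-element tuple adds no storage.
// The standard does not promise this; it depends on the library implementation.
template <typename T>
constexpr bool tuple_adds_no_storage() {
    return sizeof(std::tuple<T>) == sizeof(T);
}

// Hypothetical helper: the same element access the TUPLE branch of the test performs.
inline double read_first(const std::tuple<double>& t) {
    return std::get<0>(t);
}

// Call this from main() to run both checks.
inline void check_single_element_tuple() {
    // Assumption: holds with libstdc++ (the library used by g++ in this test).
    assert(tuple_adds_no_storage<double>());
    // get<0> returns exactly the stored value.
    assert(read_first(std::tuple<double>(2.5)) == 2.5);
}
```

If both assertions pass, the array layout and the element access are unlikely to explain the timing gap, which points to something outside the loop.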
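EDIT: The difference came from initialization of the tuple<> objects. new double_t[count] default-initializes the elements: a plain double is left uninitialized, whereas the default constructor of tuple<double> value-initializes (zeroes) its element, so the tuple build writes the whole array before the loop starts. To make the test fair, one line needs to change:

      constexpr int count = 1e9;
    - auto array = new double_t[count];
    + auto array = new double_t[count]();
      long long sum = 0;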
After that, you can observe similar results:
    $ g++ -DTUPLE -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
    real 0m3.342s
    real 0m3.339s
    real 0m3.343s

    $ g++ -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
    real 0m3.349s
    real 0m3.339s
    real 0m3.334s
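The initialization behaviour behind the EDIT can be sketched on a small array. This is an illustration under my own assumptions, not the original benchmark: the helpers sum_tuples, sum_doubles and check_initialization are hypothetical names, and the array is kept tiny so the sketch runs instantly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>

// Hypothetical helper: sums n elements of a tuple<double> array through get<0>.
inline double sum_tuples(const std::tuple<double>* p, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += std::get<0>(p[i]);
    return sum;
}

// Hypothetical helper: sums n elements of a plain double array.
inline double sum_doubles(const double* p, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += p[i];
    return sum;
}

// Call this from main() to run the checks.
inline void check_initialization() {
    const std::size_t n = 1000;

    // No trailing (): each tuple is default-constructed, and tuple's default
    // constructor value-initializes its element, so every double is 0.0.
    std::unique_ptr<std::tuple<double>[]> tuples(new std::tuple<double>[n]);

    // Trailing (): the doubles are value-initialized, so every element is 0.0.
    std::unique_ptr<double[]> doubles(new double[n]());

    // new double[n] without () would leave the values indeterminate,
    // so that variant is deliberately not read here.
    assert(sum_tuples(tuples.get(), n) == 0.0);
    assert(sum_doubles(doubles.get(), n) == 0.0);
}
```

With 1e9 elements, the zeroing pass touches every page of the allocation, which is consistent with the tuple build being slower until the plain double build is made to do the same work.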