Indeed, it is expected that the first version will be faster. The reason is that:
auto row(m_bufvec);
invokes the copy constructor, which allocates all the memory row needs in a single step. m_bufvec also keeps its allocated memory. As a result, per-element allocations are avoided, which matters because each reallocation also relocates the elements already stored.
In the second version, auto row(std::move(m_bufvec)); transfers ownership of m_bufvec's memory to row; this operation is faster than the copy constructor. But m_bufvec has now lost its allocated memory, so when you later fill it element by element it goes through many reallocations and (expensive) relocations. The number of reallocations is usually logarithmic in the final size of the vector.
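To make the difference concrete, here is a minimal sketch that counts reallocations by watching capacity() change while a buffer is refilled. The helper names (count_reallocations, reallocations_after_copy, reallocations_after_move) are made up for illustration and are not from the question. It assumes the moved-from vector ends up with no capacity, which is what mainstream standard library implementations do.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: appends n elements one by one and counts how many
// times the capacity changes, i.e. how many reallocations took place.
std::size_t count_reallocations(std::vector<int>& v, std::size_t n) {
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}

// First version: copy the buffer, so the buffer keeps its storage.
std::size_t reallocations_after_copy(std::size_t n) {
    std::vector<int> buf(n, 0);
    std::vector<int> row(buf);   // copy constructor: one allocation for row
    (void)row;
    buf.clear();                 // clear() leaves the capacity unchanged
    assert(buf.capacity() >= n);
    return count_reallocations(buf, n);
}

// Second version: move the buffer, so the buffer gives up its storage.
std::size_t reallocations_after_move(std::size_t n) {
    std::vector<int> buf(n, 0);
    std::vector<int> row(std::move(buf));  // row takes over buf's memory
    (void)row;
    buf.clear();                 // defensive: make the moved-from vector empty
    return count_reallocations(buf, n);
}
```

Under these assumptions, refilling after the copy should report zero reallocations, while refilling after the move reports several; the exact count depends on the implementation's growth factor.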
EDIT
The above explains the "unexpected" results in the original question. In the end, the "ideal" for this operation turns out to be a move followed immediately by a reserve:
auto row(std::move(m_bufvec)); m_bufvec.reserve(row.size()); return row;
This achieves three goals:
no element-by-element reallocation
no useless initialization of m_bufvec
no useless copying of elements from m_bufvec to row.
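Put into context, the move-then-reserve idea might look like the sketch below. RowBuffer, push, take_row and the two accessors are hypothetical names standing in for the class from the question, and int is an assumed element type. The clear() call is a defensive addition that is not part of the one-liner above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the class in the question.
class RowBuffer {
public:
    // Appends one value to the internal buffer.
    void push(int value) { m_bufvec.push_back(value); }

    // Hands the accumulated row to the caller and prepares the buffer
    // for the next row of the same size.
    std::vector<int> take_row() {
        auto row(std::move(m_bufvec));   // steal the storage, no element copies
        m_bufvec.clear();                // defensive: ensure the buffer is empty
        m_bufvec.reserve(row.size());    // one allocation up front, no initialization
        assert(m_bufvec.capacity() >= row.size());
        return row;
    }

    std::size_t buffer_size() const { return m_bufvec.size(); }
    std::size_t buffer_capacity() const { return m_bufvec.capacity(); }

private:
    std::vector<int> m_bufvec;
};
```

After take_row() returns, the buffer is empty but already has room for a row of the previous size, so the next element-by-element fill should not reallocate unless the new row is longer.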