Why is calling vector.reserve (required + 1) faster than vector.reserve (required)?

I am doing some tests measuring the performance of standard containers under various conditions, and I came across something strange. When I insert a lot of elements in the middle of std::vector , if I first reserve a call with the exact number of elements that I will add, I see, in fact, there is no difference in performance in most cases compared to a non-invoking reserve, which is surprising However, it is even more surprising that if I name the reserve with the exact number of elements that I need + 1 , then I get a significant increase in performance. This is an example of the result table I just got (all the time in seconds):

 +---------------+--------+-------------------+-----------------------+ | # of elements | vector | vector (reserved) | vector (reserved + 1) | +---------------+--------+-------------------+-----------------------+ | 10000 | 0.04 | 0.04 | 0.03 | | 20000 | 0.14 | 0.14 | 0.11 | | 30000 | 0.32 | 0.32 | 0.25 | | 40000 | 0.55 | 0.55 | 0.44 | | 50000 | 0.87 | 0.85 | 0.66 | | 60000 | 1.24 | 1.24 | 0.96 | | 70000 | 1.69 | 1.68 | 1.31 | | 80000 | 2.17 | 2.21 | 1.71 | | 90000 | 2.78 | 2.75 | 2.16 | | 100000 | 3.43 | 3.44 | 2.68 | | 110000 | 4.13 | 4.15 | 3.23 | | 120000 | 4.88 | 4.89 | 3.86 | | 130000 | 5.79 | 5.8 | 4.51 | | 140000 | 6.71 | 6.71 | 5.24 | | 150000 | 7.7 | 7.7 | 6.02 | | 160000 | 8.76 | 8.67 | 6.86 | | 170000 | 9.9 | 9.91 | 7.74 | | 180000 | 11.07 | 10.98 | 8.64 | | 190000 | 12.34 | 12.35 | 9.64 | | 200000 | 13.64 | 13.56 | 10.72 | | 210000 | 15.1 | 15.04 | 11.67 | | 220000 | 16.59 | 16.41 | 12.89 | | 230000 | 18.05 | 18.06 | 14.13 | | 240000 | 19.64 | 19.74 | 15.36 | | 250000 | 21.34 | 21.17 | 16.66 | | 260000 | 23.08 | 23.06 | 18.02 | | 270000 | 24.87 | 24.89 | 19.42 | | 280000 | 26.5 | 26.58 | 20.9 | | 290000 | 28.51 | 28.69 | 22.4 | | 300000 | 30.69 | 30.74 | 23.97 | | 310000 | 32.73 | 32.81 | 25.57 | | 320000 | 34.63 | 34.99 | 27.28 | | 330000 | 37.12 | 37.17 | 28.99 | | 340000 | 39.36 | 39.43 | 30.83 | | 350000 | 41.7 | 41.48 | 32.45 | | 360000 | 44.11 | 44.22 | 34.55 | | 370000 | 46.62 | 46.71 | 36.22 | | 380000 | 49.09 | 48.91 | 38.46 | | 390000 | 51.71 | 51.98 | 40.22 | | 400000 | 54.45 | 54.56 | 43.03 | | 410000 | 57.23 | 57.29 | 44.84 | | 420000 | 60 | 59.73 | 46.67 | | 430000 | 62.9 | 63.03 | 49.3 | +---------------+--------+-------------------+-----------------------+ 

I checked the implementation and it does not seem to have a "one by one" error. Then I additionally tested, printing the size and capacity immediately after calling the reserve, and then I printed them again after filling the vector, and everything looks good.

 before: size: 0 capacity: 10000 after: size: 10000 capacity: 10000 before: size: 0 capacity: 20000 after: size: 20000 capacity: 20000 ... 

The compiler is gcc 4.7.2 on Fedora Linux x86_64. Compiler options -std=c++11 -Ofast -march=native -funsafe-loop-optimizations -flto=4 - fwhole-program

The code is below.

 #include <algorithm> #include <array> #include <cstdint> #include <vector> #include <random> #include <string> #include <iostream> #include <fstream> #include <boost/timer.hpp> namespace { constexpr size_t array_size = 1; unsigned number() { static std::random_device rd; static std::mt19937 random_engine(rd()); static std::uniform_int_distribution<uint32_t> distribution(0, std::numeric_limits<uint32_t>::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } std::string to_string() const { return std::to_string(x[0]); } inline friend bool operator<=(Class const & lhs, Class const & rhs) { return lhs.x[0] <= rhs.x[0]; } private: std::array<uint32_t, array_size> x; }; template<typename Container> void add(Container & container, Class const & value) { auto const it = std::find_if(std::begin(container), std::end(container), [&](Class const & c) { return value <= c; }); container.emplace(it, value); } // Do something with the result template<typename Container> void insert_to_file(Container const & container) { std::fstream file("file.txt"); for (auto const & value : container) { file << value.to_string() << '\n'; } } template<typename Container> void f(std::vector<Class> const & values) { Container container; container.reserve(values.size()); for (auto const & value : values) { add(container, value); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : std::stoul(argv[1]); // Default constructor of Class fills in values here std::vector<Class> const values_to_be_copied(size); typedef std::vector<Class> Container; boost::timer timer; f<Container>(values_to_be_copied); std::cerr << "Finished in " << timer.elapsed() << " seconds.\n"; } 

I created a C ++ 03 version to try to help other people reproduce it, but I cannot reproduce it in this version, although I tried to show it, making it as direct as possible:

 #include <algorithm> #include <cstdlib> #include <fstream> #include <vector> #include <string> #include <iostream> #include <boost/array.hpp> #include <boost/cstdint.hpp> #include <boost/lexical_cast.hpp> #include <boost/random/mersenne_twister.hpp> #include <boost/random/uniform_int_distribution.hpp> #include <boost/timer.hpp> namespace { unsigned number() { static boost::random::mt19937 random_engine; static boost::random::uniform_int_distribution<boost::uint32_t> distribution(0, std::numeric_limits<boost::uint32_t>::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } inline friend bool operator<=(Class const & lhs, Class const & rhs) { return lhs.x[0] <= rhs.x[0]; } std::string to_string() const { return boost::lexical_cast<std::string>(x[0]); } private: boost::array<boost::uint32_t, 1> x; }; class Less { public: Less(Class const & c): value(c) { } bool operator()(Class const & c) const { return value <= c; } private: Class value; }; void add(std::vector<Class> & container, Class const & value) { std::vector<Class>::iterator it = std::find_if(container.begin(), container.end(), Less(value)); container.insert(it, value); } // Do something with the result void insert_to_file(std::vector<Class> const & container) { std::fstream file("file.txt"); for (std::vector<Class>::const_iterator it = container.begin(); it != container.end(); ++it) { file << it->to_string() << '\n'; } } void f(std::vector<Class> const & values) { std::vector<Class> container; container.reserve(values.size() + 1); for (std::vector<Class>::const_iterator it = values.begin(); it != values.end(); ++it) { add(container, *it); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : boost::lexical_cast<std::size_t>(argv[1]); // Default constructor of Class fills in values here std::vector<Class> const values_to_be_copied(size); boost::timer timer; f(values_to_be_copied); std::cerr << "Finished in " << timer.elapsed() << " seconds.\n"; } 

The line that is currently causing the reserve has been modified to include + 1 or has been completely removed, depending on which test I ran. All this was run from a shell script that started with 10,000 elements and increased to 430,000 elements, moving one version at a time.

My processor is a 4-core Intel i5 processor, and I have 4 gigabytes of memory. I will try to simplify the C ++ 11 code version as much as possible to see if I can isolate the problem.

Does anyone know why reserving another element than I need causes this speed increase?

+47
c ++ performance memory stdvector
Mar 29 '13 at 16:50
source share
2 answers

I made the following modification of the program:

 size_t a = getenv("A") ? 1 : 0; void f(std::vector<Class> const & values) { ... container.reserve(values.size() + a); ... } 

Now the performance is the same (fast), regardless of whether the value is 0 or 1. The conclusion should be that the reservation of an additional element does not affect the performance (which was adopted in the question). In addition, other small changes in the source code or compiler flags change performance between fast and slow, so it seems that in some cases the code optimizer has more luck than in others.

The following implementation of f () triggers the opposite behavior with the same compiler flags, so it is fast when the exact size is reserved and slow when an additional element is reserved:

 template<typename Container> void f(std::vector<Class> const & values) { Container container; container.reserve(values.size()); for (auto it = values.begin(); it != values.end(); ++it) { add(container, *it); } insert_to_file(container); } 
+16
Mar 31 '13 at 19:29
source share

I believe that the difference between the reserve exact size and one additional element in your particular case is absolutely negligible. Moreover, some implementations may truncate the requested size to a certain number (for example, from a force of 2), so there will be no difference.

OTOH is the bottleneck in your program, it seems like something else. You perform two expensive array operations:

  • Find a suitable place to insert an item
  • Insert one in the middle

As you probably noticed, your program depends on random . It is easy to see that in your particular case, if random numbers grow - your program will run faster, OTOH, if they decrease - you will have to perform more "expensive" inserts.

I think subtle code changes can lead to the generation of various random numbers.

0
Mar 31 '13 at 20:04
source share



All Articles