I am doing some tests measuring the performance of standard containers under various conditions, and I came across something strange. When I insert a lot of elements in the middle of std::vector , if I first reserve a call with the exact number of elements that I will add, I see, in fact, there is no difference in performance in most cases compared to a non-invoking reserve, which is surprising However, it is even more surprising that if I name the reserve with the exact number of elements that I need + 1 , then I get a significant increase in performance. This is an example of the result table I just got (all the time in seconds):
+---------------+--------+-------------------+-----------------------+ | # of elements | vector | vector (reserved) | vector (reserved + 1) | +---------------+--------+-------------------+-----------------------+ | 10000 | 0.04 | 0.04 | 0.03 | | 20000 | 0.14 | 0.14 | 0.11 | | 30000 | 0.32 | 0.32 | 0.25 | | 40000 | 0.55 | 0.55 | 0.44 | | 50000 | 0.87 | 0.85 | 0.66 | | 60000 | 1.24 | 1.24 | 0.96 | | 70000 | 1.69 | 1.68 | 1.31 | | 80000 | 2.17 | 2.21 | 1.71 | | 90000 | 2.78 | 2.75 | 2.16 | | 100000 | 3.43 | 3.44 | 2.68 | | 110000 | 4.13 | 4.15 | 3.23 | | 120000 | 4.88 | 4.89 | 3.86 | | 130000 | 5.79 | 5.8 | 4.51 | | 140000 | 6.71 | 6.71 | 5.24 | | 150000 | 7.7 | 7.7 | 6.02 | | 160000 | 8.76 | 8.67 | 6.86 | | 170000 | 9.9 | 9.91 | 7.74 | | 180000 | 11.07 | 10.98 | 8.64 | | 190000 | 12.34 | 12.35 | 9.64 | | 200000 | 13.64 | 13.56 | 10.72 | | 210000 | 15.1 | 15.04 | 11.67 | | 220000 | 16.59 | 16.41 | 12.89 | | 230000 | 18.05 | 18.06 | 14.13 | | 240000 | 19.64 | 19.74 | 15.36 | | 250000 | 21.34 | 21.17 | 16.66 | | 260000 | 23.08 | 23.06 | 18.02 | | 270000 | 24.87 | 24.89 | 19.42 | | 280000 | 26.5 | 26.58 | 20.9 | | 290000 | 28.51 | 28.69 | 22.4 | | 300000 | 30.69 | 30.74 | 23.97 | | 310000 | 32.73 | 32.81 | 25.57 | | 320000 | 34.63 | 34.99 | 27.28 | | 330000 | 37.12 | 37.17 | 28.99 | | 340000 | 39.36 | 39.43 | 30.83 | | 350000 | 41.7 | 41.48 | 32.45 | | 360000 | 44.11 | 44.22 | 34.55 | | 370000 | 46.62 | 46.71 | 36.22 | | 380000 | 49.09 | 48.91 | 38.46 | | 390000 | 51.71 | 51.98 | 40.22 | | 400000 | 54.45 | 54.56 | 43.03 | | 410000 | 57.23 | 57.29 | 44.84 | | 420000 | 60 | 59.73 | 46.67 | | 430000 | 62.9 | 63.03 | 49.3 | +---------------+--------+-------------------+-----------------------+
I checked the implementation and it does not seem to have a "one by one" error. Then I additionally tested, printing the size and capacity immediately after calling the reserve, and then I printed them again after filling the vector, and everything looks good.
before: size: 0 capacity: 10000 after: size: 10000 capacity: 10000 before: size: 0 capacity: 20000 after: size: 20000 capacity: 20000 ...
The compiler is gcc 4.7.2 on Fedora Linux x86_64. Compiler options -std=c++11 -Ofast -march=native -funsafe-loop-optimizations -flto=4 - fwhole-program
The code is below.
#include <algorithm> #include <array> #include <cstdint> #include <vector> #include <random> #include <string> #include <iostream> #include <fstream> #include <boost/timer.hpp> namespace { constexpr size_t array_size = 1; unsigned number() { static std::random_device rd; static std::mt19937 random_engine(rd()); static std::uniform_int_distribution<uint32_t> distribution(0, std::numeric_limits<uint32_t>::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } std::string to_string() const { return std::to_string(x[0]); } inline friend bool operator<=(Class const & lhs, Class const & rhs) { return lhs.x[0] <= rhs.x[0]; } private: std::array<uint32_t, array_size> x; }; template<typename Container> void add(Container & container, Class const & value) { auto const it = std::find_if(std::begin(container), std::end(container), [&](Class const & c) { return value <= c; }); container.emplace(it, value); } // Do something with the result template<typename Container> void insert_to_file(Container const & container) { std::fstream file("file.txt"); for (auto const & value : container) { file << value.to_string() << '\n'; } } template<typename Container> void f(std::vector<Class> const & values) { Container container; container.reserve(values.size()); for (auto const & value : values) { add(container, value); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : std::stoul(argv[1]); // Default constructor of Class fills in values here std::vector<Class> const values_to_be_copied(size); typedef std::vector<Class> Container; boost::timer timer; f<Container>(values_to_be_copied); std::cerr << "Finished in " << timer.elapsed() << " seconds.\n"; }
I created a C ++ 03 version to try to help other people reproduce it, but I cannot reproduce it in this version, although I tried to show it, making it as direct as possible:
#include <algorithm> #include <cstdlib> #include <fstream> #include <vector> #include <string> #include <iostream> #include <boost/array.hpp> #include <boost/cstdint.hpp> #include <boost/lexical_cast.hpp> #include <boost/random/mersenne_twister.hpp> #include <boost/random/uniform_int_distribution.hpp> #include <boost/timer.hpp> namespace { unsigned number() { static boost::random::mt19937 random_engine; static boost::random::uniform_int_distribution<boost::uint32_t> distribution(0, std::numeric_limits<boost::uint32_t>::max()); return distribution(random_engine); } class Class { public: Class() { x[0] = number(); } inline friend bool operator<=(Class const & lhs, Class const & rhs) { return lhs.x[0] <= rhs.x[0]; } std::string to_string() const { return boost::lexical_cast<std::string>(x[0]); } private: boost::array<boost::uint32_t, 1> x; }; class Less { public: Less(Class const & c): value(c) { } bool operator()(Class const & c) const { return value <= c; } private: Class value; }; void add(std::vector<Class> & container, Class const & value) { std::vector<Class>::iterator it = std::find_if(container.begin(), container.end(), Less(value)); container.insert(it, value); } // Do something with the result void insert_to_file(std::vector<Class> const & container) { std::fstream file("file.txt"); for (std::vector<Class>::const_iterator it = container.begin(); it != container.end(); ++it) { file << it->to_string() << '\n'; } } void f(std::vector<Class> const & values) { std::vector<Class> container; container.reserve(values.size() + 1); for (std::vector<Class>::const_iterator it = values.begin(); it != values.end(); ++it) { add(container, *it); } insert_to_file(container); } } int main(int argc, char ** argv) { std::size_t const size = (argc == 1) ? 1 : boost::lexical_cast<std::size_t>(argv[1]); // Default constructor of Class fills in values here std::vector<Class> const values_to_be_copied(size); boost::timer timer; f(values_to_be_copied); std::cerr << "Finished in " << timer.elapsed() << " seconds.\n"; }
The line that is currently causing the reserve has been modified to include + 1 or has been completely removed, depending on which test I ran. All this was run from a shell script that started with 10,000 elements and increased to 430,000 elements, moving one version at a time.
My processor is a 4-core Intel i5 processor, and I have 4 gigabytes of memory. I will try to simplify the C ++ 11 code version as much as possible to see if I can isolate the problem.
Does anyone know why reserving another element than I need causes this speed increase?