In what situation am I std :: map <A, B> will be faster than the sorted std :: vector <std :: pair <A, B >>?

I used map in some code to store ordered data. I learned that for huge maps, destruction can take some time. In this code, I replaced map with vector<pair> reduced processing time of 10,000 ...

Finally, I was so surprised that I decided to compare the map metrics with the sorted vector or pair .

And I am surprised because I could not find a situation where the map was faster than the sorted vector from pair (filled randomly and later sorted) ... there should be situations in which map faster ... still what is the point in providing this class ?

Here is what I tested:

Test, compare map filling and destruction vs vector filling, sorting (because I want a sorted container) and destruction:

 #include <iostream> #include <time.h> #include <cstdlib> #include <map> #include <vector> #include <algorithm> int main(void) { clock_t tStart = clock(); { std::map<float,int> myMap; for ( int i = 0; i != 10000000; ++i ) { myMap[ ((float)std::rand()) / RAND_MAX ] = i; } } std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl; tStart = clock(); { std::vector< std::pair<float,int> > myVect; for ( int i = 0; i != 10000000; ++i ) { myVect.push_back( std::make_pair( ((float)std::rand()) / RAND_MAX, i ) ); } // sort the vector, as we want a sorted container: std::sort( myVect.begin(), myVect.end() ); } std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl; return 0; } 

Compiled with g++ main.cpp -O3 -o main and got:

 Time taken by map: 21.7142 Time taken by vect: 7.94725 

map 3 times slower ...

Then I said: “OK, the vector is faster to fill and sort, but the search will be faster using the map” .... so I tested:

 #include <iostream> #include <time.h> #include <cstdlib> #include <map> #include <vector> #include <algorithm> int main(void) { clock_t tStart = clock(); { std::map<float,int> myMap; float middle = 0; float last; for ( int i = 0; i != 10000000; ++i ) { last = ((float)std::rand()) / RAND_MAX; myMap[ last ] = i; if ( i == 5000000 ) middle = last; // element we will later search } std::cout << "Map created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl; float sum = 0; for ( int i = 0; i != 10; ++i ) sum += myMap[ last ]; // search it std::cout << "Sum is " << sum << std::endl; } std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl; tStart = clock(); { std::vector< std::pair<float,int> > myVect; std::pair<float,int> middle; std::pair<float,int> last; for ( int i = 0; i != 10000000; ++i ) { last = std::make_pair( ((float)std::rand()) / RAND_MAX, i ); myVect.push_back( last ); if ( i == 5000000 ) middle = last; // element we will later search } std::sort( myVect.begin(), myVect.end() ); std::cout << "Vector created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl; float sum = 0; for ( int i = 0; i != 10; ++i ) sum += (std::find( myVect.begin(), myVect.end(), last ))->second; // search it std::cout << "Sum is " << sum << std::endl; } std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl; return 0; } 

Compiled with g++ main.cpp -O3 -o main and got:

 Map created after 19.5357 Sum is 1e+08 Time taken by map: 21.41 Vector created after 7.96388 Sum is 1e+08 Time taken by vect: 8.31741 

Even the search is apparently faster with vector (10 searches with map took almost 2 seconds and took only half a second with vector ) ....

So:

  • Did I miss something?
  • Are my tests not correct / accurate?
  • Is map just a class to avoid, or are there really situations where map offers good results?
+5
source share
2 answers

As a rule, map will be better when you make a lot of attachments and exceptions, alternating with your searches. If you create a data structure once and then only perform a search, a sorted vector will almost certainly be faster, if only because of the effects of the processor cache. Since inserts and deletes at arbitrary locations in the vector are O (n) instead of O (log n), there will come a point where they become a limiting factor.

+3
source

std::find has linear time complexity, while map search has N. log complexity

When you find that one algorithm is 100,000 faster than another, you should be suspicious! Your test is not valid.

You need to compare realistic options. Perhaps you wanted to compare a map with a binary search. Run each of these options for at least 1 second of processor time so you can really compare the results.

When the benchmark returns a time of "0.00001 seconds", you feel good about inaccuracies. This number does not mean anything.

+1
source

Source: https://habr.com/ru/post/1242910/


All Articles