Quickly insert values into a map with an increasing integer as the key?

The performance of map::insert(iterator position, const value_type& val) can be significantly improved by passing an appropriate iterator in the position parameter.

If I use integers as keys, and every insertion uses a key greater than all previously inserted keys, can I speed up the insert operation by passing the map's end() iterator as the hint?

Something like:

 myMap.insert(myMap.end(), std::make_pair(next_number, myValue));

where myMap is of type map<uint64_t, MyType>, and next_number is an ever-increasing integer.

Edit:

The answer to this question may differ depending on whether the data stored in the map is dense or not (see the discussion below). So let's ask the question both ways: once for the case where the keys are dense, and once for the case where they are not. Still curious; perhaps measurement will answer it.

+6
4 answers

To answer the question as asked, the C++ standard says:

  • In C++03, inserting into a map with a.insert(p,t) must have amortized constant complexity (rather than logarithmic) if t is inserted immediately after p.
  • In C++11, inserting into a map with a.insert(p,t) must have amortized constant complexity if t is inserted immediately before p.

and in either case the right hint p still has to be supplied. So in your case, a.end() is likely to be the best hint in C++11, but not in C++03.
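
As a minimal sketch of what that means in practice (the key stride, value type, and loop count below are placeholders, not from the question): with strictly increasing keys, every new element belongs immediately before end(), which is exactly the case C++11 makes amortized constant.

 #include <cstdint>
 #include <map>
 #include <string>

 int main() {
     std::map<std::uint64_t, std::string> myMap;
     std::uint64_t next_number = 0;
     for (int i = 0; i < 1000; ++i) {
         next_number += 7;  // stands in for any strictly increasing key source
         // The new element goes immediately before end(), so in C++11 this
         // hinted insert is amortized constant time.
         myMap.emplace_hint(myMap.end(), next_number, "value");
     }
     return 0;
 }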

+5

I would suggest two things:

  • prefer std::unordered_map in this case; always inserting at one end is a worst-case scenario for red-black trees
  • use a custom allocator if new proves to be a problem for you; from what you describe, a pool allocation strategy could work

Note that C++11 allows stateful allocators, so it should be simple enough to provide an allocator that keeps a std::vector<T> inside and uses it as a stack.
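
A rough sketch of what such a stateful allocator could look like is below. Everything in it (the ArenaAllocator name, the block size, the no-op deallocate) is an illustrative assumption rather than anything from this answer; it also swaps the suggested vector-used-as-a-stack for append-only blocks, since growing a std::vector would move the nodes already handed out.

 #include <cstddef>
 #include <cstdint>
 #include <map>
 #include <memory>
 #include <utility>
 #include <vector>

 template <typename T>
 class ArenaAllocator {
     static const std::size_t kBlockSize = 1 << 16;  // 64 KiB per block

     struct State {
         std::vector<std::unique_ptr<char[]>> blocks;  // append-only storage
         std::size_t used;                             // bytes used in newest block
         State() : used(kBlockSize) {}
     };

 public:
     using value_type = T;

     ArenaAllocator() : state_(std::make_shared<State>()) {}

     // Rebinding copy: copies rebound to the map's node type share the arena.
     template <typename U>
     ArenaAllocator(const ArenaAllocator<U>& other) : state_(other.state_) {}

     T* allocate(std::size_t n) {
         // Round each request up so the next one stays suitably aligned.
         const std::size_t a = alignof(std::max_align_t);
         std::size_t bytes = (n * sizeof(T) + a - 1) / a * a;
         if (state_->used + bytes > kBlockSize) {
             std::size_t sz = kBlockSize;
             if (bytes > sz) sz = bytes;  // oversized request gets its own block
             state_->blocks.emplace_back(new char[sz]);
             state_->used = 0;
         }
         char* p = state_->blocks.back().get() + state_->used;
         state_->used += bytes;
         return reinterpret_cast<T*>(p);
     }

     // Nodes are never recycled; all memory is released when the last
     // allocator copy is destroyed.
     void deallocate(T*, std::size_t) {}

     template <typename U>
     bool operator==(const ArenaAllocator<U>& o) const { return state_ == o.state_; }
     template <typename U>
     bool operator!=(const ArenaAllocator<U>& o) const { return !(*this == o); }

 private:
     template <typename U> friend class ArenaAllocator;
     std::shared_ptr<State> state_;
 };

 // Hypothetical usage: every node of the map comes from one shared arena.
 using NodeAlloc = ArenaAllocator<std::pair<const std::uint64_t, int>>;
 std::map<std::uint64_t, int, std::less<std::uint64_t>, NodeAlloc> myMap;

Since deallocate is a no-op, this only pays off for maps that are built up and discarded as a whole; a real pool would want to recycle freed nodes.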

+2

Any suggestion here is just that, a suggestion: something to try and measure. We cannot tell you the most efficient way to do the insertion; you have to measure your own specific use case and see which approach is best.

If your map is compact and dense (almost all keys from 0 to the maximum key are occupied by real data), and the maximum key is small enough to be a reasonable array index, you can switch to std::vector<value> and always insert at the end. As it grows, the vector will occasionally have to reallocate (typically whenever it doubles in size). That can be expensive, but usually the insert is very cheap. You never have to deal with the potential rebalancing of a binary tree, and a vector is extremely convenient for other purposes as well.
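
As a minimal sketch of that dense case (insert_next and the std::string value type are illustrative placeholders): the key itself serves as the index.

 #include <cstdint>
 #include <string>
 #include <vector>

 std::vector<std::string> values;

 void insert_next(std::uint64_t key, const std::string& v) {
     // Keys arrive in increasing order; default-fill any small gaps.
     if (key >= values.size()) values.resize(key + 1);
     values[key] = v;
 }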

If your key space is not compact/dense, and the maximum key is so large that it is inconceivable as an in-memory array index, then a hinted insert is your best bet.

If the order doesn't matter, you can try std::unordered_map, which is a hash table implementation. The cost of insertion will then be tied to the quality and speed of the hash. It should be trivial and fast to take your 64-bit key and turn it into a size_t hash (size_t may itself be 64 bits).
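
For illustration, a pass-through hash along those lines might look like this (IdentityHash is a made-up name; std::hash<uint64_t> is typically just as trivial on common implementations):

 #include <cstddef>
 #include <cstdint>
 #include <unordered_map>

 // On most 64-bit platforms size_t is 64 bits wide, so the key can pass
 // straight through as its own hash.
 struct IdentityHash {
     std::size_t operator()(std::uint64_t key) const {
         return static_cast<std::size_t>(key);
     }
 };

 std::unordered_map<std::uint64_t, int, IdentityHash> myMap;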

But don't take my word for it: measure it and see for yourself...

+1

I made some measurements, since I ran into this problem recently.

I have a large map with a lot of data; data is rarely inserted, and 99% of the time it is only accessed and modified through references. However, this data must eventually be saved to disk and loaded back. Solutions like "use an unordered map" seemed like a cheap, quick way to do it wrong; an ordered map was the right choice for me because the data is ordered. The only problem was loading it from the file.

I wanted to know what the real cost of this operation is and how to speed it up, so I measured:

 // Example program
 #include <cstdio>
 #include <ctime>
 #include <map>
 #include <utility>
 #include <vector>

 std::vector<int> amount = {100, 1000, 10000, 100000, 1000000, 5000000};

 int main() {
     // 1) Baseline: operator[] with no hint.
     for (size_t j = 0; j < amount.size(); j++) {
         clock_t tStart = clock();
         std::map<int, int> mymap;
         for (int i = 0; i < amount[j]; i++) {
             mymap[i] = i;
         }
         // clock() differences are raw clock ticks;
         // divide by CLOCKS_PER_SEC for seconds.
         printf("Time taken []: %.2f\n", (double)(clock() - tStart));
     }
     // 2) Hint = iterator returned by the previous insert (just before end()).
     for (size_t j = 0; j < amount.size(); j++) {
         clock_t tStart = clock();
         std::map<int, int> mymap;
         mymap[0] = 0;
         auto it = mymap.begin();
         for (int i = 1; i < amount[j]; i++) {
             it = mymap.insert(it, std::pair<int, int>(i, i));
         }
         printf("Time taken insert end()-1: %.2f\n", (double)(clock() - tStart));
     }
     // 3) Hint = end().
     for (size_t j = 0; j < amount.size(); j++) {
         clock_t tStart = clock();
         std::map<int, int> mymap;
         for (int i = 1; i < amount[j]; i++) {
             mymap.insert(mymap.end(), std::pair<int, int>(i, i));
         }
         printf("Time taken insert end(): %.2f\n", (double)(clock() - tStart));
     }
     // 4) Worst hint: begin(), while the keys keep increasing.
     for (size_t j = 0; j < amount.size(); j++) {
         clock_t tStart = clock();
         std::map<int, int> mymap;
         for (int i = 0; i < amount[j]; i++) {
             mymap.insert(mymap.begin(), std::pair<int, int>(i, i));
         }
         printf("Time taken insert begin(): %.2f\n", (double)(clock() - tStart));
     }
     return 0;
 }

Results:

 Time in ns

 N          end()-1    end()      begin()    []
 100        12         8          22         12
 1000       77         54         188        97
 10000      763        532        2550       1174
 100000     7609       6042       23612      17164
 1000000    75561      62048      270476     272099
 5000000    362463     306412     1827807    1687904


Summary:

  • YES, there is a gain, a huge gain, without any real drawback. It beats an unordered map by far when the data is ordered, which is extremely useful for saving a map to a file and re-creating it.

  • The insertion time with a correct hint is the same regardless of the number of elements, so there is no need to accept an unordered map's rehashing in order to get constant-time insertion.

  • In the worst case you lose a little when your hint is the worst possible one. I no longer see the point of inserting without a hint, especially when you know where the data will be inserted; and most of the time you do.

0

Source: https://habr.com/ru/post/917402/

