How to hash unordered_map?

boost::hash provides hash functions for most built-in types, including containers.

But, as noted in the description of boost::hash_range, the hash algorithm for ranges is sensitive to the order of the elements, so it would not be appropriate to use it with an unordered container.

That is why there is no boost::hash specialization for std::unordered_map or boost::unordered_map.


The question arises:

Is there a "simple and efficient" way to hash an unordered_map without reimplementing the hashing algorithm from scratch?

5 answers

The problem here is that there is no guarantee that the elements even have an order between them.
Thus, sorting items may very well not work for arbitrary unordered containers. You have 2 options:

  • Just XOR hashes of all the individual elements. This is the fastest.
  • First sort the hashes of the container's elements, then hash the sorted sequence. This can give a better hash.
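The first option could look roughly like the sketch below. The names `hash_entry` and `xor_hash` are hypothetical, and mixing the key hash with the value hash uses the well-known hash_combine recipe, which is an assumption rather than something this answer prescribes. It also assumes that std::hash is available for both the key and the mapped type.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical helper: mixes the key hash and the value hash of one entry.
// The constant and shifts follow the familiar hash_combine recipe.
template <typename Key, typename T>
std::size_t hash_entry(const Key& key, const T& value) {
    std::size_t seed = std::hash<Key>()(key);
    seed ^= std::hash<T>()(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
}

// XOR is commutative and associative, so the iteration order of the
// container has no influence on the result.
template <typename Key, typename T>
std::size_t xor_hash(const std::unordered_map<Key, T>& m) {
    std::size_t result = 0;
    for (const auto& entry : m)
        result ^= hash_entry(entry.first, entry.second);
    return result;
}
```

Two maps with the same contents give the same `xor_hash` value no matter in which order the entries were inserted. An empty map hashes to 0.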

You can, of course, convert unordered_map to another data structure that has a guaranteed order and use it to generate a hash.

A better idea would be to hash each individual element of the map, put these hashes in a vector, and then sort and combine them. For combining hashes, see for example "How to combine hash values in C++0x?".

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hashes every element, sorts the hashes, then combines them in sorted order.
    template<typename Hash, typename Iterator>
    std::size_t order_independent_hash(Iterator begin, Iterator end, Hash hasher)
    {
        std::vector<std::size_t> hashes;
        for (Iterator it = begin; it != end; ++it)
            hashes.push_back(hasher(*it));
        std::sort(hashes.begin(), hashes.end());

        std::size_t result = 0;
        for (auto it2 = hashes.begin(); it2 != hashes.end(); ++it2)
            result ^= *it2 + 0x9e3779b9 + (result << 6) + (result >> 2);
        return result;
    }

Testing this on shuffled vectors shows that it always returns the same hash.

Now let's adapt this basic concept to work with unordered_map. Since the unordered_map iterator yields a pair, we also need a hash function for pair.

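    #include <cstddef>
    #include <functional>
    #include <unordered_map>
    #include <utility>

    // Caveat: the standard only allows specializing std templates for
    // program-defined types, so these two specializations are formally not
    // permitted; a functor outside namespace std avoids the issue.
    namespace std
    {
        template<typename T1, typename T2>
        struct hash<std::pair<T1,T2> >
        {
            typedef std::pair<T1,T2> argument_type;
            typedef std::size_t result_type;

            result_type operator()(argument_type const& s) const
            {
                result_type const h1 ( std::hash<T1>()(s.first) );
                result_type const h2 ( std::hash<T2>()(s.second) );
                return h1 ^ (h2 + 0x9e3779b9 + (h1<<6) + (h1>>2));
            }
        };

        template<typename Key, typename T>
        struct hash<std::unordered_map<Key,T> >
        {
            typedef std::unordered_map<Key,T> argument_type;
            typedef std::size_t result_type;

            result_type operator()(argument_type const& s) const
            {
                // Each pair<const Key,T> is converted to pair<Key,T> before hashing.
                return order_independent_hash(s.begin(), s.end(), std::hash<std::pair<Key,T> >());
            }
        };
    }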

See in action: http://ideone.com/WOLFbc
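As a variation on the same sort-then-combine idea, here is a minimal sketch that packs it into a standalone functor instead of adding anything to namespace std. The names `UnorderedMapHash`, `IntMap` and `IntMapSet` are hypothetical, and the sketch assumes that std::hash exists for both the key and the mapped type.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical functor (the name is an assumption, not part of the answer).
// It lives outside namespace std, so no standard template is specialized.
struct UnorderedMapHash {
    template <typename Key, typename T>
    std::size_t operator()(const std::unordered_map<Key, T>& m) const {
        std::vector<std::size_t> hashes;
        hashes.reserve(m.size());
        for (const auto& entry : m) {
            // Combine the key hash and the value hash of one entry.
            std::size_t h = std::hash<Key>()(entry.first);
            h ^= std::hash<T>()(entry.second) + 0x9e3779b9 + (h << 6) + (h >> 2);
            hashes.push_back(h);
        }
        // Sorting makes the sequence independent of iteration order.
        std::sort(hashes.begin(), hashes.end());
        std::size_t result = 0;
        for (std::size_t h : hashes)
            result ^= h + 0x9e3779b9 + (result << 6) + (result >> 2);
        return result;
    }
};

// With the functor, maps can be stored as keys of another unordered container.
typedef std::unordered_map<int, int> IntMap;
typedef std::unordered_set<IntMap, UnorderedMapHash> IntMapSet;
```

Inserting two `IntMap` objects with equal contents into an `IntMapSet` leaves a single element, because they hash identically and compare equal with operator==.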


I think you may be confused about what a hash is for. Hashes are keys used to identify elements in order to determine where to store them. Two equivalent elements must have the same hash value.

Are you trying to check whether two unordered maps are equivalent, and to store them in some kind of container?

The keys of an unordered map are, well, hashed. In fact, the container would have been named hash_map, except that containers with that name already existed.

But fine, suppose you really do want to store unordered maps and check whether they are equivalent. Then you have to come up with a hashing algorithm that returns the same value regardless of the position of the elements the map contains. A checksum over all of its elements (keys and values) would be one possible way.

Note also that two elements having the same hash value does not mean that they are equivalent. It only means that if the hash values differ, the elements are definitely not equivalent. In fact, checksums are often used to validate data for precisely this reason. A wrong checksum proves that the data is invalid; given a good formula, a correct checksum makes it very likely, though not certain, that the data is valid.
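To make the last two paragraphs concrete, here is a small sketch with a deliberately weak checksum, so that a collision is easy to produce. The function names `checksum` and `equivalent` are hypothetical, and the formula (a wrapping sum of all keys and values) is chosen only for illustration, not as a recommendation.

```cpp
#include <cstddef>
#include <unordered_map>

// Deliberately weak checksum, for illustration only: the wrapping sum of all
// keys and values. Addition is commutative, so element order does not matter.
inline std::size_t checksum(const std::unordered_map<int, int>& m) {
    std::size_t sum = 0;
    for (const auto& entry : m)
        sum += static_cast<std::size_t>(entry.first) +
               static_cast<std::size_t>(entry.second);
    return sum;
}

// Uses the checksum as a cheap pre-filter, then confirms with operator==.
inline bool equivalent(const std::unordered_map<int, int>& a,
                       const std::unordered_map<int, int>& b) {
    if (checksum(a) != checksum(b))
        return false;  // different checksums: definitely not equivalent
    return a == b;     // same checksum: equality still has to be verified
}
```

With this formula the maps {1 → 2} and {2 → 1} have the same checksum although they are different maps, which is why the final comparison with operator== cannot be skipped.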


I'm curious: you are trying to hash the unordered_map in order to use it as a key, and once you have hashed the unordered_map you won't change it (unless you use it to create a new key). Given that, would it be acceptable performance-wise to convert the unordered_map into an ordered map, and then of course hash the ordered map and use that as the key? Or is the problem with this approach that you need the faster lookup that unordered_map provides?

For what it's worth, an ordered map may also have a space advantage (according to the accepted answer to the following question, unordered_map generally uses more memory):

Is there any advantage of using a map over unordered_map in the case of trivial keys?


You did not specify any performance requirements, but if you just need a "quick and dirty" solution that does not require much coding on your part and makes use of boost::hash, you can copy the range of elements from the unordered_map into a vector, std::sort the vector, and then pass it to boost::hash_range.

It is hardly the most efficient solution, though, and not one that you would want to use often or with many elements.
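The copy-sort-hash idea might be sketched as follows. To keep the example free of Boost headers, a small order-sensitive helper stands in for boost::hash_range. The helper `hash_sorted_range` and the function `quick_and_dirty_hash` are hypothetical names, and the sketch assumes that both the key and the mapped type support operator< and std::hash.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for boost::hash_range so that the sketch needs no Boost headers.
// It folds the element hashes in sequence and is therefore order-sensitive.
template <typename Key, typename T>
std::size_t hash_sorted_range(const std::vector<std::pair<Key, T> >& v) {
    std::size_t seed = 0;
    for (const auto& p : v) {
        seed ^= std::hash<Key>()(p.first) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        seed ^= std::hash<T>()(p.second) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// Copy the elements, sort them, then hash the sorted sequence.
template <typename Key, typename T>
std::size_t quick_and_dirty_hash(const std::unordered_map<Key, T>& m) {
    std::vector<std::pair<Key, T> > items(m.begin(), m.end());
    std::sort(items.begin(), items.end());
    return hash_sorted_range(items);
}
```

Every call copies and sorts the whole map, so the cost is O(n log n) time and O(n) extra memory per hash.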

My preferred approach would be a specialization of unordered_map that maintains a running hash of its contents, so that you do not need to walk through all the elements and perform the calculation to get the current value. Instead, a data member of the structure holds the hash; it is updated in real time as elements are inserted or removed, and read whenever needed.
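One way such a structure could be sketched is a thin wrapper around unordered_map rather than a true specialization. The class `HashTrackingMap`, its interface, and the choice of XOR as the combining operation are all assumptions made for this illustration. XOR is convenient here because it is its own inverse, so removing an entry simply XORs its hash out again.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper (name and interface are assumptions): keeps an XOR of
// the per-entry hashes up to date, so reading the hash takes constant time.
template <typename Key, typename T>
class HashTrackingMap {
public:
    HashTrackingMap() : hash_(0) {}

    // Inserts the entry, or overwrites the value stored under key.
    void set(const Key& key, const T& value) {
        typename std::unordered_map<Key, T>::iterator it = data_.find(key);
        if (it != data_.end()) {
            hash_ ^= entry_hash(it->first, it->second);  // take the old entry out
            it->second = value;
        } else {
            data_.insert(std::make_pair(key, value));
        }
        hash_ ^= entry_hash(key, value);                 // mix the new entry in
    }

    // Erases key if present; returns whether an entry was removed.
    bool erase(const Key& key) {
        typename std::unordered_map<Key, T>::iterator it = data_.find(key);
        if (it == data_.end())
            return false;
        hash_ ^= entry_hash(it->first, it->second);      // XOR is its own inverse
        data_.erase(it);
        return true;
    }

    std::size_t hash() const { return hash_; }
    std::size_t size() const { return data_.size(); }

private:
    static std::size_t entry_hash(const Key& k, const T& v) {
        std::size_t h = std::hash<Key>()(k);
        h ^= std::hash<T>()(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }

    std::unordered_map<Key, T> data_;
    std::size_t hash_;
};
```

The trade-off is that every modification has to go through the wrapper, so handing out non-const references or iterators to the stored values would let the running hash go stale.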


Source: https://habr.com/ru/post/973656/

