Creating an unordered_set of unordered_set

I want to create a container that will store unique sets of integers inside.

I want to create something similar to

std::unordered_set<std::unordered_set<unsigned int>> 

But g++ won't let me do this and says:

 invalid use of incomplete type 'struct std::hash<std::unordered_set<unsigned int> >' 

I want to achieve unique sets of unsigned ints.

How can I do this?

+5
6 answers

I'm adding another answer to this question because no one has yet touched on a key point.

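Everyone is telling you that you need to create a hash function for unordered_set<unsigned>, and that is correct. You can do this by specializing std::hash<unordered_set<unsigned>> (strictly speaking, the standard only permits specializing std::hash when the specialization depends on a program-defined type, so the custom functor is the safer route), or you can create your own functor and use it like this: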

 unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor> s; 

Either way is fine. However, there is a big gotcha you need to keep an eye on:

For any two unordered_set<unsigned> objects that compare equal (x == y), the hashes must be equal as well: hash(x) == hash(y). If you break this rule, the outer set will misbehave at run time, for example by storing two equal sets as separate elements. Also note that the following two unordered_sets compare equal (pseudocode used here for clarity):

 {1, 2, 3} == {3, 2, 1} 

Therefore hash({1, 2, 3}) must equal hash({3, 2, 1}). Put another way, unordered containers have an equality operator for which order does not matter. So however you write your hash function, its result must be independent of the order of the elements in the container.

Alternatively, you can replace the equality predicate used by the outer unordered_set with one that respects order:

 unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor, my_unordered_equal> s; 
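A minimal sketch of this alternative, assuming my_unordered_equal compares the two sets' iteration sequences element by element and my_unordered_set_hash_functor combines element hashes in iteration order (the names come from the line above; the bodies are illustrative, not from the answer). Keep in mind that an unordered_set's iteration order depends on its bucket count and insertion history, so two sets holding the same integers can compare unequal under this predicate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

using inner_set = std::unordered_set<unsigned>;

// Order-respecting equality: equal only if both sets iterate over the same
// elements in the same order.
struct my_unordered_equal {
    bool operator()(const inner_set& a, const inner_set& b) const {
        return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
    }
};

// Order-dependent hash (boost-style hash_combine). It is consistent with
// my_unordered_equal: equal sequences always produce equal hashes.
struct my_unordered_set_hash_functor {
    std::size_t operator()(const inner_set& s) const {
        std::size_t h = 0;
        for (unsigned x : s)
            h ^= std::hash<unsigned>{}(x) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using outer_set =
    std::unordered_set<inner_set, my_unordered_set_hash_functor, my_unordered_equal>;
```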

The burden of getting all of this right makes:

 unordered_set<set<unsigned>, my_set_hash_functor> 

look pretty attractive. You still need to create a hash functor for set<unsigned>, but now you do not have to make it independent of element order: a std::set always iterates in sorted order, so {1, 2, 3} and {3, 2, 1} are the same set with the same iteration sequence, and a plain order-dependent combination of the element hashes already agrees with operator==.
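A sketch of the std::set variant, assuming a boost-style hash_combine step (the constant 0x9e3779b9 and the shifts are a common convention, not something the answer specifies). my_set_hash_functor is the name used above; its body is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// std::set iterates in ascending order, so combining hashes in iteration
// order gives equal sets equal hashes without any extra work.
struct my_set_hash_functor {
    std::size_t operator()(const std::set<unsigned>& s) const {
        std::size_t h = s.size();
        for (unsigned x : s)
            h ^= std::hash<unsigned>{}(x) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// operator== for std::set already exists, so no custom equality is needed.
using set_of_sets = std::unordered_set<std::set<unsigned>, my_set_hash_functor>;
```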

I note that Walter's answer gives a hash functor with the correct behavior: it ignores order when computing the hash code. But then his answer (at present) tells you that this is not a good solution. :-) It actually is a good solution for unordered containers. An even better one would be to return the sum of the individual element hashes instead of hashing the sum of the elements.
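A hedged sketch of "sum of the individual hashes". One assumption matters here: on libstdc++, for example, std::hash<unsigned> returns the value unchanged, so summing std::hash values would be no better than summing the elements. This sketch therefore scrambles each element with a hypothetical mix() helper (the splitmix64 finalizer) before adding; sum_of_hashes and unique_sets are also illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical per-element mixer: the splitmix64 finalizer.
inline std::uint64_t mix(std::uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Addition is commutative, so the result does not depend on the order in
// which the unordered_set happens to iterate. Unsigned overflow wraps, which
// is well-defined.
struct sum_of_hashes {
    std::size_t operator()(const std::unordered_set<unsigned>& s) const {
        std::uint64_t h = 0;
        for (unsigned x : s) h += mix(x);
        return static_cast<std::size_t>(h);
    }
};

using unique_sets = std::unordered_set<std::unordered_set<unsigned>, sum_of_hashes>;
```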

+6

You can do this, but, like every element of an unordered_set/map, the inner unordered_set now needs a hash function defined. By default it does not have one, but you can write it yourself.

+3

What you need to do is define an appropriate hash for keys of type std::unordered_set<unsigned int>. Since operator== is already defined for this key type, you will not need to provide the EqualKey template parameter of std::unordered_set<std::unordered_set<unsigned int>, Hash, EqualKey>.

One simple (albeit inefficient) option is to hash the sum of all elements in the set. It would look something like this:

    #include <functional>
    #include <numeric>
    #include <unordered_set>

    template<typename T>
    struct hash_on_sum : private std::hash<typename T::value_type>
    {
        typedef typename T::value_type count_type;
        typedef std::hash<count_type> base;
        std::size_t operator()(T const& obj) const
        {
            // hash the sum of the elements; the sum does not depend on order
            return base::operator()(std::accumulate(obj.begin(), obj.end(), count_type()));
        }
    };
    typedef std::unordered_set<unsigned int> inner_type;
    typedef std::unordered_set<inner_type, hash_on_sum<inner_type>> set_of_unique_sets;

However, simple as it is, it is not good, because it does not satisfy the following requirement: for two different arguments k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max().
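To see the collision problem concretely, here is a hedged sketch that hashes the sum just like hash_on_sum does, but for std::unordered_set<unsigned> only (sum_hash is a hypothetical name). Every inner set whose elements add up to the same total, such as {1, 4} and {2, 3}, gets the same hash value. The outer set still behaves correctly, because equality decides membership, but all such sets end up in the same bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <unordered_set>

// Same idea as hash_on_sum, written for one concrete inner type.
struct sum_hash {
    std::size_t operator()(const std::unordered_set<unsigned>& s) const {
        // 0u makes std::accumulate sum in unsigned arithmetic
        return std::hash<unsigned>{}(std::accumulate(s.begin(), s.end(), 0u));
    }
};
```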

+2

std::unordered_set<unsigned int> does not meet the requirements for an element of std::unordered_set, because there is no default hash function for it (i.e. std::hash<> is not specialized for std::unordered_set<unsigned int>).

You can provide one (it should be fast and avoid collisions as much as possible):

    class MyHash {
    public:
        std::size_t operator()(const std::unordered_set<unsigned int>& s) const {
            return ...; // return some meaningful hash of the set elements
        }
    };

    int main() {
        std::unordered_set<std::unordered_set<unsigned int>, MyHash> u;
    }

You can see very good examples of hash functions in this answer.

You really have to provide both a hash function and an equality predicate that meet the standard requirements of an unordered associative container.

+1

The default hash() function used to hash your collection's elements does not know how to treat an entire collection as an element. Create a hash function that gives equal sets the same value and, as far as possible, gives different sets different values, and you're good to go.

This is the constructor for unordered_set:

    explicit unordered_set( size_type bucket_count = /*implementation-defined*/,
                            const Hash& hash = Hash(),
                            const KeyEqual& equal = KeyEqual(),
                            const Allocator& alloc = Allocator() );

http://en.cppreference.com/w/cpp/container/unordered_set/unordered_set

Perhaps the easiest thing for you is to create a hash function for your unordered_set<unsigned int>:

    std::size_t my_hash(const std::unordered_set<unsigned int>& element) {
        std::size_t h = 0;
        for (unsigned int e : element)
            h += std::hash<unsigned int>{}(e); // order-independent: a sum
        return h;
    }

Edit: as another answer points out, and as I had completely forgotten, the hash function must be wrapped in the Hash object, at least according to the constructor I quoted in my answer.
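Given the constructor quoted above, there are two ways to plug in a free function like my_hash, sketched below under the assumption that it takes a const reference and returns std::size_t (by_pointer, by_functor and my_hash_functor are hypothetical names). The Hash template argument may be a function-pointer type, in which case the function has to be passed to the constructor, because a default-constructed function pointer is null. Or the function can be wrapped in a small functor that default-constructs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

using inner_set = std::unordered_set<unsigned int>;

// Free function: equal sets get equal hashes because a sum ignores order.
std::size_t my_hash(const inner_set& element) {
    std::size_t h = 0;
    for (unsigned int e : element) h += std::hash<unsigned int>{}(e);
    return h;
}

// Option 1: Hash is a function-pointer type. Construct with, e.g.,
// by_pointer s(16, &my_hash);  (16 is only the initial bucket-count hint)
using by_pointer = std::unordered_set<inner_set, std::size_t (*)(const inner_set&)>;

// Option 2: a stateless functor wrapping the function; it default-constructs,
// so the outer set needs no constructor arguments.
struct my_hash_functor {
    std::size_t operator()(const inner_set& s) const { return my_hash(s); }
};
using by_functor = std::unordered_set<inner_set, my_hash_functor>;
```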

0

There is a reason there is no hash for unordered_set. An unordered_set is a mutable container by default, and a hash must keep the same value for as long as the object is in an unordered_set, so your elements must be immutable. Using the const& modifier does not guarantee this, since it only ensures that the outer unordered_set and its methods will not change the inner unordered_set. Not using a reference could be a safe solution (you would still have to write a hash function), but do you really want the overhead of moving/copying unordered_sets?

Instead, you can use some kind of pointer. That is perfectly fine: a pointer is just a memory address, and your unordered_set itself does not move (the container may reallocate its internal pool of elements, but that does not matter here). So your pointer stays constant, and it can keep the same hash throughout its life in the unordered_set. (EDIT: as Howard pointed out, you must make sure that, whatever order you store the elements of your sets in, two sets with the same elements are considered equal. By enforcing an order on how you store the integers, you get for free that two equal sets correspond to two equal vectors.)

As a bonus, you can now use a smart pointer inside the main set to manage the memory of the inner unordered_sets if you allocated them on the heap.
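A minimal sketch of the smart-pointer variant, assuming std::unique_ptr as the smart pointer and relying on the std::hash specialization for std::unique_ptr provided by <memory>, which hashes the stored address. Note that this identifies inner sets by address, not by contents, which is exactly the caveat raised in the EDIT: two separately allocated sets with identical contents are both kept.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>

using inner_set = std::unordered_set<unsigned int>;

// std::hash<std::unique_ptr<T>> and std::equal_to<std::unique_ptr<T>> work on
// the pointer value. The outer set owns the inner sets and frees them when
// they are erased or when the outer set is destroyed.
using set_of_ptrs = std::unordered_set<std::unique_ptr<inner_set>>;
```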

Note that this is still not the most efficient way to get a set of int sets. For the inner sets, you could write a quick wrapper around std::vector that stores the ints ordered by value. ints are small and cheap to compare, and a binary search is only O(log n). A std::unordered_set is a heavy structure, and what you lose by going from O(1) to O(log n) you will gain back by having compact memory for each set. It should not be too difficult to implement, and it will almost certainly perform better.

A trie-based solution would be harder to implement.
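A rough sketch of that wrapper, assuming a hypothetical class sorted_uint_set that keeps its integers sorted and deduplicated in a std::vector, uses std::binary_search for membership tests, and comes with a matching hash functor. Since the stored order is canonical, operator== is plain vector comparison and the hash may depend on element order. Elements of the outer set are const, so an inner set cannot be modified (and its hash invalidated) while it is stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <unordered_set>
#include <vector>

// Set of unsigned ints stored as a sorted vector without duplicates.
class sorted_uint_set {
public:
    sorted_uint_set(std::initializer_list<unsigned> init) : v_(init) {
        std::sort(v_.begin(), v_.end());
        v_.erase(std::unique(v_.begin(), v_.end()), v_.end());
    }
    // O(log n) membership test.
    bool contains(unsigned x) const { return std::binary_search(v_.begin(), v_.end(), x); }
    // Keeps the vector sorted; ignores values already present.
    void insert(unsigned x) {
        auto it = std::lower_bound(v_.begin(), v_.end(), x);
        if (it == v_.end() || *it != x) v_.insert(it, x);
    }
    const std::vector<unsigned>& elements() const { return v_; }
    friend bool operator==(const sorted_uint_set& a, const sorted_uint_set& b) {
        return a.v_ == b.v_;
    }
private:
    std::vector<unsigned> v_;
};

// The elements are always sorted, so an order-dependent combination is safe.
struct sorted_uint_set_hash {
    std::size_t operator()(const sorted_uint_set& s) const {
        std::size_t h = s.elements().size();
        for (unsigned x : s.elements())
            h ^= std::hash<unsigned>{}(x) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using set_of_sets = std::unordered_set<sorted_uint_set, sorted_uint_set_hash>;
```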

0

Source: https://habr.com/ru/post/1210312/

