Does a HashSet do the sorting job internally?

My Set sometimes sorted, and sometimes not.

Here is an example:

 public class SetOfInteger { public static void main(String[] args) { Random rand = new Random(47); Set<Integer> intset = new HashSet<>(); for (int i = 0; i < 10; i++) { int j = rand.nextInt(30); System.out.print(j + " "); intset.add(j); } System.out.println(); System.out.println(intset); } } 

The result shows that Set not sorted.

 8 5 13 11 1 29 28 20 12 7 [1, 20, 5, 7, 8, 11, 12, 29, 28, 13] 

When I change the termination expression to i < 20 in for a statement, the result shows that Set become sorted.

 8 5 13 11 1 29 28 20 12 7 18 18 21 19 29 28 28 1 20 28 [1, 5, 7, 8, 11, 12, 13, 19, 18, 21, 20, 29, 28] 

This is so weird, isn't it? I just don’t know how to explain this, and I need help, thanks a lot.

+5
source share
6 answers

A HashSet does not guarantee sorting of iterations, but under very specific circumstances, its internal data structure can act like sorting a bucket .

In particular, for integer keys in the range [0.65535] and a table size that is larger than the largest key, the index of the bucket key that is stored is equal to the key itself, and since the iterator iterates in bucket order, it emits elements in sorted okay.

+13
source

There are good answers in everything, but no one is trying to explain what exactly is happening in this particular situation, so I will limit my answer to this, and not add another explanation of how the HashSet works. I accept this understanding as satisfied.

The default constructor for HashSet creates a set with a capacity of 16 and a load factor of 0.75. This means that there are 16 bins, and this capacity increases when you insert 16 * 0.75 = 12 unique elements.

So, in the first case, the numbers are sorted by their remainder, when divided by 16: the set starts with the size of table 16, "hashing" each element in the cell, taking x % 16 . Then, when there were 12 elements, he enlarged the table and performed a revision (see Javier Martin, if this is not clear), probably increasing the table to 32. (I could find information about how it grows in java 6 doc , in which it says that the number of buckets is "approximately" doubled, no matter what it means.) This gave each integer up to 30 of its own bins, so when iterating through each cell in order, it repeated in numbers in order. If you inserted numbers below 64, you will probably find that you need to insert 32 * 0.75 = 24 elements before the iteration is sorted.

Also note that this method of assigning bins is not guaranteed. HashSets in other versions / implementations of Java can do something more complex with hashCode() object values ​​than just take the rest. (As ruach and fluffy remarked in the comments - thanks!)

+6
source

Your question indicates that the order of things is changing as the set grows. However, you cannot count on saving your order. A Set has one guarantee: there is only one element of each type. There are other Set objects that provide additional guarantees, but a simple HashSet does not guarantee a guarantee.

The reordering you see is just an internal rearrangement because the HashSet is stored internally. In a very simplified way of thinking, a HashSet has a certain number of “slots” for storing values, which is usually an odd number if it is not also simple. Hash codes from getHashCode() are used to assign an object to a slot. When you have a hash clash, the HashSet uses the equals() operator to determine if objects are really unique.

When you add items to a HashSet , several things happen:

  • Objects are assigned their internal slot
    • Then hashcode is then hashed to find which slot it belongs to.
    • If there is a slot collision, then we check for equality. If this is the same object, we will drop it, if not add it to the list in this slot
  • When the number of objects exceeds the number of slots, the HashSet needs to be resized
    • It creates a larger set of slots, which is still usually an odd or prime number.
    • Existing items are reassigned to the new collection of slots - here the order may change

The bottom line is that if objects are magically sorted by themselves, this is not an implementation that you can count on if you are not using a TreeSet that imposes a sort order on the installed elements.

+5
source

Interest Ask. Set uses an array of linked list to store its elements. hashCode() used to find the position (indirectly) of the object that will be stored in Set .

If there are two objects that need to be stored in the same position, then the object is saved in the next slot of the linked list at this position.

An array size is a dynamic and estimated runtime depending on the number of objects in it. He's not sure, but I assume that you see your numbers as sorted, because the set may have increased size. The value of hashCode() depends on the value of the number and therefore would be calculated sequentially. Because the size of the underlying array would increase with increasing loop size. There would be no collisions and the results are sorted.

But still I would like to emphasize that my answer does not lead to any misconceptions. HashSet does not guarantee element ordering

+3
source

The iteration order for the HashSet is not defined, the only guarantee is that it is consistent: iteration over the HashSet that has not been changed will lead to the same sequences.

Internally, as the commentator said, the class uses the hashCode method of each element to store them in a certain number of boxes. So, for example, if it uses 20 bins, then it can take o.hashCode() % 20 as the bin index. Each bit can have several elements in the list, which are then distinguished by the equals method. Thus, even if the hash of the integer is its int value, the order does not have to be a natural integer order.

In addition, the set controls its load factor when inserting and deleting elements; given the proportion of free bins, the maximum size of the bin list, the average number of items per basket, whatever. When he considers it necessary, he performs a rethinking, which means changing the number of boxes that are used to store items, so their bin index changes because the value of n in o.hashCode() % n changes. Each element is "shuffled" to its new place (this is an expensive operation), thereby explaining the order that you see after adding additional elements.

+3
source

You must sort it manually because there is no guarantee that the hash will be sorted. If you want, you can also use TreeSet, which will provide you with the necessary functions, but if you want to use a HashSet, try this:

 Set intset = new HashSet(); List sortedIntList = new ArrayList(intset); Collections.sort(sortedIntList); 
+1
source

Source: https://habr.com/ru/post/1245821/


All Articles