HashMap load factor - does it depend on the number of occupied buckets or on the total number of entries across all buckets?

My understanding was that a HashMap is rehashed when either the number of occupied buckets or the total number of entries across all buckets exceeds a threshold. So with the default load factor and initial capacity, if 12 of 16 buckets are full (one entry in each bucket), we know the HashMap will be rehashed on the next insertion. But what about the case where only 3 buckets are occupied with 4 entries each (12 entries total, but only 3 of 16 buckets in use)?
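
To make explicit the arithmetic I am assuming here (default capacity 16, default load factor 0.75):

    // Default HashMap sizing I am assuming: threshold = capacity * loadFactor
    int capacity = 16;          // HashMap's default initial capacity
    float loadFactor = 0.75f;   // HashMap's default load factor
    int threshold = (int) (capacity * loadFactor); // 12 -> the 13th entry triggers a resize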

So I tried to reproduce this by writing the worst possible hash function, one that puts all entries into a single bucket.

Here is my code.

    class X {
        public Integer value;

        public X(Integer value) {
            super();
            this.value = value;
        }

        @Override
        public int hashCode() {
            return 1; // every instance hashes to the same bucket
        }

        @Override
        public boolean equals(Object obj) {
            X a = (X) obj;
            return this.value.equals(a.value);
        }
    }

Then I started putting values into the HashMap.

    HashMap<X, Integer> map = new HashMap<>();
    map.put(new X(1), 1);
    map.put(new X(2), 2);
    map.put(new X(3), 3);
    map.put(new X(4), 4);
    map.put(new X(5), 5);
    map.put(new X(6), 6);
    map.put(new X(7), 7);
    map.put(new X(8), 8);
    map.put(new X(9), 9);
    map.put(new X(10), 10);
    map.put(new X(11), 11);
    map.put(new X(12), 12);
    map.put(new X(13), 13);
    System.out.println(map.size());

All entries went into one bucket, as expected, but I noticed that on the 9th insertion the HashMap resized and doubled its capacity, and on the 10th insertion it doubled its capacity again.

[Screenshots: the map's internal state with 8 entries and with 9 entries]

Can someone explain how this happens?

Thanks in advance.

2 answers

In a HashMap, entries end up in the same bucket if their hash codes map to the same bucket index. If distinct Integer objects are placed in the map as keys, their hash codes will differ, since Integer's hashCode is the int value itself, so the entries spread across buckets.

But in your case all objects have the same hash code, which means you have forced all entries into one bucket. Each bucket is backed by a specific data structure (a linked list or a tree), and the capacity behaviour differs depending on which structure the bucket currently uses.
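
For reference, these are the relevant sizing constants from the OpenJDK java.util.HashMap source (Java 8+); the trailing comments are mine:

    static final float DEFAULT_LOAD_FACTOR = 0.75f; // resize when size > capacity * 0.75
    static final int TREEIFY_THRESHOLD = 8;         // bin converts from list to tree at this length
    static final int UNTREEIFY_THRESHOLD = 6;       // tree converts back to list on resize
    static final int MIN_TREEIFY_CAPACITY = 64;     // minimum table capacity before treeifying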

I ran the JB Nizet code (https://gist.github.com/jnizet/34ca08ba0314c8e857ea9a161c175f13/revisions) mentioned in a comment, with loop limits of 64 and 128 (adding 64 and 128 elements):

  • When adding 64 elements: the capacity doubles (to 128) when adding the 49th element to the HashMap (since the load factor is 0.75, and 64 × 0.75 = 48)
  • When adding 128 elements: the capacity doubles (to 256) when adding the 97th element to the HashMap (also because of the 0.75 load factor, and 128 × 0.75 = 96)

Once the capacity has reached 64, the HashMap behaves normally again: the overfull bucket is turned into a tree, and further resizing is driven only by the load factor.
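
If you want to watch the capacity yourself, here is a minimal sketch of my own (the class and names are mine, not JB Nizet's). It reads HashMap's private table array via reflection, so on Java 9+ it needs --add-opens java.base/java.util=ALL-UNNAMED:

    import java.lang.reflect.Field;
    import java.util.HashMap;

    public class CapacityProbe {

        // Key class that forces every entry into the same bucket
        static class BadKey {
            final int value;
            BadKey(int value) { this.value = value; }
            @Override public int hashCode() { return 1; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).value == value;
            }
        }

        // Reads the length of HashMap's internal table via reflection
        static int capacity(HashMap<?, ?> map) throws Exception {
            Field table = HashMap.class.getDeclaredField("table");
            table.setAccessible(true);
            Object[] tab = (Object[]) table.get(map);
            return tab == null ? 0 : tab.length;
        }

        public static void main(String[] args) throws Exception {
            HashMap<BadKey, Integer> map = new HashMap<>();
            for (int i = 1; i <= 16; i++) {
                map.put(new BadKey(i), i);
                System.out.println("entries=" + i + ", capacity=" + capacity(map));
            }
        }
    }

On Java 8+ this should show the same jumps the question describes: capacity 16 up to the 8th entry, then 32 on the 9th and 64 on the 10th, after which the capacity stays put.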

To summarize: a bucket uses a linked list up to a certain length (8 elements); beyond that it switches to a tree data structure (which is where the capacity jumps come from). The reason is that access through a tree (O(log n)) is faster than through a linked list (O(n)).

[figure: Java 8 HashMap internal array]

This figure shows the internal array of a Java 8 HashMap with both a tree (in bucket 0) and linked lists (in buckets 1, 2 and 3). Bucket 0 is a tree because it contains more than 8 nodes.

The HashMap documentation and existing discussions of how HashMap buckets work are useful further reading on this.


I am answering the comments more than the question itself, since your comments are closer to what you actually want to know.

The best and most relevant place where this resizing based on bucket size is explained is the source code itself. What you observe on the 9th insertion is expected, and happens in this part of putVal:

    for (int binCount = 0; ; ++binCount) {
        // some irrelevant lines skipped
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            treeifyBin(tab, hash);
        break;
    }

where TREEIFY_THRESHOLD = 8 and binCount counts the entries already present in the bin.

The name of the treeifyBin method is a bit misleading, as it may resize the map rather than treeify the bin. This is the relevant part of that method's code:

    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();

Note that this will actually resize (i.e. double the capacity) rather than build a tree, until MIN_TREEIFY_CAPACITY (64) is reached.
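
For completeness, the ordinary load-factor-driven resize happens in a separate check at the end of putVal, independent of the treeifyBin path above (the trailing comment is mine):

    if (++size > threshold) // threshold = capacity * loadFactor
        resize();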


Source: https://habr.com/ru/post/1274391/

