Handling hash collisions with linear probing

I have been reading about hash tables and open addressing. Suppose you want to insert the keys 18, 32 and 44 into a hash table of size 13:

- 18 gets index 5 (18 mod 13 = 5)
- 32 gets index 6 (32 mod 13 = 6)
- 44 gets index 5 (44 mod 13 = 5)

Inserting 44 causes a collision, because index 5 is already occupied.

If you use linear probing, you compute index = (hash(key) + i) mod N for i = 0, 1, 2, ... until you find an empty slot in the table. Then 44 ends up at index 7.
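The insertion described above can be sketched like this (a minimal illustration with made-up class and method names, not any library's API; the hash function is simply `key % N` as in the question):

```java
// Minimal linear-probing insertion into a table of size 13,
// reproducing the 18/32/44 example from the question.
public class LinearProbingDemo {
    static final int N = 13;
    Integer[] table = new Integer[N];

    // Inserts key and returns the index it landed at.
    // Assumes the table is not full.
    int insert(int key) {
        for (int i = 0; i < N; i++) {
            int index = (key % N + i) % N; // probe sequence: h(key), h(key)+1, ...
            if (table[index] == null) {
                table[index] = key;
                return index;
            }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        LinearProbingDemo demo = new LinearProbingDemo();
        System.out.println(demo.insert(18)); // index 5
        System.out.println(demo.insert(32)); // index 6
        System.out.println(demo.insert(44)); // index 7 (slots 5 and 6 are taken)
    }
}
```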

Now suppose you delete 32 and then want to delete 44. You start by looking at index hash(44) = 5 - that slot holds 18, not 44 - and then probe (hash(44) + 1) mod 13 = 6, which is empty. You might then conclude that 44 is not in the table. How do you mark a slot as "not empty, but no longer holding a key", so that the search knows to keep looking for 44 at the next index?

If you later need to insert another key at index 6, that key can simply overwrite the "label" in the hash table.

What can you store in a slot to say "a key was here, but it was deleted, so keep looking at the next index"? You cannot just write null or 0, because a search would then either think the key was never there (null means empty) or think a key with the value 0 has overwritten 44.

4 answers

One way to handle deletion in hash tables with open addressing is to use status labels: EMPTY, OCCUPIED and DELETED. Note that there is an important difference between EMPTY, which means the slot has never been used, and DELETED, which means it was used but the key was removed.

When a value is deleted, its slot is marked DELETED, not EMPTY. When you look up a value, you keep probing until you find a slot marked EMPTY; in other words, for lookups you treat DELETED slots the same as OCCUPIED ones. Insertion, however, can ignore this difference - you may insert into either a DELETED or an EMPTY slot.
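A minimal sketch of this three-state scheme (the class and method names are illustrative, not from any library):

```java
import java.util.Arrays;

// Linear-probing table with tombstones: DELETED slots keep lookups probing,
// but can be reused by inserts.
public class TombstoneTable {
    enum State { EMPTY, OCCUPIED, DELETED }

    static final int N = 13;
    int[] keys = new int[N];
    State[] states = new State[N];

    TombstoneTable() { Arrays.fill(states, State.EMPTY); }

    void insert(int key) {
        for (int i = 0; i < N; i++) {
            int idx = (key % N + i) % N;
            if (states[idx] != State.OCCUPIED) { // reuse DELETED or EMPTY slots
                keys[idx] = key;
                states[idx] = State.OCCUPIED;
                return;
            }
        }
        throw new IllegalStateException("table full");
    }

    boolean contains(int key) {
        for (int i = 0; i < N; i++) {
            int idx = (key % N + i) % N;
            if (states[idx] == State.EMPTY) return false; // stop only at EMPTY
            if (states[idx] == State.OCCUPIED && keys[idx] == key) return true;
            // DELETED: keep probing past the tombstone
        }
        return false;
    }

    void delete(int key) {
        for (int i = 0; i < N; i++) {
            int idx = (key % N + i) % N;
            if (states[idx] == State.EMPTY) return; // key not present
            if (states[idx] == State.OCCUPIED && keys[idx] == key) {
                states[idx] = State.DELETED; // tombstone, not EMPTY
                return;
            }
        }
    }
}
```

With the question's example: after inserting 18, 32, 44 and deleting 32, slot 6 is DELETED rather than EMPTY, so a lookup for 44 probes past it and still finds 44 at index 7.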

The question's Java tag is a little misleading, because Java (or at least the Oracle implementation) does not use open addressing. Open addressing becomes especially problematic as the load factor grows, because hash collisions occur much more often:

[Figure: lookup cost versus load factor for open addressing]

As you can see, performance drops sharply around the 0.7 mark. Most hash tables resize as soon as their load factor passes some constant threshold. For example, Java doubles the capacity of its HashMap when the load factor reaches 0.75.
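As a small illustration of that policy (the 0.75 threshold matches Java's documented HashMap default; the class and method names here are made up):

```java
// Illustrative threshold check mirroring the resize policy described above.
public class ResizePolicy {
    static final double LOAD_FACTOR = 0.75; // HashMap's default load factor

    // Returns true when the table should resize (e.g. double its capacity).
    static boolean needsResize(int size, int capacity) {
        return (double) size / capacity > LOAD_FACTOR;
    }
}
```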


It looks like you are trying to implement your own hash table (as opposed to using the Hashtable or HashMap included in Java), so this is more a data-structure question than a Java question.

That said, a hash table with open addressing (for example, linear probing) is awkward when it comes to deleting elements. One common solution is to shift back all the elements that follow the deleted slot and sit away from their home position, so that no probe sequence is broken by a hole.

There is some pseudocode on Wikipedia that describes this fairly well:

http://en.wikipedia.org/wiki/Open_addressing
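A sketch of that deletion-by-shifting technique, following the structure of Wikipedia's pseudocode (my own translation to Java; class and field names are illustrative):

```java
// Linear-probing table that deletes by shifting later entries back,
// so no tombstones are needed and no probe sequence is broken.
public class ShiftDeleteTable {
    static final int N = 13;
    Integer[] table = new Integer[N];

    int home(int key) { return ((key % N) + N) % N; }

    // Assumes the table is not full.
    void insert(int key) {
        int idx = home(key);
        while (table[idx] != null) idx = (idx + 1) % N;
        table[idx] = key;
    }

    void delete(int key) {
        int i = home(key);
        while (table[i] != null && table[i] != key) i = (i + 1) % N;
        if (table[i] == null) return; // key not present
        int j = i;
        while (true) {
            table[i] = null; // open a hole at i
            while (true) {
                j = (j + 1) % N;
                if (table[j] == null) return; // end of the cluster, done
                int k = home(table[j]);
                // Move table[j] into the hole unless its home slot k lies
                // cyclically between the hole i (exclusive) and j (inclusive).
                if ((j > i) ? (k <= i || k > j) : (k <= i && k > j)) break;
            }
            table[i] = table[j];
            i = j; // the hole has moved to j; keep scanning
        }
    }
}
```

On the question's example (18 at 5, 32 at 6, 44 at 7), deleting 32 shifts 44 back into slot 6, so a later lookup for 44 starting at slot 5 still finds it without any special marker.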


Hash table buckets are not limited to storing a single value. So if two objects hash to the same place in the table, both of them are stored. A collision just means lookup is a bit slower: when searching for a value whose key hashes to that location, you have to check each entry in the bucket to see whether it matches.
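That approach is separate chaining, which can be sketched like this (illustrative names; this is not how the question's open-addressing table works, but what this answer describes):

```java
import java.util.ArrayList;
import java.util.List;

// Separate chaining: each bucket holds a list of keys, so colliding
// keys coexist in one bucket and no probing of other slots is needed.
public class ChainedTable {
    static final int N = 13;
    List<List<Integer>> buckets = new ArrayList<>();

    ChainedTable() {
        for (int i = 0; i < N; i++) buckets.add(new ArrayList<>());
    }

    void insert(int key) {
        buckets.get(key % N).add(key);
    }

    boolean contains(int key) {
        // Scan only the one bucket the key hashes to.
        return buckets.get(key % N).contains(key);
    }
}
```

Here 18 and 44 both land in bucket 5, and deleting one never disturbs a lookup for the other.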

It sounds like you are describing a hash table that stores only one entry at each index. The only way I can see to make that work is to add a field to the slot structure indicating whether this position has ever had a collision. When you do a lookup, you check the key: if it matches, you have your value; if not, you check whether a collision occurred there and, if so, move on to the next position. When deleting, you have to leave the collision marker in place but clear the value and the key.


If you use a hash table with this approach (which none of the built-in hash collections does), you need to walk through all the keys that follow the deleted one to see whether they need to be moved back (to avoid leaving holes). Some of them will be there for the same hash value, and some will be collisions from unrelated hash codes. If you do this, no holes are left behind. For a hash table that is not too full, this should not add much overhead.


Source: https://habr.com/ru/post/1382045/
