collection.mutable.OpenHashMap vs collection.mutable.HashMap

For put and get operations, OpenHashMap is about 5 times faster than HashMap: https://gist.github.com/1423303

Are there cases where a HashMap is preferable to an OpenHashMap?

1 answer

Your code exactly matches one of the use cases OpenHashMap is designed for. Your code:

    // `time` and `warmup` are helpers from the asker's gist
    println("scala OpenHashMap: " + time(warmup) {
      val m = new scala.collection.mutable.OpenHashMap[Int, Int]
      var i = 0
      var start = System.currentTimeMillis()
      while (i < 100000) { m.put(i, i); i = i + 1 }
    })

The description of OpenHashMap (scaladoc):

A mutable hash map based on an open hashing scheme. The precise scheme is undefined, but it should make a reasonable effort to ensure that an insert with consecutive hash codes is not unnecessarily penalised. In particular, mappings of consecutive integer keys should work without significant performance loss.

The emphasis on the last sentence is mine. This explains your findings. As for when to use OpenHashMap rather than HashMap, see Wikipedia. From there:

Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms, and can use simple hash functions that are unsuitable for other methods.

The cost of a table operation is that of scanning the entries of the selected bucket for the desired key. If the distribution of keys is sufficiently uniform, the average cost of a lookup depends only on the average number of keys per bucket, that is, on the load factor.

Chained hash tables remain effective even when the number of table entries n is much higher than the number of slots. Their performance degrades more gracefully (linearly) with the load factor. For example, a chained hash table with 1000 slots and 10,000 stored keys (load factor 10) is five to ten times slower than a 10,000-slot table (load factor 1), but still 1000 times faster than a plain sequential list, and possibly even faster than a balanced search tree.

For separate chaining, the worst-case scenario is when all entries are inserted into the same bucket, in which case the hash table is ineffective and the cost is that of searching the bucket data structure. If the latter is a linear list, the lookup procedure may have to scan all of its entries, so the worst-case cost is proportional to the number n of entries in the table.
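To make the numbers in the quote concrete, here is a minimal sketch of a separate-chaining table. ChainedIntSet and ChainingDemo are hypothetical names made up for this illustration; this is not how scala.collection.mutable.HashMap is actually implemented, and the "hash function" is assumed to be simply the key modulo the number of slots.

```scala
// Hypothetical helper, not part of the Scala library: a bare-bones
// separate-chaining set of Int keys that places key k in slot (k mod slots).
final class ChainedIntSet(slots: Int) {
  private val buckets = Array.fill(slots)(List.empty[Int])
  private var count = 0

  private def slotOf(key: Int): Int = Math.floorMod(key, slots)

  // Adds the key to the front of its chain unless it is already present.
  def add(key: Int): Unit = {
    val i = slotOf(key)
    if (!buckets(i).contains(key)) {
      buckets(i) = key :: buckets(i)
      count += 1
    }
  }

  // A lookup scans only the chain of the selected slot.
  def contains(key: Int): Boolean = buckets(slotOf(key)).contains(key)

  // Average number of keys per slot.
  def loadFactor: Double = count.toDouble / slots

  // Longest chain: the most entries a single lookup may have to scan.
  def longestChain: Int = buckets.map(_.length).max
}

object ChainingDemo {
  def build(slots: Int, keys: Seq[Int]): ChainedIntSet = {
    val table = new ChainedIntSet(slots)
    keys.foreach(table.add)
    table
  }

  def main(args: Array[String]): Unit = {
    // 10,000 keys in 1000 slots: load factor 10.0, longest chain 10
    val crowded = build(1000, 0 until 10000)
    // 10,000 keys in 10,000 slots: load factor 1.0, longest chain 1
    val roomy = build(10000, 0 until 10000)
    // Worst case: every key is a multiple of 1000, so all land in slot 0
    // and the single chain holds all 10,000 entries.
    val degenerate = build(1000, (0 until 10000).map(_ * 1000))

    for ((name, t) <- Seq("crowded" -> crowded, "roomy" -> roomy, "degenerate" -> degenerate))
      println(s"$name: load factor ${t.loadFactor}, longest chain ${t.longestChain}")
  }
}
```

Note that the crowded and the degenerate tables have the same load factor; only the key distribution differs, which is why the quote makes the average cost conditional on a sufficiently uniform distribution.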

This is only a general explanation. As always with these things, performance will vary depending on your use case; if you care about it, you need to measure it.
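If you want to measure it yourself, something along these lines could serve as a starting point. It is only a rough sketch: MapBench, timeMs and putGet are hypothetical names, the warm-up count of 20 is an arbitrary assumption, and a hand-rolled timing loop like this is easily distorted by JIT compilation and garbage collection, so a proper benchmark harness would be more trustworthy. Also note that OpenHashMap is deprecated as of Scala 2.13, so this will compile with a deprecation warning there.

```scala
import scala.collection.mutable

object MapBench {
  // Runs `body` once and returns the elapsed wall-clock time in milliseconds.
  def timeMs[A](body: => A): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e6
  }

  // Inserts the keys 0 until n, reads each one back, and returns the sum of
  // the values read so the work cannot be optimised away entirely.
  def putGet(m: mutable.Map[Int, Int], n: Int): Long = {
    var i = 0
    while (i < n) { m.put(i, i); i += 1 }
    var sum = 0L
    i = 0
    while (i < n) { sum += m(i); i += 1 }
    sum
  }

  def main(args: Array[String]): Unit = {
    val n = 100000 // same key count as in the question

    // Warm-up rounds so the JIT has a chance to compile both code paths.
    for (_ <- 1 to 20) {
      putGet(new mutable.HashMap[Int, Int], n)
      putGet(new mutable.OpenHashMap[Int, Int], n)
    }

    val hashMs = timeMs(putGet(new mutable.HashMap[Int, Int], n))
    val openMs = timeMs(putGet(new mutable.OpenHashMap[Int, Int], n))
    println(f"HashMap:     $hashMs%.2f ms")
    println(f"OpenHashMap: $openMs%.2f ms")
  }
}
```

Sequential Int keys are exactly the case the scaladoc singles out, so it is worth repeating the run with keys that look like your real data (strings, random integers, a much larger n) before drawing conclusions.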


Source: https://habr.com/ru/post/903286/

