Closed HashMap in Java / Scala

Very often, the performance bottleneck when using hash maps is the equals method. equals can be very expensive for deep data structures.

Please note that the following is about immutable hash maps. That way, at least you will never delete the key. I think adding keys should be fine.

Unsafe get

Suppose you request a hash map, being sure that it contains the requested key. Then, if there is no collision for a given key, the found single record can only be returned based on the hash hit, since it must be the requested object.

In most cases, you can avoid calling equals in get (in the absence of a collision).

Questions

  • What is the name of this concept?
  • Are there any hash map implementations for Java or Scala that support such an unsafe get operation?

Btw, I am open to suggestions on a better topic.

+4
source share
4 answers

What is the name of this concept?

Identity card. It will not call equals() when searching for items and just use an identifier (i.e. == ).

The correct solution is not to “correct” the map, but to use only the keys that use Object.equals() by default:

 public boolean equals( Object other ) { return this == other; } 

The problem is that finding elements on this map can be problematic if all the keys are not single. Therefore, you cannot use String , for example, because Java does not guarantee that all line instances are interned. The same is true for Integer : instances of <-128 and> 127 will be different.

But if you use your own optimized implementation for keys, you can solve this problem.

+7
source

Such a data structure does not exist, as far as I know, precisely because it is unsafe - there is no way to reliably encode your specific condition.

But alas, the Scala collection library allows you to quickly implement rich new data structures with a bit of new code. Here is the implementation of your request:

 class ParticularHashMap[A, +B] private(buckets: Vector[List[(A, B)]]) extends Map[A, B]{ def this() = this(Vector.fill(ParticularHashMap.BucketsNo)(List.empty)) def get(key: A) = { val bucket = buckets(bucketIndex(key)) bucket match { case List((_, v)) => Some(v) //YOUR SPECIAL OPTIMIZATION ! case list => list.find(key == _._1).map(_._2) } } def iterator = buckets.flatten.iterator def -(key: A) = mkWithUpdatedBucket(key, _ filterNot (_._1 == key)) def +[B1 >: B](kv: (A, B1)) = mkWithUpdatedBucket(kv._1, insert(kv._1, kv._2.asInstanceOf[B], _)) //if you're wondering why it Bucket[A, Any] and not Bucket[A, B] it because of the covariance of B private def mkWithUpdatedBucket(key: A, f: Bucket[A, Any] => Bucket[A, Any]) = { val index = bucketIndex(key) val newBucket = f(buckets(index)) (new ParticularHashMap[A, Any](buckets.updated(index, newBucket))).asInstanceOf[ParticularHashMap[A, B]] } private def insert(k: A, v: Any, bucket: List[(A, Any)]): Bucket[A, Any] = bucket match { case List() => List((k, v)) case (k1, v1) :: kvs if k == k1 => (k, v) :: kvs case (k1, v1) :: kvs => (k1, v1) :: insert(k, v, kvs) } private def bucketIndex(key: A) = Math.abs(key.hashCode()) % ParticularHashMap.BucketsNo } object ParticularHashMap { private val BucketsNo = 256 private type Bucket[+A, +B] = List[(A, B)] def apply[A, B](kvs: (A, B)*) = new ParticularHashMap[A, B] ++ kvs } 

Please note that this is a naive implementation , but you can improve it as well as your needs.

It will work as expected:

 val myMap = ParticularHashMap("hello" -> 2, "world" -> 4) println(myMap.get("hello")) >> Some(2) println(myMap.get("world")) >> Some(4) println(myMap.get("world1")) >> None 

But of course, be careful: if you do not respect your contract, bad things will happen. The following is an example demonstrating a flaw:

 val evilMap = ParticularHashMap(1 -> 2) println(evilMap.get(1)) >> Some(2) println(evilMap.get(257)) >> Some(2) //BUT 257 IS NOT IN THE MAP !!! 
+4
source

1) What is the name of this concept?

AFAIK, he has no name. People usually give names only to those solutions that work. A solution that does not work (or only works in very limited situations) usually does not deserve a name.

2) Are there any hash map implementations for Java or Scala that support such an unsafe get operation?

Not that I knew. And again, people tend not to write and publish libraries that work only in limited situations or that are problematic in their behavior.


Where is the lack of it? Why doesn't it work?

Firstly, the Map.get(...) method, which returned the wrong value if you gave it a key that was not recognized, fundamentally violates the Java Map.get contract. And there may be other violated methods; e.g. Map.containsKey , Map.put , Map.putAll and / or Map.remove .

Secondly, a data structure that gives the wrong answer if your assumptions are wrong is a bad idea ... if you can avoid it.

Thirdly, modulo that we do not know your actual precedent, there are almost certainly better ways to solve the problem. The obvious ones are:

  • change the equals key class to make it less expensive
  • if you cannot change the equals , define and use the proxy key class or
  • use Map<Integer, List<Pair<Key,ValueClass>>> , where the integers used as keys are hash codes for real keys.

(The latter approach requires that the dubious “always contains a key” assumption, but at least you do not violate the abstraction of the map, and therefore you can perform proper checks if necessary.)

Finally, even if you decide that your "dirty" optimization is necessary for your application and decide to implement it yourself. I maintain that my answers are still the correct explanations:

  • why there is no universally accepted term for this "hacking" and
  • why there is no built-in implementation in the 3 main libraries of Java libraries (or any other that I know about at the moment).

You may not like it, but there is no better explanation.

0
source

In this case, the only thing you need is an array, right?

If you know that there is no collision - this means that the two keys do not have the same hash code - you can simply use the hash code as the address to index the array. For example, if one of your hash codes for the key object is 1003, then the corresponding value object can be placed in your array [1003].

If you cannot afford huge arrays, given that you can perform the "unsafe" operation, you can calculate the address as key.hashcode ()% yourArray.length. Whenever key1 and key2 are mapped to the same slot due to the modulo operation, just overwrite the slot.

 class SimpleMap<K, V> { private V [] data; public SimpleMap(int size){ data = (V[])new Object[size]; } public void put (K key, V value){ data[key.hashCode() % data.length] = value; } public V get(K key) { return data[key.hashCode() % data.length]; } } 
0
source

Source: https://habr.com/ru/post/1446916/


All Articles