Hash function for three-dimensional integer coordinates

I have a three-dimensional uniform grid. To save memory in large models, I want to avoid storing empty cells (those that do not intersect any object). For this I use a Dictionary in C#. Performance has dropped, but that is better than the exception I got when creating the full 3D grid. Now my task is to find a fast hash function that maps a three-dimensional grid coordinate to a unique number.

I already tried (x * 73856093 + y * 19349669 + z * 83492791) % n, which does not always produce a unique number.

2 answers

On the one hand, you state your goal as "saving memory"; on the other hand, you ask for a "fast hash function that maps a three-dimensional integer grid coordinate to a unique number". These two goals are not very compatible.

Either you want guaranteed O(1) access. In that case you need to avoid hash collisions entirely and map each input to a unique number. But then you also need as many slots in your hash map as there are possible inputs, so you get no memory savings over a simple N × N × N array.

Or (and this is much more likely) you just want hash collisions to be rare. Then your hash map can be roughly twice the number of objects actually stored. In that case you do not need to avoid collisions completely; you only need them to be reasonably rare.
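A minimal sketch of the sparse-storage idea from the question, written in Python as a stand-in for the asker's C# `Dictionary` (the `SparseGrid` class and its methods are hypothetical names). Only occupied cells become dictionary entries, and the dictionary resolves hash collisions itself, so the key's hash does not have to be unique:

```python
# Sparse voxel grid: only non-empty cells are stored, keyed by (x, y, z).
# Python's dict hashes the key tuple and handles collisions internally,
# so the keys do not need to hash to unique numbers.


class SparseGrid:
    def __init__(self):
        self._cells = {}  # (x, y, z) -> list of objects in that cell

    def add(self, x, y, z, obj):
        # setdefault creates the cell's list only when it is first needed
        self._cells.setdefault((x, y, z), []).append(obj)

    def get(self, x, y, z):
        # Empty cells are never stored; report them as an empty list
        return self._cells.get((x, y, z), [])

    def __len__(self):
        # Number of occupied cells, which is what memory use scales with
        return len(self._cells)


grid = SparseGrid()
grid.add(1, 1, 1, "cube")
grid.add(2321, 322, 232, "sphere")
grid.add(1, 1, 1, "cone")
print(len(grid))          # 2
print(grid.get(1, 1, 1))  # ['cube', 'cone']
print(grid.get(5, 5, 5))  # []
```

Memory here grows with the number of occupied cells rather than with N × N × N, which is the "collisions are allowed but rare" case.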

The choice of a good hash function depends on the likely patterns in your input. If the input is fairly random and you know the size of your hash map, you should aim for an even distribution. If objects are more likely to be located in neighboring cells, you want to make sure that small changes in the coordinates are unlikely to lead to a collision. This is where it helps to make your coefficients prime numbers, so that a small change in one direction is less likely to collide with a small change in another direction.
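To make the point about neighboring cells concrete, the hypothetical helper below counts how many distinct buckets the 27 cells of a 3 × 3 × 3 neighborhood land in, for a weighted-sum hash a·x + b·y + c·z taken modulo a table size n. The coefficients (1, 1, 1) are a deliberately poor baseline, and n = 1024 is an arbitrary assumption:

```python
def distinct_neighbour_buckets(coeffs, n, radius=1):
    """Count distinct buckets among the (2r+1)^3 cells around the origin."""
    a, b, c = coeffs
    buckets = set()
    rng = range(-radius, radius + 1)
    for x in rng:
        for y in rng:
            for z in rng:
                # Python's % is non-negative for n > 0, even for negative sums
                buckets.add((a * x + b * y + c * z) % n)
    return len(buckets)


print(distinct_neighbour_buckets((1, 1, 1), 1024))        # 7
print(distinct_neighbour_buckets((137, 149, 163), 1024))  # 27
```

With (1, 1, 1), a step in x is indistinguishable from a step in y, so only the 7 sums from -3 to 3 occur; the distinct primes keep all 27 neighbors apart. Note that C#'s `%` can return a negative result for a negative left operand, unlike Python's.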

If in doubt, you can always test it empirically: given three primes (for example, the hash 137x + 149y + 163z) and a realistic setup (that is, the coordinates actually used and the final size of the hash map), apply the hash to all coordinates, take the result modulo the hash map size, and count the number of distinct values. Do the same for different triples and pick the one that maximizes this count. But I doubt that this level of optimization is really worth the effort.
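The empirical check can be sketched as follows. The workload (a dense 10 × 10 × 10 block), the table size of twice the number of stored cells, and the candidate triples are all assumptions for illustration; substitute your real coordinates and final table size:

```python
from itertools import product


def unique_buckets(coeffs, coords, n):
    """Number of distinct values of (a*x + b*y + c*z) mod n over coords."""
    a, b, c = coeffs
    return len({(a * x + b * y + c * z) % n for x, y, z in coords})


def best_triple(candidates, coords, n):
    """Pick the coefficient triple that yields the most distinct buckets."""
    return max(candidates, key=lambda t: unique_buckets(t, coords, n))


# Hypothetical workload: a dense 10x10x10 block of occupied cells
coords = list(product(range(10), repeat=3))
n = 2 * len(coords)  # table roughly twice the number of stored items
candidates = [(137, 149, 163), (31, 37, 41), (1, 1, 1)]

for t in candidates:
    print(t, unique_buckets(t, coords, n))
print("best:", best_triple(candidates, coords, n))
```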


Rather than writing yet another explanation of a well-covered topic, see the Wikipedia article on hash functions. In particular, its first image clearly shows how several inputs can hash to the same value.

Basically, your triple is hashed to some value in the range [0, 2^64 - 1] (duplicates are allowed!). That range is then reduced to something slightly larger than your number of input values (e.g. n = 5) via hash = hash % n. The resulting mapping for inputs such as [(1,1,1), (1,2,3), (2321, 322, 232), (3,3,3)] might look something like this:

(1,1,1) -> 2
(1,2,3) -> 0
(2321, 322, 232) -> 0
(3,3,3) -> 3

As you can see, no input maps (i.e. hashes) to 1 or 4, while two inputs hash to 0.
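The two-step mapping (hash to a 64-bit value, then reduce with hash % n) can be sketched like this, reusing the weighted sum from the question as an illustrative 64-bit hash. The helper names `hash64` and `bucket_map` are hypothetical:

```python
MASK64 = (1 << 64) - 1  # keep results in the unsigned 64-bit range


def hash64(x, y, z):
    # Illustrative hash (assumption): the question's weighted sum,
    # wrapped to 64 bits the way unsigned integer arithmetic would wrap
    return (x * 73856093 + y * 19349669 + z * 83492791) & MASK64


def bucket_map(points, n):
    """Group points by their bucket index hash64(p) % n."""
    buckets = {i: [] for i in range(n)}
    for p in points:
        buckets[hash64(*p) % n].append(p)
    return buckets


points = [(1, 1, 1), (1, 2, 3), (2321, 322, 232), (3, 3, 3)]
for index, members in bucket_map(points, 5).items():
    print(index, members)
```

For these four inputs and n = 5, this particular hash puts (1,1,1) and (2321, 322, 232) in bucket 3 and (1,2,3) and (3,3,3) in bucket 4, leaving buckets 0, 1 and 2 empty. The pattern differs from the example, but the phenomenon is the same.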

The strength of hashing (and the reason the average case is O(1)) becomes clear when you look at the steps performed to retrieve an input value, for example (1,1,1), from the hash table:

  • The hash of the input value is computed and reduced with hash = hash % n, so (1,1,1) -> 2.
  • A direct O(1) lookup is performed, i.e. table[2] holds (1,1,1) plus any additional data stored with that particular input.
  • Done!
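The three lookup steps can be sketched as a minimal hash table with separate chaining. This is a simplified illustration rather than the internals of any particular library; the class name is hypothetical, and Python's built-in `hash` stands in for the hash function:

```python
class ChainedHashTable:
    """Minimal hash table with separate chaining (illustrative sketch)."""

    def __init__(self, n):
        self.n = n
        self.slots = [[] for _ in range(n)]  # one bucket (list) per slot

    def _index(self, key):
        # Step 1: hash the key and reduce it into the range [0, n)
        return hash(key) % self.n

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # key already present: overwrite its data
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        # Step 2: jump directly to the slot, then scan its (short) bucket
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v  # Step 3: done
        raise KeyError(key)


table = ChainedHashTable(5)
table.put((1, 1, 1), "data for (1,1,1)")
table.put((3, 3, 3), "data for (3,3,3)")
print(table.get((1, 1, 1)))  # data for (1,1,1)
```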

When more than one input maps to the same hash value (0 in our example), the internal algorithm has to search among those inputs, which some implementations do with a balanced tree such as a red-black tree (worst case O(log n)). In that case the worst case for any lookup is also O(log n); with a plain list per bucket, it degrades to O(n).
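To illustrate a logarithmic search within a crowded bucket, the sketch below keeps one bucket's keys sorted and finds them by binary search with Python's `bisect` module. A sorted list is a simpler substitute for a red-black tree: lookups are O(log k) for k colliding keys, but inserts are O(k) because list elements must shift. `SortedBucket` is a hypothetical name:

```python
import bisect


class SortedBucket:
    """One hash bucket whose keys are kept sorted for binary search.

    Stand-in for a balanced search tree: lookup is O(log k),
    insertion is O(k) because Python lists shift elements.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # existing key: overwrite
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)  # O(log k) binary search
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        raise KeyError(key)


bucket = SortedBucket()
bucket.put((2321, 322, 232), "b")
bucket.put((1, 2, 3), "a")
print(bucket.get((1, 2, 3)))  # a
```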

An ideal (perfect) hash is one where the mapping is one-to-one (a bijection). This gives the best performance, but it is rarely achievable. Fortunately, as I said earlier, it's easy to create an almost perfect hash where duplicates are rare: essentially, make the output of your hash function as random as possible.
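When the grid has known bounds nx × ny × nz, a bijection is easy to write down: the usual row-major linear index. The sketch below (function names hypothetical) encodes and decodes it. Such a perfect hash needs a table with one slot per possible cell, so it gives no memory savings over a dense array:

```python
def encode(x, y, z, nx, ny):
    """Bijective index for 0 <= x < nx, 0 <= y < ny, 0 <= z < nz."""
    return x + nx * (y + ny * z)


def decode(i, nx, ny):
    """Inverse of encode: recover (x, y, z) from the linear index."""
    x = i % nx
    y = (i // nx) % ny
    z = i // (nx * ny)
    return x, y, z


print(encode(1, 2, 1, 4, 3))  # 21
print(decode(21, 4, 3))       # (1, 2, 1)
```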

The examples I gave in the comments may be adequate (if not exactly the right way to do it) :), but a more standard calculation would be: hash = (((prime1 + value1) * prime2 + value2) * prime3 + value3) * prime4

which also answers your question. Note that the primes can be any primes, but small values such as 31, 37, etc. are usually used.
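The combination formula can be sketched in Python as follows. The default primes 17, 31, 37 and 41 are an arbitrary choice, and the masking with `MASK64` is an assumption that emulates the wrap-around of unsigned 64-bit arithmetic (in C#, integer arithmetic wraps by default unless a `checked` context is used):

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wrap-around


def combine_hash(x, y, z, p1=17, p2=31, p3=37, p4=41):
    # hash = (((p1 + x) * p2 + y) * p3 + z) * p4, wrapped to 64 bits.
    # Masking after each step gives the same result as masking once at
    # the end, because reduction mod 2**64 commutes with + and *.
    h = (p1 + x) & MASK64
    h = (h * p2 + y) & MASK64
    h = (h * p3 + z) & MASK64
    return (h * p4) & MASK64


print(combine_hash(1, 2, 3))  # 849643
print(combine_hash(3, 2, 1))  # 943615
```

Because each coordinate enters at a different stage, swapping coordinates changes the hash, unlike a plain sum.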

In practice, you can benchmark to check performance, but it is usually not necessary.

In any case, re-reading your question, I wonder why you don't drop the whole hashing idea and simply store your points in a plain array?


Source: https://habr.com/ru/post/974759/

