Ideal hash for known values

Let's say I have some known values ​​against which I want to create a hash table. For instance,

For 0x78409 -> 1 For 0x89934 -> 2 For 0x89834 -> 3 

etc...

But these values ​​(0x78409, 0x89934, 0x89834) are known only at runtime, so you cannot use the switch / case. However, they become known at the beginning of execution, so perhaps we can create a hash function that adapts to create the perfect hash table. So my question is: can we create an ideal hash function for such a case.

+4
source share
3 answers

If the entire input domain is known before the hash map is created, then this is possible, but some form of runtime code generation is required either through the virtual machine or through the JIT (possibly through a scripting language such as LuaJIT), which allows you to use gperf and its ilk to create a hash at runtime, compile it, and then use it to populate and extract from the map.

A simpler, more viable solution is to use a hash function with extremely low collisions for a given set of input permutations (for example, you can use only alphabetic characters, for example, lowercase letters), a minimal perfect hash.

Murmur3 and crapwow are the ones to look for (although I would be careful with crapwow), Google CityHash and xxHash are also worth a look. Bob Jenkins also has a good minimal perfect hash map available here , which should also be perfect.

+3
source

Wikipedia gives this page . But are you sure you want the perfect hash function? Perhaps a good enough and fast hash function?

0
source

Do you have enough of these properties ?:

  • no collisions
  • Worst key search performance => O(log n)

then the implementation of the evidence-based concept:

 #include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_PAIRS 3 #define KEY_SIZE 20 #define VALUE_OR_KEY_NOT_FOUND(v) ((v==NULL)? "(__KEY_NOT_FOUND__)":v) typedef struct { char Key[KEY_SIZE]; char Val[KEY_SIZE]; } Pair; typedef struct { unsigned int lastPairIndex; Pair pairs[MAX_PAIRS]; } HashTable; int comparePairs(const void * a, const void * b) { return strcmp(((Pair *)a)->Key, ((Pair *)b)->Key); } void hashInsert(HashTable * hashTable, char * key, char * val) { unsigned int ix = hashTable->lastPairIndex++; strcpy(hashTable->pairs[ix].Key,key); strcpy(hashTable->pairs[ix].Val,val); } void hashFinalize(HashTable * hashTable) { qsort(hashTable->pairs, MAX_PAIRS, sizeof(Pair), comparePairs); } char * hashLookup(HashTable * hashTable, char * key) { char * r = bsearch(key, hashTable->pairs, MAX_PAIRS, sizeof(Pair), comparePairs); if (r != NULL) return r + KEY_SIZE; return NULL; } int main() { HashTable ht = {0}; char * res; char * key_1 = "jkl"; char * key_2 = "oops"; hashInsert(&ht, "jkl", "some val"); hashInsert(&ht, "def", "other val"); hashInsert(&ht, "abc", "this val"); hashFinalize(&ht); res = hashLookup(&ht, key_1); printf("\"%s\" key => value: \"%s\"\n", key_1, VALUE_OR_KEY_NOT_FOUND(res)); res = hashLookup(&ht, key_2); printf("\"%s\" key => value: \"%s\"\n", key_2, VALUE_OR_KEY_NOT_FOUND(res)); return 0; } 

Many errors are not checked - for example, when pasting - the same key already exists. This is just a proof of concept. O(log n) key search performance is achieved because the keys are sorted alphabetically, and thus, key searches can be performed using a binary search algorithm. NTN!

0
source

Source: https://habr.com/ru/post/1381248/


All Articles