Perfect Hash Function for URLs

Does anyone know of an ideal hash function for URLs with 64-bit integers that would work well for most URLs?

+3
source share
2 answers

Found it as "Base52 url shortener perfect hash function in C" from http://lambdajones.com/b52

  const char *b52idx[52] = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "B", "C", "D", "F", "G", "H", "J", "K", "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "X", "Y", "Z", "b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z" }; #define X 0xff const int b52map[128] = { X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, // 0 1 2 3 4 5 6 7 8 9 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, X, X, X, X, X, X, // BCDFGHJKLMN X, X,10,11,12, X,13,14,15, X,16,17,18,19,20, X, // PQRSTVWXYZ 21,22,23,24,25, X,26,27,28,29,30, X, X, X, X, X, // bcdfghjklmn X, X,31,32,33, X,34,35,36, X,37,38,39,40,41, X, // pqrstvwxyz 42,43,44,45,46, X,47,48,49,50,51, X, X, X, X, X }; #ifdef __GNUC__ #define likely(x) __builtin_expect((x),1) #else #define likely(x) (x) #endif /* valid from 00000 -> zzzzz, good for 380204032 urls returns the integral short url id */ unsigned long long b52(const char *c) { unsigned long long x = 0; unsigned long long y = 0; unsigned long long z = 0; x |= b52map[c[0]] << 24 | b52map[c[1]] << 18 | \ b52map[c[2]] << 12 | b52map[c[3]] << 6 | b52map[c[4]]; y += (x/64) * 12; if (x > 4095) y += 624 * (x/4096); if (x > 262143) y += 32448 * (x/262144); if (x > 16777215) y += 1687296 * (x/16777215); if (likely((z = x - y) < 380204033)) return z; else return 380204033; } void b52inc(char *id) { int x[5] = { b52map[id[0]], b52map[id[1]], b52map[id[2]],b52map[id[3]], b52map[id[4]] }; int y = 5; // search for the first character we can increment (51 == 'z') // increment from the b52idx table and update id while (y--) if (x[y] < 51) break; id[y] = *b52idx[++x[y]]; // if we passed over id 'z above, roll them over while (y++ < 5) if (x[y] == 51) id[y] = '0'; } 
+2
source

It is impossible to create a perfect hash function if you do not know the set of keys that you request in advance. If you know this, then you can use something like gperf or cmph to create the perfect hash function for you.

http://cmph.sourceforge.net/

I assume that the ideal hash function is not really what you want, so you just have to use any reasonable hash function, such as the hash noise or the bob jenkins hash, with a hash table implementation like __gnu_cxx :: hash_map or dense_hash_map and sparse_hash_map from google.

http://code.google.com/p/google-sparsehash/ http://sites.google.com/site/murmurhash/ http://burtleburtle.net/bob/hash/doobs.html

+3
source

Source: https://habr.com/ru/post/1380669/


All Articles