Since you know all the lines ahead of time, you can use gperf to create a perfect hash function that has no collisions. For example, with four AAAA ABBA ACEA ALFG
input lines, it generated the following hash function (using the gperf -L ANSI-C input.txt
command line):
static unsigned int hash (register const char *str, register unsigned int len) { static unsigned char asso_values[] = { 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 7, 2, 5, 12, 12, 12, 12, 12, 12, 12, 12, 0, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12 }; return len + asso_values[(unsigned char)str[1]]; } const char * in_word_set (register const char *str, register unsigned int len) { static const char * wordlist[] = { "", "", "", "", "ALFG", "", "ABBA", "", "", "ACEA", "", "AAAA" }; if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH) { register int key = hash (str, len); if (key <= MAX_HASH_VALUE && key >= 0) { register const char *s = wordlist[key]; if (*str == *s && !strcmp (str + 1, s + 1)) return s; } } return 0; }
This requires a single table scan, length comparison and row comparison. If you know for sure that the word you have hashed is one of your original words, you can skip string comparisons.
Expanding the input size from 4 to 10,000 randomly generated strings increases the hash function by only 4 lookup tables plus length comparisons and string comparisons. But, since string comparison should store every source line in it, it is output to a very large table in a compiled object file (1.4 MB). If you do not need to perform string comparisons, you can omit this table.