Hash function for string

I am looking for a hash table for a set of short-length strings (each row is less than 50) with a special function, which whenever we look for a row in this table, if the row is inside the table it returns a related object of this row or a specific and unique number , and if the row is not inside the table, it gives the identifier of a very similar row for input. to determine the similarity between two lines, we can define another function, but suppose we define it as the minimum number of operations that need to be converted from one line to another. three notes:

  • the length of each query string and the stored string are always similar and fixed.
  • The string alphabet is limited to only 5 different characters.
  • For me, memory and speed are important.

I am not looking for a final solution, but any suggestion or introduction of some documents with a similar approach in a similar state is welcome.

+4
source share
1 answer

If a lot of strings are of course known, they use GPERF to create the perfect hash function.

Follow this with a SOUNDEX hash (string) that uses a linked list to resolve conflicts.

Use the first set for an exact search. Use the second set for close matches.

EDIT: since your alphabet is only 5 characters, you have the option of creating multiple hash tables to reduce the size of your N.

, . AAAAA, A. AABBBB, AB ABCDEF. 5 . 120 - -, .

, , . , . ? , , ?

+1

Source: https://habr.com/ru/post/1570772/


All Articles