As noted in the comments, the basic idea is to reduce the number of database queries by eliminating lookups for texts that cannot possibly be in the database.
As you suspect, hashing is a great way to do this: if you use a good hash function (i.e., one with a low probability of collision), you will (almost certainly) never get false positives, so you never need to actually compare the texts.
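For illustration, here is a minimal sketch of that pre-check in Python; the `load_all_query_texts()` helper is a hypothetical stand-in for however you stream the stored texts out of your database once at startup:

```python
import hashlib

# Precompute a set of digests for every text already in the database.
# (Hypothetical helper: load_all_query_texts() is assumed to yield the
# stored texts from the database once, at startup.)
known_hashes = {
    hashlib.sha1(text.encode("utf-8")).digest()
    for text in load_all_query_texts()
}

def maybe_in_database(text: str) -> bool:
    """Cheap membership pre-check: a miss here guarantees the text is
    not in the database, so no query needs to be issued at all."""
    return hashlib.sha1(text.encode("utf-8")).digest() in known_hashes
```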
Another hashing option (if memory becomes a problem) is a Bloom filter: "a space-efficient probabilistic data structure that is used to test whether an element is a member of a set." This approach uses far less memory than plain hashing, at the cost of some false positives (i.e., occasionally you still have to compare the text against the database to confirm a match). Bloom filters let you tune the trade-off between memory usage and the false-positive rate. They are used in NoSQL databases such as BigTable and Apache Cassandra.
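As a rough sketch (not from the original answer), a Bloom filter can be written in a few lines of Python, deriving its k indices from a single SHA-1 digest via the standard double-hashing trick; a production system would use a tuned library instead:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _indexes(self, item: str):
        # Derive k indices from two halves of one SHA-1 digest
        # using double hashing: index_i = (h1 + i*h2) mod size.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:10], "big")
        h2 = int.from_bytes(digest[10:], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        # False means definitely absent; True means *probably* present,
        # so the caller must still verify against the database.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))
```

As a rule of thumb, around 10 bits per stored element with about 7 hash functions gives a false-positive rate on the order of 1%; a larger filter trades memory for fewer unnecessary database lookups.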
Update: as David Schwartz rightly points out, the choice of hash function matters, especially when the result must be guaranteed correct rather than merely probably correct (see our discussion in the comments below). The same caveat applies to Bloom filters, which are ideal if you want to restrict disk access to a small subset of files (for example), but cannot completely eliminate disk/database access, because of false positives. I originally mentioned MurmurHash3 along with a speed comparison, but since it is not a cryptographic hash it is not suitable for this question. When I have actually implemented such approaches in the past, I used SHA-1, which is cryptographic.
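If you do use a filter that admits false positives, every hit must still be confirmed with a real lookup. A hedged sketch of that verify-on-hit pattern, building on the `BloomFilter` sketch above (the `fetch_from_db` helper is hypothetical):

```python
def exists(text: str, bloom: BloomFilter) -> bool:
    # A Bloom miss is authoritative: the text is definitely absent.
    if text not in bloom:
        return False
    # A Bloom hit may be a false positive, so confirm with an actual
    # database lookup (hypothetical helper).
    return fetch_from_db(text) is not None
```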