There is a lot of dubious information in the answers here. You tagged your question with cryptography and only mention cryptographic hash functions, but it sounds like you don't need cryptographic security at all, in particular because you say:
"I have over 1 billion entries that I need to hash and check to see if they are unique."
There are four properties of a cryptographic hash function:
- it is easy to compute the hash value for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash
You are really only interested in the first property. Uniqueness is a weaker requirement that is only partially related to the other three properties of cryptographic security.
Why should you care?
There is overhead in cryptographic security. You don't need it, and you are interested in speed, so why not skip it? The hash widths of MD5 and the SHA family are admittedly large enough for your purposes.
Check out the list of hash functions on Wikipedia, or the article on ordinary (non-cryptographic) hash functions. Moreover, what is wrong with the built-in .NET hashing functions? Have you tried simply deferring to the Object.GetHashCode() method? The MSDN documentation has a lot to say about using hash functions. You do not say much about the data you are hashing, so it is hard to say whether the output would be unique across your objects or not. How are you feeding an object into the MD5 hash? I assume you are taking its binary representation. A similar approach could be used with a built-in non-cryptographic hash function.
You may be concerned about the uniqueness of the built-in hash functions. They only return a regular int, which has 2^32 possible values, only about 4 times larger than the dataset you are working with. However, you always need a fallback plan with hash functions: in general collisions are unlikely, not impossible, and with a billion entries in a 32-bit space you should expect many of them. The standard fallback is to perform a more expensive comparison, usually a reference comparison and a field-by-field comparison.
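To put rough numbers on that concern, here is a minimal sketch of the usual birthday-bound estimate. It is written in Python only for brevity and assumes hash values are uniformly distributed, which real data may not satisfy; the function names are made up for illustration.

```python
import math

def expected_colliding_pairs(n: int, bits: int) -> float:
    """Expected number of colliding pairs among n uniform hash values of the given width."""
    return n * (n - 1) / 2 / 2**bits

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1)/(2 * 2^bits))."""
    return -math.expm1(-expected_colliding_pairs(n, bits))

n = 1_000_000_000  # "over 1 billion entries" from the question
for bits in (32, 64, 128):
    print(bits, expected_colliding_pairs(n, bits), collision_probability(n, bits))
```

Under these assumptions a 32-bit hash gives on the order of 10^8 colliding pairs for a billion entries, a 64-bit hash still leaves a chance of a few percent that at least one collision occurs, and at 128 bits the chance is negligible.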
If you are not willing to do an exact comparison when hashes collide, you are basically counting down the time until you get a false positive. That may not be a big problem for you: only you can judge what the downside is.
In addition, computing a second hash function is probably not much faster than a direct comparison. You are better off on all counts going with the sure thing and doing the long, direct comparison.
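The hash-then-verify pattern can be sketched as follows. This is an in-memory illustration under stated assumptions, not a design for a billion entries: `find_duplicates` and its `hash_fn` parameter are hypothetical names, records are assumed to be `bytes`, and `zlib.crc32` merely stands in for any fast non-cryptographic 32-bit hash.

```python
import zlib
from collections import defaultdict

def find_duplicates(records, hash_fn=zlib.crc32):
    """Return every record whose bytes exactly match an earlier record.

    The cheap hash only picks a bucket. Equality is decided by comparing
    the full bytes, so a hash collision can never cause a false positive.
    """
    buckets = defaultdict(list)  # hash value -> distinct records seen with that hash
    duplicates = []
    for rec in records:
        bucket = buckets[hash_fn(rec)]
        if any(rec == seen for seen in bucket):  # the expensive, exact check
            duplicates.append(rec)
        else:
            bucket.append(rec)
    return duplicates

# A deliberately weak hash (the length) forces collisions between
# b"ab" and b"cd", yet only the true duplicate is reported.
print(find_duplicates([b"ab", b"cd", b"ab"], hash_fn=len))
```

The exact comparison only runs against records that already share a hash value, so its cost stays small as long as the hash spreads the data reasonably well.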
Another common collision avoidance method is to use multiple keys. If your data points have several large subcomponents, you hash and compare them independently. If a data point has some large and some small components (say, some simple numeric types), you hash the large ones and compare the small ones directly. If the data has some property that is cheap to reduce to a number (for example, the lengths of strings or the sizes of some containers), you can compare those values directly.
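A multi-part key along those lines might look like the sketch below. The `Record` type and its fields are invented for illustration, and `zlib.crc32` again stands in for whatever fast hash you choose; the question does not say what the real data looks like.

```python
import zlib
from typing import NamedTuple

class Record(NamedTuple):  # hypothetical record layout
    record_id: int   # small numeric field: cheap to compare directly
    name: str        # only its length goes into the key
    payload: bytes   # large component: hashed instead of compared

def composite_key(rec: Record) -> tuple:
    """Combine directly comparable parts with a hash of the large part."""
    return (
        rec.record_id,            # small field, used as-is
        len(rec.name),            # cheap ordinal
        len(rec.payload),         # cheap ordinal
        zlib.crc32(rec.payload),  # hash of the large component
    )

a = Record(1, "x", b"hello")
b = Record(2, "x", b"hello")
print(composite_key(a) != composite_key(b))  # keys differ because record_id differs
```

Two records can share a key only if every small part matches and the payload hashes collide, which is much rarer than a collision on the hash alone. Matching keys still call for an exact comparison, since parts such as `name` contribute only their length here.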
If this does not work for you, take a look at implementations of the other hash functions listed on the wiki. Here's a pretty good reference for MurmurHash3, which can compute 32-bit or 128-bit hash values. There are other hash functions in the list that have wide hash widths and available C# libraries as well. But, as that reference points out, MurmurHash is faster than the MD5 and SHA functions, although it does not make a direct comparison to the Object.GetHashCode method mentioned above.