Requirement that
- the order of the elements does not matter
makes me immediately think of something like Zobrist hashing . That is, you will have a function f matching integers with random bistro strings, and your hash will be just XOR bit strings corresponding to the numbers in your sequence.
Of course, the basic Zobrist hashing described above does not satisfy your other requirements, which
- multiple occurrences of elements are allowed
since the operation XOR is proper inverse (i.e., a XOR a = 0 for any a ). However, simply replacing XOR with some other ring operation without this property (which is really considered desirable in normal Zobrist hashing), for example n-bit add, should generate the hash as you want:
unsigned int hash_multiset (int *seq, int n) { unsigned int h = 0; while (n--) h += f( *seq++ ); return h; }
(A small detail to mark this function is that if you want to truncate its output, then it is slightly better to use the upper than the lower bit. This is because if the k least significant bits of the hashes of the sequences [a] and [b] collide, then the least significant bits k [a, a] , [b, b] , [a, b] , etc. For k high bits this is not true, since the lower bits can transfer to the higher bits, creating a more "random" output.)
There are various ways to implement the function f . For a limited range of input integers, you can simply use a fixed random bit string lookup table. Alternatively, if you do not know the range of your inputs in advance, you can use a different (regular) hash table to match integers with random bitrates and simply create it on the fly.
Finally, it is also possible to implement f without a lookup table, simply using a fixed function that "looks random enough." One good option for such a feature would be to use a simple and quick block encipher , such as TEA or (in systems with hardware support) AES , with the result truncated to your preferred length.