Memcache Key Generation Strategy

Given a function f1 that takes n String arguments, what would be better in terms of runtime performance of the random key generation strategy for memcache?

Our Memcache client performs internal md5sum hashing on the keys it receives:

public class MemcacheClient { public Object get(String key) { String md5 = Md5sum.md5(key) // Talk to memcached to get the Serialization... return memcached(md5); } } 

My use cases:

First option

  public static String f1(String s1, String s2, String s3, String s4) { String key = s1 + s2 + s3 + s4; return get(key); } 

Second option

  /** * Calculate hash from Strings * * @param objects vararg list of String's * * @return calculated md5sum hash */ public static String stringHash(Object... strings) { if(strings == null) throw new NullPointerException("D'oh! Can't calculate hash for null"); MD5 md5sum = new MD5(); // if(prevHash != null) // md5sum.Update(prevHash); for(int i = 0; i < strings.length; i++) { if(strings[i] != null) { md5sum.Update("_"); md5sum.Update(strings[i].toString()); // Convert to String... md5sum.Update("_"); } else { // If object is null, allow minimum entropy by hashing it position md5sum.Update("_"); md5sum.Update(i); md5sum.Update("_"); } } return md5sum.asHex(); } public static String f1(String s1, String s2, String s3, String s4) { String key = stringHash(s1, s2, s3, s4); return get(key); } 

Please note that a possible problem with the second option is that we are doing the second md5sum (in the memcache client) based on the result of the md5sum digest already received.

Thanks for reading, Maxim.

- Change Used MD5 Service Source

+2
source share
2 answers

"Better" in what sense? Why do you think the second option is "better"? It does more string concatenations, more MD5 hashes and usually looks much less efficient than the first ...

+1
source

Just nitpicking, but you probably don't want random key generation, key generation should be deterministic, but it should generate an even distribution in the key space.

If you consider only random collisions, then the first approach is almost perfect. You must prefix the strings with their length so that you do not get collisions when the substring moves from one parameter to another. Given md5, the avalanche features are pretty good, which ensures that random collisions are rare enough to be ignored.

But be careful with MD5, if you handle user input, it knows collision attacks. If an untrusted user can choose arbitrary bytes for function parameters, and returning an incorrect result can have security implications, then you have a security hole. For example, if you use this information to cache authorization, an attacker could work out two sets of parameters, the hash of which has the same value. One could gain access to something public, and to another - to a protected service. Now just request authorization with the first set, get authorization, and then get access to the secure service with another set, receiving a green light from cached authorization.

+1
source

Source: https://habr.com/ru/post/1245609/


All Articles