I am in the middle of implementing my own DHT for an internal cluster. Since it will be used in a file-sharing application similar to BitTorrent, "Mainline DHT" was the first thing I looked at. After that I found "entangled" (Python, DHT using Twisted), "congress" (Python, DHT using pyev + libev) and, of course, the original "kademlia".
They have different approaches to organizing k-buckets:
1) congress and kademlia use 160 fixed buckets, where bucket i covers XOR distances 2^i <= (distance of an ID from us) < 2^(i+1), with 0 <= i < 160.
2) Mainline DHT and entangled use dynamic buckets. At startup they have only 1 bucket covering the entire ID space. Once it is filled with 8 live nodes, the bucket is split into 2 new ones, but ONLY if our own ID falls inside that bucket. If it does not, the bucket is never split. So eventually we end up with many fine-grained buckets close to our own ID and a few coarse ones covering the rest (at most 160 buckets in total).
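To make scheme 1 concrete, here is a minimal Python sketch (a hypothetical helper, not taken from the congress or kademlia codebases) of how a fixed-bucket table maps an ID to its bucket index: the index is simply floor(log2(xor distance)).

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Index i of the fixed bucket covering 2**i <= (own_id ^ other_id) < 2**(i+1).

    Illustrative only; with 160-bit IDs this yields indices 0 <= i < 160.
    """
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself in its routing table")
    return distance.bit_length() - 1  # floor(log2(distance))

print(bucket_index(0b1000, 0b1001))  # distance 1 -> bucket 0
print(bucket_index(0b1000, 0b0000))  # distance 8 -> bucket 3
```

Note that the bucket index depends only on the XOR distance, never on the absolute value of either ID.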
Both options are good enough. But I found a HUGE difference in the logic that decides whether a given ID belongs to a given bucket. And that is my question.
congress and kademlia treat bucket boundaries as "minimum distance from us" and "maximum distance from us". Thus, our own ID will ALWAYS fall in bucket0. At most 2 other IDs can fall in bucket1 (because it covers distances 2^1 <= x < 2^2), and those will ALWAYS be the ones closest to us. So my brain does not break, everything is fine.
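The "at most 2 IDs in bucket1" claim is easy to check in a toy 4-bit ID space (the `own_id` value below is arbitrary, chosen just for illustration):

```python
# Under the distance interpretation, bucket 1 covers distances 2 <= x < 4,
# so for any own_id exactly two other IDs can ever fall into it:
# own_id ^ 2 and own_id ^ 3.
own_id = 0b1010
ids_in_bucket1 = [i for i in range(16) if 2 <= (own_id ^ i) < 4]
print(ids_in_bucket1)  # -> [8, 9]
```

Whatever `own_id` is, the list always has length 2, and those two IDs differ from ours only in the lowest two bits, i.e. they are among our nearest neighbours in XOR space.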
But if you look at Mainline DHT or entangled, you will see that bucket boundaries are treated as absolute node IDs, not XOR distances! So in a theoretically full table, the IDs 0,1,2,3,4,5,6,7 will all sit in 1 bucket.
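Here is a minimal sketch of that ID-range interpretation, using a toy 4-bit space. This is a hypothetical illustration of the Mainline-style scheme, not code from any of the libraries mentioned: a bucket covers an absolute ID range [lo, hi), and a full bucket splits at its midpoint.

```python
class Bucket:
    """Toy ID-range bucket: covers node IDs lo <= id < hi."""
    K = 8  # Mainline's bucket capacity

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi
        self.nodes = []  # node IDs currently stored

    def covers(self, node_id: int) -> bool:
        return self.lo <= node_id < self.hi

    def split(self):
        """Split at the midpoint, redistributing stored nodes."""
        mid = (self.lo + self.hi) // 2
        left, right = Bucket(self.lo, mid), Bucket(mid, self.hi)
        for n in self.nodes:
            (left if left.covers(n) else right).nodes.append(n)
        return left, right

# 4-bit toy space: one bucket covers everything at startup.
root = Bucket(0, 16)
root.nodes = list(range(8))  # IDs 0..7 all land in the same bucket

# Mainline splits a full bucket only if our own ID is inside it:
own_id = 0b0101
if root.covers(own_id) and len(root.nodes) >= Bucket.K:
    left, right = root.split()
    print(left.nodes, right.nodes)  # -> [0, 1, 2, 3, 4, 5, 6, 7] []
```

Note that membership here is decided by comparing absolute integer values against lo/hi, with no XOR anywhere; the XOR metric only comes back into play when the table is queried for the nodes closest to a target.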
So: why do some implementations interpret bucket boundaries as "minimum/maximum distance from us", while others interpret them as "minimum/maximum value of a 160-bit integer"?