If you intend to call the distance function repeatedly during one run of your program, you can gain some speed by using a precomputed bit-count table. Here is (yet) another version of the Hamming distance function:
In the following, a and b are the two 32-element Python lists given in a comment on the question. divakar_hamming_distance() and divakar_hamming_distance_v2() are from @Divakar's answer.
Here are the timings of @Divakar's functions:
```
In [116]: %timeit divakar_hamming_distance(a, b)
The slowest run took 5.57 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.3 µs per loop

In [117]: %timeit divakar_hamming_distance_v2(a, b)
The slowest run took 5.35 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop
```
hamming_distance1(a, b) is a little faster:
```
In [118]: %timeit hamming_distance1(a, b)
The slowest run took 6.04 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.42 µs per loop
```
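On my computer, initializing _nbits takes about 11 µs, so there is no point in using hamming_distance1 if you call the function only once. Given the per-call savings above (about 3.9 µs compared with divakar_hamming_distance and about 2.9 µs compared with divakar_hamming_distance_v2), the setup cost is recovered after three calls or four calls, respectively.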
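The break-even point comes from requiring that the one-time setup plus n fast calls cost less than n slow calls: setup + n·t_new < n·t_old, so n > setup / (t_old − t_new). A small helper (a hypothetical name, for illustration only) that applies this to measured timings:

```python
import math


def break_even_calls(setup_us, old_us, new_us):
    """Smallest number of calls n for which setup + n*new < n*old.

    All times are in microseconds; assumes new_us < old_us.
    """
    saving_per_call = old_us - new_us
    # Solve setup + n*new < n*old  ->  n > setup / saving_per_call,
    # then take the next integer strictly above that bound.
    return math.floor(setup_us / saving_per_call) + 1
```

With the numbers measured above, `break_even_calls(11, 11.3, 7.42)` returns 3 and `break_even_calls(11, 10.3, 7.42)` returns 4.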
If the inputs are already numpy arrays, all the functions are much faster:
```
In [119]: aa = np.array(a)

In [120]: bb = np.array(b)

In [121]: %timeit divakar_hamming_distance_v2(aa, bb)
The slowest run took 8.22 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.72 µs per loop

In [122]: %timeit hamming_distance1(aa, bb)
The slowest run took 12.67 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.77 µs per loop
```
Of course, if you always do this conversion immediately before computing the Hamming distance, the time it takes must be included in the total. However, if you write the code that generates a and b so that it uses numpy from the start, a and b may already be numpy arrays by the time you compute the Hamming distance.
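To check how much the conversion matters for your own data, you can time a function that converts lists on every call against one that receives ready-made arrays. The function names and the sample inputs below are hypothetical, chosen only for illustration, and the absolute timings will depend on your machine.

```python
import timeit

import numpy as np

# Popcount table for byte values 0..255.
_nbits = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def hamming_from_arrays(aa, bb):
    """Hamming distance for inputs that are already NumPy integer arrays."""
    return int(_nbits[np.bitwise_xor(aa, bb).view(np.uint8)].sum())


def hamming_from_lists(a, b):
    """Same computation, but pays for the list -> array conversion on every call."""
    return hamming_from_arrays(np.array(a), np.array(b))


# Hypothetical sample data: 32 integers each; every pair differs in exactly one bit.
a = list(range(32))
b = list(range(32, 64))
aa, bb = np.array(a), np.array(b)

n = 10_000
t_lists = timeit.timeit(lambda: hamming_from_lists(a, b), number=n)
t_arrays = timeit.timeit(lambda: hamming_from_arrays(aa, bb), number=n)
print(f"with conversion: {t_lists / n * 1e6:.2f} µs per call")
print(f"arrays only:     {t_arrays / n * 1e6:.2f} µs per call")
```

Both functions compute the same result (32 for this sample data); only the second timing excludes the cost of `np.array`.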
(I also experimented a bit with a two-dimensional array of precomputed Hamming distances between 8-bit values, an array with shape (256, 256), but its initialization cost is higher and the performance gain is small.)