so what are you doing
k_0 = h_1 mod s k_1 = h_1 + h_2 mod s = k_0 + h_2 mod s k_2 = h_1 + h_2 + h_2 mod s = k_1 + h_2 mod s .. k_n = k_(n-1) + h_2 mod s
Depending on overflow problems (which should not differ from the original if the size is less than half 2**64
), this may be faster (less easy to parallelize):
uint64_t h_one = hash[0]; uint64_t h_two = hash[1]; k_hash[0] = h_one % size; for ( int i=1; i<k; ++i ) { (uint64_t *) k_hash[i] = ( k_hash[i-1] + h_two ) % size; }
Please note that there is a possibility that your compiler has already come to this form, depending on which optimization flags you use.
Of course, this excluded only one multiplication. If you want to eliminate or reduce the module, I think that based on h_two%size
and h_1%size
you can predefine the steps that you should explicitly call %size
, something like this:
uint64_t h_one = hash[0]%size; uint64_t h_two = hash[1]%size; k_hash[0] = h_one; step = (size-(h_one))/(h_two)-1; for ( int i=1; i<k; ++i ) { (uint64_t *) k_hash[i] = ( k_hash[i-1] + h_two ); if(i==step) { k_hash[i] %= size; } }
Note. I am not sure about the formula (I have not tested it), this is a more general idea. This will greatly depend on how good your prediction is in your branch (and how big a performance indicator is a wrong prediction). In addition, it can only help if the step is large.
change: or simpler (and possibly with the same performance) means Mystical:
uint64_t h_one = hash[0]%size; uint64_t h_two = hash[1]%size; k_hash[0] = h_one; for ( int i=1; i<k; ++i ) { (uint64_t *) k_hash[i] = ( k_hash[i-1] + h_two ); if(k_hash[i] > size) { k_hash[i] -= size; } }
source share