Offset-Independent Hash Function

Is there any hash function that generates the same bucket for vectors that have the same elements, with the same relative positions, but shifted k times?

For instance:

hash([1,9,8,7]) -> b1
hash([9,8,7,1]) -> b1
hash([1,8,9,7]) -> b2
hash([1,9,8,5]) -> b3

v1 = [1,9,8,7]
v2 = [9,8,7,1]

Both vectors should get the same hash, since v2 is v1 cyclically shifted right k = 3 times (equivalently, left once).

But v3 = [1,8,9,7] does not preserve the relative order, and v4 = [1,9,8,5] contains different values, so neither of them gets the hash b1.

My initial approach was to find the maximum value of each vector and treat its position as the reference (offset = 0). I would then rotate each vector so that the maximum value is always in the first position, so that shifted copies of the same vector look identical. However, vectors can have repeated elements, so the maximum value may occur at several positions.

+6
6 answers
  • Find the lexicographically minimal rotation of the array.

    The naive way is to check all rotations in O(n^2), but this can be done in linear time using Booth's algorithm, Shiloach's fast canonization algorithm, or Duval's Lyndon factorization algorithm.

    See more details.

  • Calculate the hash of the rotated array.

    This can be done in many ways. Java's String.hashCode, for example, computes it as follows (^ denotes exponentiation):

     hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] 

It is possible that arrays with different elements will hash to the same value (this is inevitable with hashing), but all rotations of the same array will have the same hash.
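
The two steps above can be sketched in C roughly as follows. This is only an illustration under assumptions: `least_rotation` and `rotation_hash` are hypothetical names, the canonical rotation is found with a simple two-pointer scan (not Booth's or Duval's algorithm, although it is also linear), and the hash is the base-31 polynomial from the formula above computed with unsigned 64-bit wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index at which the lexicographically smallest
 * rotation of a[0..n-1] starts (two-pointer scan). */
size_t least_rotation(const unsigned *a, size_t n)
{
    size_t i = 0, j = 1, k = 0;   /* two candidate starts, matched length */

    while (i < n && j < n && k < n) {
        unsigned x = a[(i + k) % n];
        unsigned y = a[(j + k) % n];

        if (x == y) {             /* candidates still tie, compare further */
            k++;
            continue;
        }
        if (x > y)
            i += k + 1;           /* starts i..i+k cannot be minimal */
        else
            j += k + 1;           /* starts j..j+k cannot be minimal */
        if (i == j)
            j++;                  /* keep the two candidates distinct */
        k = 0;
    }
    return i < j ? i : j;
}

/* Base-31 polynomial hash of the canonical (smallest) rotation. */
uint64_t rotation_hash(const unsigned *a, size_t n)
{
    size_t start = least_rotation(a, n);
    uint64_t h = 0;

    for (size_t t = 0; t < n; t++)
        h = h * 31u + a[(start + t) % n];
    return h;
}
```

With the vectors from the question, [1,9,8,7] and [9,8,7,1] both canonicalize to [1,9,8,7] and therefore share a hash, while [1,8,9,7] and [1,9,8,5] get different values.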

+3

If we concatenate the b1 array with itself, we get:

[1,9,8,7,1,9,8,7]

This array contains all the cyclic permutations of the original array.

If you then calculate the hash of each subarray of length 4 and combine the results with an order-independent operation (such as addition), you get a hash that is the same for every rotation. Depending on the size of your arrays, computing it may need some optimization.

EDIT: every subarray except the last one, which is equal to the first!
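
A minimal C sketch of this idea, under stated assumptions: `window_sum_hash` is a hypothetical name, the doubled array is emulated with modulo indexing instead of being built, each window gets a base-31 polynomial hash, and the window hashes are squared before being added. The squaring is a toy nonlinear step that is not part of the answer; without it, the sum of these linear window hashes would depend only on the sum of the elements, so every permutation would collide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hash every length-n window of the (virtual) doubled array a+a,
 * skipping the last window, which repeats the first. Quadratic time. */
uint64_t window_sum_hash(const unsigned *a, size_t n)
{
    uint64_t total = 0;

    for (size_t start = 0; start < n; start++) {
        uint64_t h = 0;

        /* a[(start + t) % n] is element start + t of the doubled array */
        for (size_t t = 0; t < n; t++)
            h = h * 31u + a[(start + t) % n];

        total += h * h;   /* toy nonlinear mix, then order-independent sum */
    }
    return total;
}
```

This naive version costs O(n^2); a rolling hash over the doubled array would bring the window hashes down to linear time.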

+1

If you don't mind the occasional hash collision, you can simply take the sum of all elements as the hash (but be careful with floating-point problems), since the sum is invariant under any rotation of the vector. Alternatively, you could XOR or sum the hashes of the individual elements. You can also calculate something based on the differences between consecutive elements (wrapping around at the last element). Add a few of these rotation-invariant properties together, and the likelihood that two "unequal" arrays produce the same hash will be quite low. Maybe something like

 n = length(x)
 rot_invariant_hash = hash(n) + sum(hash(x[i])) + sum(hash(x[mod(i+1, n)] - x[i]))

where you can replace all the sums with any other commutative (?) operation, such as XOR. Also make sure that the hash function applied to the differences is not the identity function; otherwise those terms add up to zero. All of this takes O(n) time.

Just out of curiosity: what is your intended application?
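
The formula above could look like this in C. It is a sketch rather than a recommended hash: `mix` is a hypothetical toy function (cubing with unsigned wraparound), chosen only because it is not the identity, so the difference terms do not cancel.

```c
#include <stddef.h>

/* Toy nonlinear mixer (an assumption, not from the answer). */
static unsigned mix(unsigned v)
{
    return v * v * v;
}

/* hash(n) + sum(hash(x[i])) + sum(hash(x[(i+1) mod n] - x[i])) */
unsigned rot_invariant_hash(const unsigned *x, size_t n)
{
    unsigned h = mix((unsigned)n);

    for (size_t i = 0; i < n; i++) {
        h += mix(x[i]);                    /* element term */
        h += mix(x[(i + 1) % n] - x[i]);   /* cyclic difference term */
    }
    return h;
}
```

Because addition is commutative, any rotation visits the same terms in a different order and yields the same result. For the question's vectors this gives 1943 for both [1,9,8,7] and [9,8,7,1], 1769 for [1,8,9,7], and 1851 for [1,9,8,5].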

+1

Assuming the vector components are always numbers, calculate:

  • the product of all components
  • the product of all differences d_i between adjacent components (i, (i+1) mod n), where 1 is added to every non-negative difference

and multiply the two.

The first product abstracts from the order of the elements; the second product reintroduces that order, modulo rotation. Adding 1 to each non-negative difference avoids a result of 0 when two adjacent components have the same value.

The first product alone is not enough, since it maps all permutations of the components to the same hash value. The second product alone is not enough either, since it maps all vectors shifted along (1, ..., 1) to the same value.
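
As a rough C sketch of the two products (with assumptions: `product_hash` is a hypothetical name, components are `int`, each difference is taken as next minus current, and the multiplication is done in unsigned 64-bit arithmetic so that overflow wraps around instead of being undefined):

```c
#include <stddef.h>
#include <stdint.h>

uint64_t product_hash(const int *x, size_t n)
{
    uint64_t components = 1;    /* product of all components */
    uint64_t differences = 1;   /* product of adjusted cyclic differences */

    for (size_t i = 0; i < n; i++) {
        int64_t d = (int64_t)x[(i + 1) % n] - x[i];

        if (d >= 0)
            d += 1;             /* avoid a zero factor for equal neighbours */

        components *= (uint64_t)x[i];
        differences *= (uint64_t)d;
    }
    return components * differences;
}
```

For [1,9,8,7] the products are 504 and -54, so the result is -27216 modulo 2^64, and [9,8,7,1] gives the same value. Note that any component equal to 0 makes the first product, and hence the whole hash, zero.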

+1

Do not hash the elements of the array; instead, hash the differences of two neighboring cells:

    #include <stdio.h>

    unsigned hashdiff(unsigned arr[], size_t siz);

    /* toy hash function: don't try this at home ... */
    #define HASH1(v) ((v)*7654321)

    unsigned hashdiff(unsigned arr[], size_t siz)
    {
        unsigned idx;
        unsigned hash;

        if (siz < 1) return 0;
        if (siz < 2) return HASH1(arr[0]);

        hash = HASH1( arr[0] - arr[siz-1] );
        for (idx = 1; idx < siz; idx++) {
            hash ^= HASH1( arr[idx] - arr[idx-1] );
        }
        return hash;
    }

    unsigned arr1[] = {1,9,8,7};
    unsigned arr2[] = {9,8,7,1};
    unsigned arr3[] = {1,8,9,7};
    unsigned arr4[] = {1,9,8,5};

    int main(void)
    {
        unsigned hash;

        hash = hashdiff(arr1, 4); printf("%x\n", hash);
        hash = hashdiff(arr2, 4); printf("%x\n", hash);
        hash = hashdiff(arr3, 4); printf("%x\n", hash);
        hash = hashdiff(arr4, 4); printf("%x\n", hash);
        return 0;
    }

RESULT:

    ./a.out
    fee56452
    fee56452
    1100b22
    fca02416

UPDATE: if you do not want {1,2,3,4} and {11,12,13,14} to hash to the same value, you could augment the difference as follows:

    #define HASH1(v) ((v)*7654321)
    #define HASH2(a,b) HASH1(3u*(a)-5u*(b))

    unsigned hashdiff2(unsigned arr[], size_t siz)
    {
        unsigned idx;
        unsigned hash;

        if (siz < 1) return 0;
        if (siz < 2) return HASH1(arr[0]);

        hash = HASH2( arr[0] , arr[siz-1] );
        for (idx = 1; idx < siz; idx++) {
            hash ^= HASH2( arr[idx] , arr[idx-1] );
        }
        return hash;
    }

+1

I have not coded it, but I think it can work:

To get your hash, you just need to capture the order of the elements while ignoring the offset. Sort the elements like this:

 a = [1,9,8,7]
 s = sort(a) = [1,7,8,9]

Now record, for each element of s, the element that follows it in a (wrapping around at the end):

 1 => 9
 7 => 1
 8 => 7
 9 => 8
 snext = next(s, a) = [9,1,7,8]

Now concatenate s and snext:

 [1,7,8,9,9,1,7,8] 

And hash it.

To implement the next() function, simply use the vector a as an associative array (element => following element) and iterate over the elements of s.

The array [9,8,7,1] will give the same hash, because it has the same elements in the same relative order.

However, the array [1,8,9,7] gives a different hash; it has the same elements, but their relative order does not match.
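
Since the answer comes without code, here is one way it might be written in C. Treat it as a sketch under assumptions: `sorted_next_hash` is a hypothetical name, all elements are assumed to be distinct (with duplicates the "next" element is ambiguous), the lookup of each element in a is a plain linear search, and the final hash is a base-31 polynomial over s followed by snext.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_unsigned(const void *p, const void *q)
{
    unsigned a = *(const unsigned *)p;
    unsigned b = *(const unsigned *)q;
    return (a > b) - (a < b);
}

/* Assumes distinct elements; returns 0 for n == 0 or if malloc fails. */
uint64_t sorted_next_hash(const unsigned *a, size_t n)
{
    if (n == 0)
        return 0;

    unsigned *s = malloc(n * sizeof *s);
    if (s == NULL)
        return 0;

    for (size_t i = 0; i < n; i++)
        s[i] = a[i];
    qsort(s, n, sizeof *s, cmp_unsigned);      /* s = sort(a) */

    uint64_t h = 0;
    for (size_t i = 0; i < n; i++)             /* hash s ... */
        h = h * 31u + s[i];

    for (size_t i = 0; i < n; i++) {           /* ... then snext */
        size_t pos = 0;
        while (a[pos] != s[i])                 /* locate s[i] in a */
            pos++;
        h = h * 31u + a[(pos + 1) % n];        /* its cyclic successor */
    }

    free(s);
    return h;
}
```

The linear search makes this O(n^2); sorting index positions instead of values would avoid that, at the cost of a slightly longer example.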

Hope this helps.

0

Source: https://habr.com/ru/post/952091/

