Intentional Hashing Collisions

I am trying to write code that does "fuzzy hashing". That is, I want several distinct inputs to hash to the same output so that I can search quickly and easily. If A hashes to 1 and C hashes to 1, it is trivial for me to know that A is equivalent to C.

Creating such a hash function seems difficult, so I was wondering whether anyone has experience with CMPH or gperf and could walk me through building one with them.

Thanks in advance! Stephen

@Ben

In this case, the matrices are boolean, but I can easily pack them into 64-bit integers. Rotations, translations, etc. of the input are irrelevant and should be screened out. Thus:

000
111
000

is equivalent to

111
000
000

and

001
001
001

(simplification)
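
One way to make equivalent matrices collide on purpose is to reduce each matrix to a canonical representative and hash that instead. The following is only a minimal sketch, assuming a square matrix of at most 64 cells and translations that wrap around cyclically (the question does not say whether they do). The class and method names are hypothetical. The canonical key is the smallest packed value over all four rotations and all cyclic shifts.

```java
final class CanonicalForm {

    // Builds an n x n boolean matrix from strings such as "010" (assumes square input).
    static boolean[][] fromRows(String... rows) {
        int n = rows.length;
        boolean[][] m = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                m[i][j] = rows[i].charAt(j) == '1';
            }
        }
        return m;
    }

    // Packs the matrix row by row into a long; requires n * n <= 64.
    static long pack(boolean[][] m) {
        long bits = 0L;
        for (boolean[] row : m) {
            for (boolean cell : row) {
                bits = (bits << 1) | (cell ? 1L : 0L);
            }
        }
        return bits;
    }

    // Returns a copy rotated by 90 degrees clockwise.
    static boolean[][] rotate(boolean[][] m) {
        int n = m.length;
        boolean[][] r = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                r[j][n - 1 - i] = m[i][j];
            }
        }
        return r;
    }

    // Returns a copy shifted cyclically by dr rows and dc columns (assumption: wrap-around).
    static boolean[][] shift(boolean[][] m, int dr, int dc) {
        int n = m.length;
        boolean[][] s = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                s[(i + dr) % n][(j + dc) % n] = m[i][j];
            }
        }
        return s;
    }

    // Smallest packed value (compared as unsigned) over every rotation and cyclic shift.
    static long canonical(boolean[][] m) {
        int n = m.length;
        long best = -1L; // all bits set: the largest unsigned 64-bit value
        boolean[][] cur = m;
        for (int rot = 0; rot < 4; rot++) {
            for (int dr = 0; dr < n; dr++) {
                for (int dc = 0; dc < n; dc++) {
                    long key = pack(shift(cur, dr, dc));
                    if (Long.compareUnsigned(key, best) < 0) {
                        best = key;
                    }
                }
            }
            cur = rotate(cur);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[][] a = fromRows("000", "111", "000");
        boolean[][] b = fromRows("111", "000", "000");
        boolean[][] c = fromRows("001", "001", "001");
        System.out.println(canonical(a) == canonical(b)); // same key for both
        System.out.println(canonical(b) == canonical(c)); // same key for both
    }
}
```

Because the key is the minimum over the whole set of transformed copies, two matrices get the same key exactly when one can be turned into the other by these transformations. The cost is 4 * n * n packings per matrix, which is small for matrices that fit in 64 bits.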

@Kinopiko

- "" , , (... ). , . .

@Jason

.

000
010
000

000
011
000
+3
5

SOUNDEX , ...

  • ,
  • , A (, "McDonald" ), , , C ( "MacDonnel" )

SOUNDEX , ( , ) , [ , ].

( ?) , ( , ) [ ] , "" /. , , "" , , , ( ), , , , ( ).

, , , - ( ... , ...), .

. , , - , "on" .

CMPH gperf...
, minimal, -. . , ( )

+2

MinHash, , .

, -, , , , .

+1

( )?

:

  • , ?
  • / . ?
  • 1 2.

, Java ( , ):

import java.util.*;

/**
 *
 * @author Mark Bolusmjak
 */
public class MatrixTest {


  LinkedList<LinkedList<Integer>> randomMatrix(int size){
    LinkedList<LinkedList<Integer>> rows = new LinkedList<LinkedList<Integer>>();
    for (int i=0; i<size; i++){
      LinkedList<Integer> newRow = new LinkedList<Integer>();
      for (int j=0; j<size; j++){
        newRow.add((int)(5*Math.random()));
      }
      rows.add(newRow);
    }
    return rows;
  }

  LinkedList<LinkedList<Integer>> trans(LinkedList<LinkedList<Integer>> m){
    if (Math.random()<0.5){ //column translation
      for (LinkedList<Integer> integers : m) {
        integers.addFirst(integers.removeLast());
      }
    } else { //row translation
      m.addFirst(m.removeLast());
    }
    return m;
  }

  LinkedList<LinkedList<Integer>> flipDiagonal(LinkedList<LinkedList<Integer>> m){
    LinkedList<LinkedList<Integer>> flipped = new LinkedList<LinkedList<Integer>>();
    for (int i=0; i<m.size(); i++){
      flipped.add(new LinkedList<Integer>());
    }

    for (LinkedList<Integer> mRows : m) {
      Iterator<Integer> listIterator = mRows.iterator();
      for (LinkedList<Integer> flippedRows : flipped) {
        flippedRows.add(listIterator.next());
      }
    }
    return flipped;
  }


  public static void main(String[] args) {
    MatrixTest mt = new MatrixTest();
    LinkedList<LinkedList<Integer>> m = mt.randomMatrix(4);
    mt.display(m);

    System.out.println(mt.hash1(m));
    System.out.println(mt.hash2(m));

    m = mt.trans(m);
    mt.display(m);
    System.out.println(mt.hash1(m));
    System.out.println(mt.hash2(m));

    m = mt.flipDiagonal(m);
    mt.display(m);
    System.out.println(mt.hash1(m));
    System.out.println(mt.hash2(m));

    m = mt.trans(m);
    mt.display(m);
    System.out.println(mt.hash1(m));
    System.out.println(mt.hash2(m));

    m = mt.flipDiagonal(m);
    mt.display(m);
    System.out.println(mt.hash1(m));
    System.out.println(mt.hash2(m));

  }


  private void display(LinkedList<LinkedList<Integer>> m){
    for (LinkedList<Integer> integers : m) {
      System.out.println(integers);
    }
    System.out.println("");
  }

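  // Hashes the sorted multiset of all cells. Any rearrangement of the cells,
  // including translations and the diagonal flip, leaves the result unchanged.
  int hash1(LinkedList<LinkedList<Integer>> m){
    ArrayList<Integer> sorted = new ArrayList<Integer>();

    for (LinkedList<Integer> integers : m) {
      for (Integer integer : integers) {
        sorted.add(integer);
      }
    }
    Collections.sort(sorted);
    return sorted.hashCode();
  }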

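  // Hashes the sorted list of all row sums and column sums. Translations only
  // permute these sums and the diagonal flip swaps rows with columns, so the
  // sorted list stays the same.
  int hash2(LinkedList<LinkedList<Integer>> m){
    List<Integer> rowColumnHashes = new ArrayList<Integer>();
    // row sums
    for (LinkedList<Integer> row : m) {
      int hash = 0;
      for (Integer integer : row) {
        hash += integer;
      }
      rowColumnHashes.add(hash);
    }

    // column sums, computed as the row sums of the transposed matrix
    m = flipDiagonal(m);
    for (LinkedList<Integer> row : m) {
      int hash = 0;
      for (Integer integer : row) {
        hash += integer;
      }
      rowColumnHashes.add(hash);
    }

    Collections.sort(rowColumnHashes);
    return rowColumnHashes.hashCode();
  }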



} // end of class
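
Hashes like hash1 and hash2 above are invariant under the transformations, but matrices that are not equivalent can still share a value, so a matching hash only marks a candidate. A hedged sketch of how such an invariant could serve as a bucket key follows. It assumes square int[][] matrices, and the names InvariantBuckets, signature and bucket are hypothetical. The sorted row and column sums mirror the idea of hash2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InvariantBuckets {

    // Sorted list of all row sums and column sums (assumes a square matrix).
    static List<Integer> signature(int[][] m) {
        int n = m.length;
        int[] rowSums = new int[n];
        int[] colSums = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSums[i] += m[i][j];
                colSums[j] += m[i][j];
            }
        }
        List<Integer> sig = new ArrayList<>();
        for (int s : rowSums) sig.add(s);
        for (int s : colSums) sig.add(s);
        Collections.sort(sig);
        return sig;
    }

    // Groups matrices by signature; members of one bucket are only candidates
    // for equivalence and still need an exact comparison.
    static Map<List<Integer>, List<int[][]>> bucket(List<int[][]> matrices) {
        Map<List<Integer>, List<int[][]>> buckets = new HashMap<>();
        for (int[][] m : matrices) {
            buckets.computeIfAbsent(signature(m), k -> new ArrayList<>()).add(m);
        }
        return buckets;
    }
}
```

Using the sorted list itself as the map key, rather than its hashCode, avoids adding accidental int collisions on top of the intentional ones.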
0

, .., , , , . ?

, , - - , , , (, ), .. , , .

, , , - Vector Calc , .

0

chd, cmph, k- , , , , :

http://cmph.sourceforge.net/chd.html

However, it would be better to know what you mean by "big input." If you are talking about hundreds of thousands of records, there are simpler solutions. If you have hundreds of millions, then chd is probably the best choice.


Source: https://habr.com/ru/post/1722007/

