I have a huge dataset with words word_i and weights weight[i,j], where the weight is the "strength of connection" between word i and word j.
I would like to binarize this data, and I want to know whether there is an existing algorithm for assigning each word a binary code such that the Hamming distance between two word codes correlates with their connection weight.
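To make the goal concrete, here is a toy sketch of what I mean (everything here, including the name `fit_binary_codes`, the greedy bit-flip search, and the assumption that weights are normalized to [0, 1], is just my illustration, not a known algorithm):

```python
import numpy as np

def fit_binary_codes(weight, n_bits=16, n_iters=20, seed=0):
    """Assign each word an n_bits binary code so that Hamming distance
    shrinks as the connection weight grows.
    weight: (n, n) array of weights normalized to [0, 1]."""
    rng = np.random.default_rng(seed)
    n = weight.shape[0]
    codes = rng.integers(0, 2, size=(n, n_bits))
    # Map weight 1.0 -> distance 0, weight 0.0 -> distance n_bits.
    target = (1.0 - weight) * n_bits

    def cost(i):
        # Squared error between word i's Hamming distances and its targets.
        ham = np.abs(codes - codes[i]).sum(axis=1)
        return ((ham - target[i]) ** 2).sum()

    for _ in range(n_iters):
        for i in range(n):
            for b in range(n_bits):
                before = cost(i)
                codes[i, b] ^= 1          # try flipping one bit
                if cost(i) >= before:     # revert if it didn't help
                    codes[i, b] ^= 1
    return codes

w = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
print(fit_binary_codes(w, n_bits=8))
```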
Added:
The problem I'm working on: I want to teach a neural network or an SVM to create associations between words, and that's why I decided to binarize the data. Please don't ask why I don't want to use Markov models or plain graphs; I've tried them, and I want to compare them with neural networks.
So:

- I want my NN, given a word "a", to return its closest association, or, for any given word, its associated words and their probabilities.
- I tried simply binarizing the words and feeding the concatenation "ab" as input with the weight as the desired answer; this worked poorly (see the first sketch after this list).
- I was thinking about introducing a threshold on the weights for flipping one more bit; the smaller this threshold, the more bits you need (see the second sketch after this list).
- I have the situation a → b with weight w1 and b → a with weight w2, where w1 ≠ w2; therefore the direction is significant.
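For reference, the setup from the second bullet that worked poorly looked roughly like this (scikit-learn's MLPRegressor is only a stand-in for whatever network is used, and the toy `codes`/`pairs` data is made up):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy stand-ins: per-word binary codes and directed weighted pairs.
codes = {"a": np.array([0, 1, 1, 0]),
         "b": np.array([1, 1, 0, 0]),
         "c": np.array([0, 0, 1, 1])}
pairs = [("a", "b", 0.9), ("b", "a", 0.4), ("a", "c", 0.1)]  # (from, to, weight)

# Input = code of the first word concatenated with code of the second,
# target = the connection weight.  Concatenation order preserves direction,
# so (a, b) and (b, a) are distinct inputs.
X = np.array([np.concatenate([codes[a], codes[b]]) for a, b, _ in pairs])
y = np.array([w for _, _, w in pairs])

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X))   # how well it reproduces the training weights
```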
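And the threshold idea from the third bullet, as I picture it (a thermometer-style encoding; `weight_to_bits` and the exact scheme are my own guess at how to realize it):

```python
def weight_to_bits(w, threshold=0.1):
    """Thermometer code for a weight in [0, 1]: one more bit is set for
    every `threshold` step, so a smaller threshold needs more bits."""
    n_bits = int(round(1.0 / threshold))
    k = min(int(w / threshold), n_bits)
    return [1] * k + [0] * (n_bits - k)

print(weight_to_bits(0.35))             # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(len(weight_to_bits(0.35, 0.05)))  # smaller threshold -> 20 bits
```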