Similarity Matrix → feature vector algorithms?

If we have a set of M words and we know in advance the similarity in meaning of every pair of words (that is, we have an M x M matrix of similarities), which algorithm can we use to create one k-dimensional bit vector for each word, so that any pair of words can be compared just by comparing their vectors (for example, by taking the absolute difference of the vectors)?

I do not know what this particular problem is called. If I did, it would be much easier to find it among the many algorithms with similar descriptions that actually do something else.


Additional observation:

I think the algorithm has to produce a side effect in this case. If, according to the matrix, word A is similar to word B and B is similar to C, but the recorded [A, C] similarity is low, then comparing the computed vectors should still yield a high [A, C] similarity. In that way we would fill in the gaps in the matrix, in effect smoothing the similarities. Apart from this smoothing, the goal is for the results to stay as close as possible to the original numbers in the matrix.
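To make the goal concrete, here is a minimal sketch (not from the original question) of how such vectors would be evaluated: the similarities implied by the vectors, taken here as dot products purely for illustration, are compared against the known matrix.

```python
import numpy as np

# Hypothetical setup: M words, an M x M similarity matrix S known in advance,
# and candidate k-dimensional vectors X (one row per word).
M, k = 5, 2
rng = np.random.default_rng(0)

S = rng.random((M, M))
S = (S + S.T) / 2              # word-pair similarities are symmetric

X = rng.random((M, k))         # candidate vectors, however they were produced

S_hat = X @ X.T                # similarity implied by comparing the vectors
error = np.linalg.norm(S - S_hat)   # closeness to the original matrix
print(error)
```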

+6
2 answers

You can use a truncated singular value decomposition (SVD) to find the best rank-k approximation to the matrix. The idea is to decompose the matrix into three matrices, U, sigma and V, such that U and V are orthonormal and sigma is diagonal.

By truncating the unimportant singular values, you can bring the storage down to O(k*m).
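A minimal NumPy sketch of this idea, assuming the similarity matrix is symmetric and roughly positive semidefinite so that the rows of U_k * sqrt(sigma_k) can serve as the k-dimensional word vectors; the names here are illustrative, not from the answer:

```python
import numpy as np

def embed_truncated_svd(S, k):
    """Rank-k embedding of a symmetric similarity matrix S via truncated SVD.
    Each row of the returned matrix is one word's k-dimensional vector."""
    U, sigma, Vt = np.linalg.svd(S)
    U_k = U[:, :k]                 # keep the k largest singular directions
    sigma_k = sigma[:k]
    # Split sigma between the factors so that X @ X.T approximates S
    # (this split assumes S is close to positive semidefinite).
    return U_k * np.sqrt(sigma_k)

rng = np.random.default_rng(0)
S = rng.random((6, 6))
S = (S + S.T) / 2                  # toy symmetric similarity matrix
X = embed_truncated_svd(S, k=3)
print(np.round(X @ X.T - S, 3))    # reconstruction error shrinks as k grows
```

Storing only X (m rows of k values) instead of the full matrix is what gives the O(k*m) figure mentioned above.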

+7

If you are only interested in the first eigenvector and eigenvalue, power iteration may be all you need. I once used it to extract keywords from text documents (based on word distances within sentences, but similarities will likely work too).
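A short sketch of power iteration in NumPy (a generic implementation, not the answerer's code): repeatedly multiply a vector by the matrix and renormalise, and it converges to the dominant eigenvector.

```python
import numpy as np

def power_iteration(S, num_iters=200):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix
    by repeated multiplication and renormalisation."""
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])   # arbitrary start vector
    for _ in range(num_iters):
        w = S @ v
        v = w / np.linalg.norm(w)
    eigenvalue = v @ S @ v                          # Rayleigh quotient
    return eigenvalue, v

rng = np.random.default_rng(0)
S = rng.random((6, 6))
S = (S + S.T) / 2              # toy symmetric similarity matrix
lam, vec = power_iteration(S)
print(lam, vec)
```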

0

Source: https://habr.com/ru/post/899103/

