Java cosine similarity problem

I wrote a Java program that calculates cosine similarity based on TF * IDF. It works very well, but there is one problem :(

For example, if I have two matrices and want to calculate the cosine similarity between them, it does not work, because the rows are not the same length:

doc 1
1 2 3
4 5 6

doc 2
1 2 3 4 5 6
7 8 5 2 4 9

If the rows and columns have the same length, my program works very well, but it fails when they do not.

Any advice?

2 answers

I'm not sure about your implementation, but the cosine similarity of two vectors is equal to the normalized dot product of those vectors.

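The dot product of two vectors can be written as a · b = a^T b. As a result, if the vectors have different lengths, the dot product is undefined and you cannot compute the cosine.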
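To see why rows of different length break the calculation, here is a minimal sketch of the normalized dot product on plain arrays. The class name DenseCosine and the choice to return 0 for an all-zero vector are my assumptions, not something from the question.

```java
final class DenseCosine {
    // Cosine similarity of two dense vectors: dot(a, b) / (|a| * |b|).
    // Rejects vectors of different length, because the dot product is undefined for them.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "vectors must have the same length: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Assumption: similarity with an all-zero vector is reported as 0 instead of NaN.
        if (normA == 0 || normB == 0) return 0.0;
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Same length: works.
        System.out.println(cosine(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
        // Different length (3 vs 6, as in the question): throws IllegalArgumentException.
        try {
            cosine(new double[] {1, 2, 3}, new double[] {1, 2, 3, 4, 5, 6});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly on a length mismatch is a design choice. Silently truncating or padding would hide the fact that the two rows are not indexed by the same terms.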

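In a standard TF * IDF approach, the entries of the matrix are indexed by term and document, so any term that does not appear in a document shows up as a zero in the matrix.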
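One way to get rows of equal length is to index every document by the same term list and fill in zeros for missing terms. A rough sketch, assuming each document's weights are kept in a Map and Java 9+ is available for Map.of and List.of; the class VocabularyAlign and both helper names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class VocabularyAlign {
    // Sorted union of all terms, so every document uses the same index for a given term.
    static List<String> vocabulary(List<Map<String, Double>> docs) {
        SortedSet<String> terms = new TreeSet<>();
        for (Map<String, Double> d : docs) terms.addAll(d.keySet());
        return new ArrayList<>(terms);
    }

    // Project one document onto the shared vocabulary; absent terms become 0.
    static double[] toVector(List<String> vocabulary, Map<String, Double> weights) {
        double[] v = new double[vocabulary.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = weights.getOrDefault(vocabulary.get(i), 0.0);
        }
        return v;
    }

    public static void main(String[] args) {
        // Made-up example data: two documents with different term sets.
        Map<String, Double> doc1 = Map.of("apple", 1.0, "pear", 2.0);
        Map<String, Double> doc2 = Map.of("apple", 3.0, "kiwi", 1.0, "plum", 4.0);
        List<String> vocab = vocabulary(List.of(doc1, doc2));
        System.out.println(vocab);
        System.out.println(Arrays.toString(toVector(vocab, doc1)));
        System.out.println(Arrays.toString(toVector(vocab, doc2)));
    }
}
```

Both vectors now have one entry per vocabulary term, so their dot product is well defined.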

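The way you have it set up suggests that there are two separate matrices for your two documents. I'm not sure that is what you intend, but it looks incorrect.

On the other hand, if one of the matrices is meant to be your query, it should be a vector rather than a matrix, so that the transpose gives the correct result.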

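A full explanation of TF * IDF follows:

In classic TF * IDF you build a term-document matrix a. Each value in the matrix is written a_{i,j}, where i is the term and j is the document. The value combines local, global, and normalization weights (if you normalize your documents, the normalization weight should be 1). Thus a_{i,j} = f_{i,j} * D / d_i, where f_{i,j} is the frequency of term i in doc j, D is the total number of documents, and d_i is the number of documents that contain term i.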
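The weighting can be sketched in a few lines. This assumes the raw counts are already laid out as a term x document array and uses the plain ratio D / d_i from the formula (many references take the logarithm of that ratio instead); the class name TfIdfMatrix and method weigh are made up for the example:

```java
import java.util.Arrays;

final class TfIdfMatrix {
    // counts[i][j] = frequency of term i in document j (f_{i,j}).
    // Returns a[i][j] = f_{i,j} * D / d_i, where D is the number of documents
    // and d_i is the number of documents that contain term i.
    static double[][] weigh(double[][] counts) {
        int terms = counts.length;
        int docs = terms == 0 ? 0 : counts[0].length;
        double[][] a = new double[terms][docs];
        for (int i = 0; i < terms; i++) {
            int df = 0; // d_i
            for (int j = 0; j < docs; j++) {
                if (counts[i][j] > 0) df++;
            }
            if (df == 0) continue; // term occurs nowhere: its row stays all zeros
            for (int j = 0; j < docs; j++) {
                a[i][j] = counts[i][j] * docs / df;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // Made-up counts: 3 terms (rows) x 3 documents (columns).
        double[][] counts = {
            {2, 0, 1},
            {1, 1, 1},
            {0, 3, 0},
        };
        System.out.println(Arrays.deepToString(weigh(counts)));
    }
}
```

A term that appears in every document (the middle row) keeps its raw count, while a term found in a single document is scaled up by D.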

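Your query is a vector of terms, denoted b. Each entry b_{i,q} refers to term i in query q, with b_{i,q} = f_{i,q}, where f_{i,q} is the frequency of term i in query q. Each query is therefore a vector, and several queries form a matrix.

We can then compute the unit vectors of each, so that the dot product yields the correct cosine. To get unit vectors, divide both the matrix a and the query b by their Frobenius norm.

Finally, compute the cosine by taking the transpose of the vector b for a given query, so there is one query (one vector) per calculation. This is written b^T a. The result is a vector with one score per document, and a higher score means a higher document rank.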
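A small sketch of the b^T a scoring step, under one assumption of mine: each document column and the query are normalized separately to unit length (for a single vector the Frobenius norm is just the Euclidean norm), so every score is a true cosine. QueryRanker and its method names are invented for illustration.

```java
import java.util.Arrays;

final class QueryRanker {
    // Euclidean norm of column j of a term x document matrix.
    static double columnNorm(double[][] a, int j) {
        double sum = 0;
        for (double[] row : a) sum += row[j] * row[j];
        return Math.sqrt(sum);
    }

    // scores[j] = cosine between the query vector b and document column j of a.
    static double[] scores(double[][] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("query must have one entry per term");
        }
        int docs = a.length == 0 ? 0 : a[0].length;
        double normB = 0;
        for (double x : b) normB += x * x;
        normB = Math.sqrt(normB);

        double[] out = new double[docs];
        for (int j = 0; j < docs; j++) {
            double dot = 0; // entry j of b^T a
            for (int i = 0; i < a.length; i++) dot += b[i] * a[i][j];
            double denom = normB * columnNorm(a, j);
            out[j] = denom == 0 ? 0.0 : dot / denom; // assumption: zero vectors score 0
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up weights: 2 terms x 2 documents, and a query that uses both terms once.
        double[][] a = {
            {1, 1},
            {1, 0},
        };
        double[] b = {1, 1};
        System.out.println(Arrays.toString(scores(a, b)));
    }
}
```

In the example the first document contains both query terms in the same proportion as the query, so it ranks above the second one.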


A simple Java implementation of cosine similarity:

    import com.google.common.collect.Sets; // Guava
    import java.util.Map;
    import java.util.Set;

    static double cosine_similarity(Map<String, Double> v1, Map<String, Double> v2) {
        // Only terms present in both vectors contribute to the dot product.
        Set<String> both = Sets.newHashSet(v1.keySet());
        both.retainAll(v2.keySet());
        double scalar = 0, norm1 = 0, norm2 = 0;
        for (String k : both) scalar += v1.get(k) * v2.get(k);
        for (String k : v1.keySet()) norm1 += v1.get(k) * v1.get(k);
        for (String k : v2.keySet()) norm2 += v2.get(k) * v2.get(k);
        return scalar / Math.sqrt(norm1 * norm2);
    }

Source: https://habr.com/ru/post/1734001/

