Built-in cosine similarity function in MATLAB

I want to calculate the cosine similarity between different rows of a matrix in MATLAB. I wrote the following code:

    for i = 1:n_row
        for j = i:n_row
            S2(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j));
            S2(j,i) = S2(i,j);
        end
    end

The S1 matrix is 11000-by-11000, and the code takes a very long time to run. Is there a function in MATLAB that calculates the cosine similarity between matrix rows faster than the code above?

2 answers

Your code loops over all rows, and for each row an inner loop runs over (about) half of the rows, calculating the dot product for each unique pair of rows:

    n_row = size(S1,1);
    norm_r = sqrt(sum(abs(S1).^2,2)); % row-wise 2-norms (in Octave: norm(S1,2,'rows'))
    S2 = zeros(n_row,n_row);          % preallocate the result
    for i = 1:n_row
        for j = i:n_row
            S2(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j));
            S2(j,i) = S2(i,j);        % fill in the symmetric entry
        end
    end

(I took the liberty of completing your code so that it actually runs. Note the initialization of S2 before the loop: preallocating saves a lot of time!)

If you notice that a dot product is the matrix product of a row vector with a column vector, you can see that the above, without the normalization step, is identical to

    S2 = S1 * S1.';

This runs much faster than the explicit loop, even if it (maybe?) cannot exploit the symmetry. Normalization is simply dividing each row by norm_r and each column by norm_r. Here I multiply the two vectors to get a square matrix to normalize with:

    S2 = (S1 * S1.') ./ (norm_r * norm_r.');
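As a sanity check, the vectorized expression can be compared against the double loop on a small matrix. The following is only a sketch: the 5-by-8 random matrix and the variable names S2_loop, S2_vec, Sn and S2_alt are made up for illustration, and S1 is assumed to be real-valued with no all-zero rows. The last part shows a variant that is not from the answer above: dividing each row by its norm first, which avoids building the extra norm_r * norm_r.' matrix.

```matlab
% Small real-valued test matrix (size chosen arbitrarily for illustration)
rng(0);
S1 = rand(5, 8);
n_row = size(S1, 1);
norm_r = sqrt(sum(abs(S1).^2, 2));   % column vector of row norms

% Reference result: explicit double loop over unique row pairs
S2_loop = zeros(n_row, n_row);
for i = 1:n_row
    for j = i:n_row
        S2_loop(i,j) = dot(S1(i,:), S1(j,:)) / (norm_r(i) * norm_r(j));
        S2_loop(j,i) = S2_loop(i,j);
    end
end

% Vectorized version: all dot products at once, then normalize
S2_vec = (S1 * S1.') ./ (norm_r * norm_r.');

% Variant (assumption, not from the answer): normalize rows first
Sn = bsxfun(@rdivide, S1, norm_r);   % each row now has unit length
S2_alt = Sn * Sn.';

% Largest absolute differences; expected to be at rounding-error level
diff_vec = max(abs(S2_loop(:) - S2_vec(:)));
diff_alt = max(abs(S2_loop(:) - S2_alt(:)));
```

For the 11000-by-11000 case in the question, each full matrix of doubles takes 11000^2 * 8 bytes, roughly 0.97 GB, so the number of temporary matrices may matter as much as the arithmetic.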

A short version that computes the similarities with pdist:

    S2 = squareform(1-pdist(S1,'cosine')) + eye(size(S1,1));

Explanation:

pdist(S1,'cosine') calculates the cosine distance between all pairs of rows in S1. Therefore, the similarity between all pairs is 1 - pdist(S1,'cosine').

We can turn this into a square matrix, where element (i,j) is the similarity of rows i and j, with squareform(1-pdist(S1,'cosine')).

Finally, we need to set the main diagonal to 1, because the similarity of a row with itself is obviously 1, but this is not explicitly calculated by pdist.
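The steps above can be checked against the plain matrix-product formula on a small example. This is a minimal sketch under a few assumptions: pdist and squareform require the Statistics and Machine Learning Toolbox, the 5-by-8 random matrix is made up for illustration, and the names S2_pdist and S2_mat are hypothetical.

```matlab
% Small real-valued test matrix (size chosen arbitrarily for illustration)
rng(0);
S1 = rand(5, 8);

% pdist returns one cosine distance per unique row pair (i < j);
% squareform spreads 1 - distance into a symmetric matrix with a zero
% diagonal, and eye(...) then sets the self-similarities to 1
S2_pdist = squareform(1 - pdist(S1, 'cosine')) + eye(size(S1, 1));

% Same quantity from the normalized matrix product
norm_r = sqrt(sum(S1.^2, 2));
S2_mat = (S1 * S1.') ./ (norm_r * norm_r.');

% Largest absolute difference; expected to be at rounding-error level
max_diff = max(abs(S2_pdist(:) - S2_mat(:)));
```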


Source: https://habr.com/ru/post/1274507/

