TSNE with cosine similarity in sklearn.manifold

I have a small problem running TSNE on my dataset using cosine similarity.

I computed the cosine similarity between all of my vectors, so I have a square matrix of cosine similarities:

    A = [[ 1   0.7 0.5 0.6 ]
         [ 0.7 1   0.3 0.4 ]
         [ 0.5 0.3 1   0.1 ]
         [ 0.6 0.4 0.1 1   ]]

Then I use TSNE as follows:

    A = np.matrix([[1, 0.7, 0.5, 0.6],
                   [0.7, 1, 0.3, 0.4],
                   [0.5, 0.3, 1, 0.1],
                   [0.6, 0.4, 0.1, 1]])
    model = manifold.TSNE(metric="precomputed")
    Y = model.fit_transform(A)

But I'm not sure that the precomputed metric preserves the meaning of my cosine similarity, because the documentation says:

    If metric is "precomputed", X is assumed to be a distance matrix

But when I try to use the cosine metric, I get an error:

    A = np.matrix([[1, 0.7, 0.5, 0.6],
                   [0.7, 1, 0.3, 0.4],
                   [0.5, 0.3, 1, 0.1],
                   [0.6, 0.4, 0.1, 1]])
    model = manifold.TSNE(metric="cosine")
    Y = model.fit_transform(A)

        raise ValueError("All distances should be positive, either "
    ValueError: All distances should be positive, either the metric or precomputed distances given as X are not correct

So my question is: how can TSNE be performed using the cosine metric on an existing dataset (similarity matrix)?

+5
3 answers

I can answer most of your question, although I'm not sure why the error appears in the second example.

You computed the cosine similarity between your vectors, but scikit-learn expects a distance matrix as input to TSNE. The conversion is simple, though: distance = 1 - similarity. So for your example:

    import numpy as np
    from sklearn import manifold

    A = np.matrix([[1, 0.7, 0.5, 0.6],
                   [0.7, 1, 0.3, 0.4],
                   [0.5, 0.3, 1, 0.1],
                   [0.6, 0.4, 0.1, 1]])
    A = 1. - A
    model = manifold.TSNE(metric="precomputed")
    Y = model.fit_transform(A)

This should give you the transformation you need.
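For what it's worth, here is a minimal sketch of the same conversion with a few sanity checks on the resulting distance matrix. It rests on some assumptions that are not in the snippet above: as far as I know, recent scikit-learn releases reject np.matrix and default to init="pca", which does not work with a precomputed metric, so the sketch uses np.array and init="random". The perplexity=2 and random_state=0 values are only illustrative choices for a 4-sample toy matrix (perplexity has to be smaller than the number of samples).

```python
import numpy as np
from sklearn.manifold import TSNE

# Cosine similarity matrix from the question
S = np.array([[1.0, 0.7, 0.5, 0.6],
              [0.7, 1.0, 0.3, 0.4],
              [0.5, 0.3, 1.0, 0.1],
              [0.6, 0.4, 0.1, 1.0]])

# Convert similarity to distance: distance = 1 - similarity
D = 1.0 - S

# Sanity checks a precomputed distance matrix should pass
is_symmetric = np.allclose(D, D.T)
has_zero_diagonal = np.allclose(np.diag(D), 0.0)
is_non_negative = bool((D >= 0).all())

# Assumed settings: init="random" because a precomputed metric cannot use
# PCA initialization; perplexity=2 because there are only 4 samples.
model = TSNE(metric="precomputed", init="random", perplexity=2, random_state=0)
Y = model.fit_transform(D)
print(Y.shape)
```

With real data you would normally leave perplexity at a larger value; it is only lowered here because the toy matrix has four rows.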

+5

This is currently a bug; see here: https://github.com/scikit-learn/scikit-learn/issues/5772

However, scikit-learn's t-SNE uses squared Euclidean distance, which is proportional to the cosine distance if your data is L2-normalized.
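The proportionality follows from ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, which for unit-length vectors equals 2 - 2 cos(a, b), i.e. twice the cosine distance. A quick NumPy-only check, as a sketch on made-up random data (the array X and its shape are purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # hypothetical data: 5 vectors with 3 features

# L2-normalize each row so that every vector has unit length
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# Cosine distance: 1 - cosine similarity (dot product of unit vectors)
cosine_dist = 1.0 - X_unit @ X_unit.T

# Squared Euclidean distance between every pair of normalized rows
diff = X_unit[:, None, :] - X_unit[None, :, :]
sq_euclidean = np.sum(diff ** 2, axis=-1)

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b) = 2 * cosine distance
proportional = np.allclose(sq_euclidean, 2.0 * cosine_dist)
print(proportional)
```

So, under that assumption, L2-normalizing the original vectors and running t-SNE with its default Euclidean metric is one way to work with cosine geometry without passing metric="cosine" at all.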

+1

This can be done with sklearn's pairwise_distances:

    from sklearn.manifold import TSNE
    from sklearn.metrics import pairwise_distances

    # X is the original data, shape (n_samples, n_features), not the similarity matrix
    distance_matrix = pairwise_distances(X, X, metric='cosine', n_jobs=-1)
    model = TSNE(metric="precomputed")
    Xpr = model.fit_transform(distance_matrix)

The values in distance_matrix will be in the range [0, 2], because the cosine distance is 1 - similarity and the similarity lies in [-1, 1].
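To see the endpoints of that range, here is a small sketch with three hand-picked vectors (hypothetical toy data): one reference vector, one orthogonal to it, and one pointing in the opposite direction. Their cosine distances to the reference should come out close to 0, 1, and 2 respectively.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical toy vectors: reference, orthogonal, and opposite direction
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 0.0]])

# Cosine distance = 1 - cosine similarity
distance_matrix = pairwise_distances(X, metric="cosine")

# Distances from the first vector to itself, the orthogonal one, the opposite one
first_row = distance_matrix[0]
print(first_row)
print(distance_matrix.min(), distance_matrix.max())
```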

0

Source: https://habr.com/ru/post/1246862/

