TSNE with cosine similarity in sklearn.manifold

Question

I have a small problem running TSNE on my dataset using cosine similarity.

I calculated the cosine similarity between all of my vectors, so I have a square matrix that contains the pairwise cosine similarities:

    A = [[ 1   0.7 0.5 0.6 ]
         [ 0.7 1   0.3 0.4 ]
         [ 0.5 0.3 1   0.1 ]
         [ 0.6 0.4 0.1 1   ]]

Then I use TSNE as follows:

    A = np.matrix([[1, 0.7, 0.5, 0.6], [0.7, 1, 0.3, 0.4], [0.5, 0.3, 1, 0.1], [0.6, 0.4, 0.1, 1]])
    model = manifold.TSNE(metric="precomputed")
    Y = model.fit_transform(A)

But I'm not sure that the precomputed metric preserves the meaning of my cosine similarities, because the documentation says:

    If metric is "precomputed", X is assumed to be a distance matrix

But when I try to use the cosine metric instead, I get an error:

    A = np.matrix([[1, 0.7, 0.5, 0.6], [0.7, 1, 0.3, 0.4], [0.5, 0.3, 1, 0.1], [0.6, 0.4, 0.1, 1]])
    model = manifold.TSNE(metric="cosine")
    Y = model.fit_transform(A)

    raise ValueError("All distances should be positive, either "
    ValueError: All distances should be positive, either the metric or precomputed distances given as X are not correct

So my question is: how can TSNE be performed using the cosine metric on an existing dataset (similarity matrix)?

+5

scikit-learn cosine-similarity

Hugolastikot Apr 11 '16 at 9:58

3 answers

This is currently a bug; see here: https://github.com/scikit-learn/scikit-learn/issues/5772

However, scikit-learn's t-SNE uses the squared Euclidean distance, which is proportional to the cosine distance if your data is L2-normalized.
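
The proportionality mentioned above can be checked numerically. The following is only a minimal sketch using NumPy on made-up random data (the array `X` and all variable names are assumptions for illustration, not part of the original answer): for unit-length rows, the squared Euclidean distance equals twice the cosine distance.

```python
import numpy as np

# Made-up data for illustration: 5 vectors with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# L2-normalize every row so that each vector has unit length.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# Cosine distance = 1 - cosine similarity (dot product of unit vectors).
cosine_dist = 1.0 - X_unit @ X_unit.T

# Squared Euclidean distance between every pair of normalized rows.
diff = X_unit[:, None, :] - X_unit[None, :, :]
sq_euclid = np.sum(diff ** 2, axis=-1)

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b) = 2 * cosine distance.
identity_holds = np.allclose(sq_euclid, 2.0 * cosine_dist)
print(identity_holds)
```

So, under that assumption, normalizing the rows first and then using the default Euclidean metric gives the same neighbor structure as cosine distance, up to a constant factor.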

+1

eyaler Sep 7 '16 at 12:46

This can be done with sklearn's pairwise_distances:

    from sklearn.manifold import TSNE
    from sklearn.metrics import pairwise_distances

    # X is the original data matrix (n_samples x n_features), not a similarity matrix
    distance_matrix = pairwise_distances(X, X, metric='cosine', n_jobs=-1)
    model = TSNE(metric="precomputed")
    Xpr = model.fit_transform(distance_matrix)

The values in distance_matrix will be in the range [0, 2], because the cosine distance is 1 minus the cosine similarity, and the similarity lies in [-1, 1].
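
To make the [0, 2] range concrete, here is a small sketch that recomputes the cosine distance by hand with NumPy. The helper `cosine_distance_matrix` is a hypothetical name introduced for illustration; it is not sklearn's implementation, and it assumes no row is the zero vector.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance, 1 - cos(angle), between the rows of X.

    Illustrative helper only; assumes no row of X is all zeros.
    """
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Three toy vectors: the second is orthogonal to the first,
# the third points in exactly the opposite direction.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 0.0]])

D = cosine_distance_matrix(X)
print(D[0, 0])  # identical vectors: distance 0
print(D[0, 1])  # orthogonal vectors: distance 1
print(D[0, 2])  # opposite vectors: distance 2
```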

0

mrgloom Jan 30 '18 at 10:48

ncfirth · Accepted Answer · 2016-04-11T13:01:29+0000

I can answer most of your questions, however I'm not quite sure why this error appears in the second example.

You calculated the cosine similarity between each pair of your vectors, but scikit-learn expects a distance matrix as input to TSNE. However, the conversion is really simple: distance = 1 - similarity. So for your example:

    import numpy as np
    from sklearn import manifold

    A = np.matrix([[1, 0.7, 0.5, 0.6], [0.7, 1, 0.3, 0.4], [0.5, 0.3, 1, 0.1], [0.6, 0.4, 0.1, 1]])
    A = 1. - A  # convert cosine similarity to cosine distance
    model = manifold.TSNE(metric="precomputed")
    Y = model.fit_transform(A)

This should give you the conversion you need.
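
For readers on a recent scikit-learn release, here is a hedged variant of the same idea. The helper names `similarity_to_distance` and `embed_precomputed` are hypothetical, and `init="random"` and `perplexity=2.0` are my assumptions rather than part of the original answer: as far as I know, newer versions reject the PCA initialization together with `metric="precomputed"` and require the perplexity to be smaller than the number of samples (only 4 here).

```python
import numpy as np

def similarity_to_distance(similarity):
    """Convert a cosine similarity matrix into a cosine distance matrix."""
    distance = 1.0 - np.asarray(similarity, dtype=float)
    # Guard against floating-point noise: exact zeros on the diagonal
    # and no slightly negative entries.
    np.fill_diagonal(distance, 0.0)
    return np.clip(distance, 0.0, 2.0)

def embed_precomputed(distance, perplexity=2.0, random_state=0):
    """Run t-SNE on a precomputed distance matrix (sketch, see assumptions above)."""
    # Imported here so the conversion helper works without scikit-learn installed.
    from sklearn.manifold import TSNE

    model = TSNE(
        n_components=2,
        metric="precomputed",
        init="random",          # assumption: needed with precomputed distances
        perplexity=perplexity,  # assumption: must be below the number of samples
        random_state=random_state,
    )
    return model.fit_transform(distance)

A = np.array([[1.0, 0.7, 0.5, 0.6],
              [0.7, 1.0, 0.3, 0.4],
              [0.5, 0.3, 1.0, 0.1],
              [0.6, 0.4, 0.1, 1.0]])

D = similarity_to_distance(A)
```

Calling `embed_precomputed(D)` should then give one 2-D point per original vector. With only four samples the layout itself is not very meaningful; the point is just that the precomputed input is now a valid distance matrix.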
