Stress Attribute - sklearn.manifold.MDS / Python

Question


I'm using scikit-learn's MDS to perform dimensionality reduction on some data. I would like to check the stress value in order to assess the quality of the reduction. I was expecting something between 0 and 1. However, I got values outside this range. Here is a minimal example:

```python
# %matplotlib inline  (Jupyter magic; omit it when running as a plain script)
from sklearn.preprocessing import normalize
from sklearn import manifold
from matplotlib import pyplot as plt
from matplotlib.lines import Line2D
import numpy


def similarity_measure(vec1, vec2):
    # angle of each (x, y) point
    vec1_x = numpy.arctan2(vec1[1], vec1[0])
    vec2_x = numpy.arctan2(vec2[1], vec2[0])
    # overall magnitude (a scalar: the norm over all points)
    vec1_y = numpy.sqrt(numpy.sum(vec1[0] * vec1[0] + vec1[1] * vec1[1]))
    vec2_y = numpy.sqrt(numpy.sum(vec2[0] * vec2[0] + vec2[1] * vec2[1]))
    # cosine similarity between the (angle, magnitude) representations
    dot = numpy.sum(vec1_x * vec2_x + vec1_y * vec2_y)
    mag1 = numpy.sqrt(numpy.sum(vec1_x * vec1_x + vec1_y * vec1_y))
    mag2 = numpy.sqrt(numpy.sum(vec2_x * vec2_x + vec2_y * vec2_y))
    return dot / (mag1 * mag2)


plt.figure(figsize=(15, 15))

delta = numpy.zeros((100, 100))
data_x = numpy.random.randint(0, 100, (100, 100))
data_y = numpy.random.randint(0, 100, (100, 100))

# fill the symmetric similarity matrix (compute the upper triangle, mirror it)
for j in range(100):
    for k in range(100):
        if j <= k:
            dist = similarity_measure((data_x[j].flatten(), data_y[j].flatten()),
                                      (data_x[k].flatten(), data_y[k].flatten()))
            delta[j, k] = delta[k, j] = dist

# map similarities in [-1, 1] to dissimilarities in [0, 1], then rescale so the max is 1
delta = 1 - ((delta + 1) / 2)
delta /= numpy.max(delta)

mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9, random_state=0,
                   dissimilarity="precomputed", n_jobs=1)
coords = mds.fit(delta).embedding_
print(mds.stress_)

plt.scatter(coords[:, 0], coords[:, 1], marker='x', s=50)
plt.tight_layout()
```

In my test, the following is printed:

263.412196461

It also produced a scatter plot of the 2-D embedding (image not reproduced here).

How can I interpret this value without knowing its maximum? Or how can I normalize it so that it lies between 0 and 1?

Thanks.


python scikit-learn machine-learning stress-testing mds

pceccon Apr 05 '16 at 13:45


1 answer

Łukasz Borchmann · Accepted Answer · 2017-11-26T21:20:15+0000

This is because the current scikit-learn implementation calculates and returns the raw stress value (σ_r), whereas you are expecting Stress-1 (σ_1).
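For reference, here are the two quantities in the unweighted metric case, following the textbook definitions (Kruskal 1964; Borg and Groenen 2005). This is a sketch that assumes the input dissimilarities δ_ij play the role of the disparities and that d_ij(X) is the Euclidean distance between points i and j of the embedding X; it is not taken from scikit-learn's source:

```latex
\sigma_r(X) = \sum_{i<j} \bigl(d_{ij}(X) - \delta_{ij}\bigr)^2,
\qquad
\sigma_1(X) = \sqrt{\frac{\sum_{i<j} \bigl(d_{ij}(X) - \delta_{ij}\bigr)^2}{\sum_{i<j} d_{ij}^2(X)}}
```

Scaling both δ and X by a factor c > 0 multiplies σ_r by c² but leaves σ_1 unchanged, so σ_r has no fixed upper bound while σ_1 is scale-free.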

The former is not very informative (a high value does not necessarily indicate a poor fit). A better way to convey goodness of fit is to compute a normalized stress, for example Stress-1, which according to Kruskal (1964, p. 3) can be roughly interpreted as follows: 0 indicates a perfect fit, 0.025 is excellent, 0.05 is good, 0.1 is fair, and 0.2 is poor.
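To make the difference concrete, here is a minimal NumPy sketch that computes both quantities, assuming unweighted metric MDS where the dissimilarities themselves are the target distances. The helper names embedding_distances, raw_stress and stress_1 are hypothetical and are not part of the scikit-learn API. The demo rescales a toy problem by a factor c: the raw stress grows by c², while Stress-1 stays the same.

```python
import numpy as np


def embedding_distances(X):
    """Euclidean distances between all rows of X (n_samples x n_components)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(delta, X):
    """Raw stress: sum over pairs i < j of (d_ij(X) - delta_ij)**2."""
    d = embedding_distances(X)
    iu = np.triu_indices_from(delta, k=1)  # each unordered pair counted once
    return float(((d[iu] - delta[iu]) ** 2).sum())


def stress_1(delta, X):
    """Kruskal's Stress-1: sqrt(raw stress / sum over i < j of d_ij(X)**2)."""
    d = embedding_distances(X)
    iu = np.triu_indices_from(delta, k=1)
    residual = ((d[iu] - delta[iu]) ** 2).sum()
    return float(np.sqrt(residual / (d[iu] ** 2).sum()))


# Toy problem: dissimilarities = true distances plus a little symmetric noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                  # toy 2-D configuration
delta = embedding_distances(X) + rng.uniform(0, 0.1, size=(20, 20))
delta = (delta + delta.T) / 2                 # keep dissimilarities symmetric
np.fill_diagonal(delta, 0.0)                  # zero self-dissimilarity

# Rescaling both inputs changes the raw stress by c**2 but not Stress-1.
for c in (1.0, 10.0):
    print(f"c={c}: raw stress={raw_stress(c * delta, c * X):.4f}, "
          f"Stress-1={stress_1(c * delta, c * X):.4f}")
```

With the question's variables this would be stress_1(delta, coords). Treat the result as an approximation of what a library reports: raw_stress(delta, coords) may not match mds.stress_ exactly, and other implementations may normalize differently (for example by the dissimilarities instead of the embedding distances).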

I have just implemented Stress-1 and submitted a PR. In the meantime, you can use the version from this branch, which computes and returns Stress-1 instead of the raw stress when normalize is set to True (the default is False).

For more information, see Kruskal (1964, pp. 8–9) or Borg and Groenen (2005, pp. 41–43).

