Stress Attribute - sklearn.manifold.MDS / Python

I am using scikit-learn's MDS to perform dimensionality reduction on some data. I would like to check the stress value to assess the quality of the reduction. I was expecting a value between 0 and 1, but I got values outside this range. Here is a minimal example:

%matplotlib inline
from sklearn.preprocessing import normalize
from sklearn import manifold
from matplotlib import pyplot as plt
from matplotlib.lines import Line2D
import numpy

def similarity_measure(vec1, vec2):
    # Cosine similarity between the (angle, magnitude) representations of two vectors
    vec1_x = numpy.arctan2(vec1[1], vec1[0])
    vec2_x = numpy.arctan2(vec2[1], vec2[0])
    vec1_y = numpy.sqrt(numpy.sum(vec1[0] * vec1[0] + vec1[1] * vec1[1]))
    vec2_y = numpy.sqrt(numpy.sum(vec2[0] * vec2[0] + vec2[1] * vec2[1]))
    dot = numpy.sum(vec1_x * vec2_x + vec1_y * vec2_y)
    mag1 = numpy.sqrt(numpy.sum(vec1_x * vec1_x + vec1_y * vec1_y))
    mag2 = numpy.sqrt(numpy.sum(vec2_x * vec2_x + vec2_y * vec2_y))
    return dot / (mag1 * mag2)

plt.figure(figsize=(15, 15))
delta = numpy.zeros((100, 100))
data_x = numpy.random.randint(0, 100, (100, 100))
data_y = numpy.random.randint(0, 100, (100, 100))

# Fill the symmetric similarity matrix
for j in range(100):
    for k in range(100):
        if j <= k:
            dist = similarity_measure(
                (data_x[j].flatten(), data_y[j].flatten()),
                (data_x[k].flatten(), data_y[k].flatten()))
            delta[j, k] = delta[k, j] = dist

# Convert similarities in [-1, 1] to dissimilarities scaled to [0, 1]
delta = 1-((delta+1)/2)
delta /= numpy.max(delta)

mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9, random_state=0,
                   dissimilarity="precomputed", n_jobs=1)
coords = mds.fit(delta).embedding_
print(mds.stress_)

plt.scatter(coords[:, 0], coords[:, 1], marker='x', s=50, edgecolor='None')
plt.tight_layout()

In my test, the following is printed:

263.412196461

And produced this image:

[Image: scatter plot of the two-dimensional MDS embedding]

How can I interpret this value without knowing its maximum? Or how can I normalize it so that it lies between 0 and 1?

Thanks.

1 answer

This is because the current scikit-learn implementation calculates and returns the raw stress value (σ_r), while you are expecting Stress-1 (σ_1).
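For reference, the two quantities can be written as follows. This is a sketch in notation close to Borg and Groenen's, assuming metric MDS, where δ_ij are the input dissimilarities and d_ij(X) are the Euclidean distances between points i and j in the embedding X:

```latex
\sigma_r(X) = \sum_{i<j} \bigl( d_{ij}(X) - \delta_{ij} \bigr)^2 ,
\qquad
\sigma_1(X) = \sqrt{ \frac{\sigma_r(X)}{\sum_{i<j} d_{ij}^2(X)} } .
```

The raw stress grows with the number of points and with the scale of the dissimilarities, which is why it has no fixed upper bound, whereas the denominator in Stress-1 removes that dependence.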

The former is not very informative (a high value does not necessarily indicate a poor fit). A better way to convey goodness of fit is to compute a normalized stress, for example Stress-1, which according to Kruskal (1964, p. 3) has roughly the following interpretation: a value of 0 indicates a perfect fit, 0.025 is excellent, 0.05 is good, 0.1 is fair and 0.2 is poor.
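As a rough illustration, Stress-1 can also be computed by hand from the dissimilarity matrix and the fitted embedding. The helper below is a minimal sketch, not scikit-learn's implementation: the name `stress_1` is hypothetical, and it assumes metric MDS (the dissimilarities are compared directly with the embedding distances) and Kruskal's normalization by the sum of squared embedding distances.

```python
import numpy as np

def stress_1(dissimilarities, embedding):
    """Kruskal's Stress-1 for a metric MDS embedding (hypothetical helper)."""
    delta = np.asarray(dissimilarities, dtype=float)
    X = np.asarray(embedding, dtype=float)
    # Pairwise Euclidean distances between the embedded points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count each pair once: indices of the strict upper triangle (i < j).
    iu = np.triu_indices(len(X), k=1)
    raw_stress = np.sum((dist[iu] - delta[iu]) ** 2)
    # Normalize by the sum of squared embedding distances.
    return np.sqrt(raw_stress / np.sum(dist[iu] ** 2))

# Toy check: three points whose pairwise distances are 3, 4 and 5.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
exact = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
print(stress_1(exact, points))  # distances reproduced exactly
```

With the variables from the question, the call would be `stress_1(delta, coords)`. Because the input dissimilarities are used as they are, the result depends on how well their scale matches the embedding distances.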

I have just implemented Stress-1 and submitted a PR. In the meantime, you can use the version from this branch, where Stress-1 is computed and returned instead of the raw stress value when normalize is set to True (the default is False).

For more information, see Kruskal (1964, pp. 8–9) or Borg and Groenen (2005, pp. 41–43).


Source: https://habr.com/ru/post/1246470/

