How do I evaluate a dimensionality reduction technique?

I have an NxM dataset in binary form. I apply various dimensionality reduction methods to it and plot the first two dimensions; this gives me an intuition for whether a technique suits my data set or not. Is there a more suitable / methodical / heuristic / formal way to test how well the dimensionality reduction methods I use fit my data?

+4
2 answers

The main purpose of dimensionality reduction is to preserve as much of the original data distribution as possible after the reduction. In practice, that means we want to capture as much of the data's variance as we can.

Say X is your NxM data matrix, and we perform SVD (Singular Value Decomposition) on X: X = USV^T. We then look at the singular values, the diagonal entries of the resulting matrix S.

You then truncate them at some index K, chosen so that the retained singular values account for the desired fraction of the total:

Σ_{i=1..K} σ(i) / Σ_{i=1..N} σ(i)

If you keep only the first K columns of U (equivalently, project the data onto the first K right singular vectors), you reduce the original data to K dimensions.
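As a minimal sketch of this procedure in NumPy, the following picks K from a cumulative variance threshold and then projects the data. The dataset, the 90% threshold, and the use of squared singular values for the variance fractions (the formula above uses the singular values directly; both ratios are common heuristics) are assumptions for illustration.

```python
import numpy as np

# Hypothetical binary dataset: 100 samples, 20 features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 20)).astype(float)

# Center the columns, then compute the SVD: Xc = U @ diag(S) @ Vt.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Cumulative fraction of variance captured by the first k singular values.
# (Variance fractions use sigma^2; the ratio of raw sigmas is a similar heuristic.)
explained = np.cumsum(S**2) / np.sum(S**2)

# Smallest K that captures at least 90% of the variance (assumed threshold).
K = int(np.searchsorted(explained, 0.90) + 1)

# Project onto the first K right singular vectors to get the reduced data.
X_reduced = Xc @ Vt[:K].T
print(K, X_reduced.shape)
```

The shape of `X_reduced` is (100, K): each sample is now described by K coordinates instead of 20.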

+1

You can use the SOM (Self-Organizing Map) technique to visualize high-dimensional data in two dimensions. There are other methods; I will update the answer if I remember their names, but I am used to SOM.

You can find a good SOM toolbox for Matlab.

That will help you visualize, but for evaluation you should use a performance metric that measures what matters to you in the dimensionality reduction (SOM itself can also be used as a dimensionality reduction method). What is important: compressing the data with minimal loss? Compressing it as much as possible? Presenting the data in a visible way? You can measure the effectiveness of the methods even without knowing how they transformed the data space; all you need is a good function that measures how good the result is.

0

Source: https://habr.com/ru/post/1494171/
