An old question, but I don't think it was satisfactorily answered (and I just landed here myself through Google). I ended up in the same situation and had to track down the answer myself.
The goal of PCA is to represent your data X (an $n \times p$ matrix of $n$ observations and $p$ features) in an orthonormal basis W; the coordinates of your data in this new basis are Z, as shown below:

$$\mathbf{Z} = \mathbf{X}\mathbf{W}$$

Because W is orthonormal, we can invert it simply by transposing it and write:

$$\mathbf{X} = \mathbf{Z}\mathbf{W}^{\top}$$

Now, to reduce dimensionality, we choose some number of components $k < p$. Assuming the basis vectors in W are ordered from largest to smallest eigenvalue (i.e., the vector corresponding to the largest eigenvalue comes first, and so on), we simply keep the first $k$ columns of W, call it $\mathbf{W}_k$:

$$\mathbf{Z}_k = \mathbf{X}\mathbf{W}_k$$

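To make the steps above concrete, here is a minimal NumPy sketch of one way to do this; the array `X_train`, the choice `k = 2`, and the eigendecomposition-of-the-covariance route are illustrative assumptions, not the only way to compute W:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))   # illustrative training data: n=100 observations, p=5 features
k = 2                                 # number of components to keep

# Center the data (PCA is defined on mean-centered data)
mu = X_train.mean(axis=0)
Xc = X_train - mu

# Eigendecomposition of the covariance matrix gives the orthonormal basis W
cov = np.cov(Xc, rowvar=False)
eigvals, W = np.linalg.eigh(cov)      # columns of W are eigenvectors

# Order the basis vectors by eigenvalue, largest first
order = np.argsort(eigvals)[::-1]
W = W[:, order]

# Keep only the first k columns: W_k is p x k
W_k = W[:, :k]

# k-dimensional representation of the training data: Z_k = X W_k
Z_train = Xc @ W_k
```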
Now we have a k-dimensional representation of our training data X. The next step is to train some supervised classifier on the new features in $\mathbf{Z}_k$.

The key is to realize that $\mathbf{W}_k$ is, in a sense, a transformation learned from our training data that maps the $p$-dimensional feature space down to a $k$-dimensional feature space (or at least the best such transformation we could find using our training data). We can therefore hit our test data with the same transformation $\mathbf{W}_k$, which gives a k-dimensional set of test features:

$$\mathbf{Z}_{k,\text{test}} = \mathbf{X}_{\text{test}}\mathbf{W}_k$$

Now we can use the same classifier that was trained on the k-dimensional representation of our training data to make predictions on the k-dimensional representation of our test data; calling the trained classifier $f$:

$$\hat{\mathbf{y}}_{\text{test}} = f(\mathbf{Z}_{k,\text{test}})$$

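Continuing the sketch above, this is roughly how the test-time step might look; the classifier choice (`LogisticRegression`), the labels `y_train`, and the test array `X_test` are purely illustrative:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative labels and test data
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(20, 5))

# Train a supervised classifier on the k-dimensional training features
clf = LogisticRegression()
clf.fit(Z_train, y_train)

# Apply the SAME centering and the SAME W_k that were learned on the training data
Z_test = (X_test - mu) @ W_k

# Predict using the k-dimensional test features
y_pred = clf.predict(Z_test)
```

The equivalent with scikit-learn's built-in `PCA` is to call `fit` on the training data only and then `transform` both sets; either way, the essential point is that W (and the centering) come from the training data alone.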
The point of this whole procedure is that you may have thousands of features, but (1) not all of them will carry meaningful signal, and (2) your supervised learning method may be far too complex to train on the full feature set (or it would take too long, or your computer would not have enough memory to handle the computation). PCA can dramatically reduce the number of features needed to represent your data without discarding the aspects of your data that truly add value.