Principal component analysis with the Eigen library

I am trying to compute the 2 major principal components from a dataset in C++ with Eigen.

What I currently do is normalize the data to the range [0, 1] and then mean-center it. After that, I compute the covariance matrix and run an eigenvalue decomposition on it. I know that SVD is faster, but I am unsure about the components it computes.

Here is the basic code showing how I do this (where traindata is my input matrix of size MxN):

    // Normalize each feature of a datapoint to [0, 1].
    Eigen::VectorXf normalize(Eigen::VectorXf vec) {
        for (int i = 0; i < vec.size(); i++) {
            vec[i] = (vec[i] - minCoeffs[i]) / scalingFactors[i];
        }
        return vec;
    }

    // Calculate normalization coefficients (globals of type Eigen::VectorXf).
    maxCoeffs = traindata.colwise().maxCoeff();
    minCoeffs = traindata.colwise().minCoeff();
    scalingFactors = maxCoeffs - minCoeffs;

    // Normalize each datapoint.
    for (int i = 0; i < traindata.rows(); i++) {
        traindata.row(i) = normalize(traindata.row(i));
    }

    // Mean centering data.
    Eigen::VectorXf featureMeans = traindata.colwise().mean();
    Eigen::MatrixXf centered = traindata.rowwise() - featureMeans.transpose();

    // Compute the covariance matrix.
    Eigen::MatrixXf cov = centered.adjoint() * centered;
    cov = cov / (traindata.rows() - 1);
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXf> eig(cov);

    // Normalize eigenvalues to make them represent percentages.
    Eigen::VectorXf normalizedEigenValues = eig.eigenvalues() / eig.eigenvalues().sum();

    // Get the two major eigenvectors and omit the others
    // (SelfAdjointEigenSolver sorts eigenvalues in increasing order).
    Eigen::MatrixXf evecs = eig.eigenvectors();
    Eigen::MatrixXf pcaTransform = evecs.rightCols(2);

    // Map the dataset into the new two-dimensional space.
    traindata = traindata * pcaTransform;

The result of this code looks something like this:

[plot: training data projected onto the two major principal components]

To verify my results, I tried the same with WEKA. There, I applied the Normalize filter and then the Center filter, in that order, followed by the PrincipalComponents filter, and saved and plotted the output. The result is the following:

[plot: WEKA's PCA output]

Technically, both should be doing the same thing, yet the results are quite different. Can anyone see whether I made a mistake?

2 answers

The reason is that WEKA standardizes the data set. This means it scales each attribute's variance to unit variance. When I did this as well, the plots looked the same. So technically my approach was also correct.
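For reference, a minimal Eigen sketch of such a standardization step (illustrative only, not the code I actually used; featureMeans and stdDevs are just names chosen for this example):

    // Standardize each attribute: subtract the column mean and divide by the
    // sample standard deviation, so every column ends up with unit variance.
    Eigen::RowVectorXf featureMeans = traindata.colwise().mean();
    Eigen::MatrixXf centered = traindata.rowwise() - featureMeans;
    Eigen::RowVectorXf stdDevs =
        (centered.array().square().colwise().sum() / float(traindata.rows() - 1))
            .sqrt().matrix();
    Eigen::MatrixXf standardized =
        (centered.array().rowwise() / stdDevs.array()).matrix();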


When scaling to [0, 1], you modify the local variable vec, but you forgot to update traindata.
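For example, writing the scaled values straight back into traindata could look like this (a sketch, assuming minCoeffs and scalingFactors are RowVectorXf holding the per-column coefficients):

    // Scale every row in place so the result actually ends up in traindata.
    for (int i = 0; i < traindata.rows(); ++i) {
        traindata.row(i) = (traindata.row(i) - minCoeffs).cwiseQuotient(scalingFactors);
    }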

Moreover, it is easier to do as follows:

    RowVectorXf maxCoeffs = traindata.colwise().maxCoeff();
    RowVectorXf minCoeffs = traindata.colwise().minCoeff();
    RowVectorXf scalingFactors = maxCoeffs - minCoeffs;
    traindata = (traindata.rowwise() - minCoeffs).array().rowwise() / scalingFactors.array();

that is, using rowwise broadcasting and array (coefficient-wise) operations.

Let me also add that the symmetric eigenvalue decomposition is actually faster than SVD. The true advantage of SVD in this case is that it avoids squaring the entries, but since your input data are normalized and centered, and you only care about the largest eigenvalues, accuracy is not a concern here.
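For comparison, a rough sketch of the SVD route on the already centered data (illustrative only; the variable names centered and traindata are taken from the question's code):

    // The right singular vectors of the centered data matrix are the principal
    // axes; the squared singular values divided by (rows - 1) equal the
    // eigenvalues of the covariance matrix.
    Eigen::JacobiSVD<Eigen::MatrixXf> svd(centered, Eigen::ComputeThinV);
    Eigen::VectorXf eigenvalues =
        (svd.singularValues().array().square() / float(traindata.rows() - 1)).matrix();
    // Singular values are sorted in decreasing order, so the two major
    // components are the first two columns of V (whereas SelfAdjointEigenSolver
    // sorts eigenvalues in increasing order, hence rightCols(2) in the question).
    Eigen::MatrixXf pcaTransform = svd.matrixV().leftCols(2);
    Eigen::MatrixXf projected = centered * pcaTransform;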

