I am trying to compute the first two principal components of a dataset in C++ with Eigen.
Currently I normalize the data to [0, 1] and then mean-center it. After that, I compute the covariance matrix and run an eigenvalue decomposition on it. I know that SVD is faster, but I am confused about how to get the components out of it.
Here is the basic code showing how I do this (where `traindata` is my input matrix of size MxN):
```cpp
Eigen::VectorXf normalize(Eigen::VectorXf vec) {
    // Normalize each feature.
    for (int i = 0; i < vec.size(); i++) {
        vec[i] = (vec[i] - minCoeffs[i]) / scalingFactors[i];
    }
    return vec;
}

// Calculate normalization coefficients (globals of type Eigen::VectorXf).
maxCoeffs = traindata.colwise().maxCoeff();
minCoeffs = traindata.colwise().minCoeff();
scalingFactors = maxCoeffs - minCoeffs;

// Normalize each datapoint.
for (int i = 0; i < traindata.rows(); i++) {
    traindata.row(i) = normalize(traindata.row(i));
}

// Mean-center the data (rowwise() subtraction needs a row vector).
Eigen::RowVectorXf featureMeans = traindata.colwise().mean();
Eigen::MatrixXf centered = traindata.rowwise() - featureMeans;

// Compute the covariance matrix.
Eigen::MatrixXf cov = centered.adjoint() * centered;
cov = cov / float(traindata.rows() - 1);

Eigen::SelfAdjointEigenSolver<Eigen::MatrixXf> eig(cov);

// Normalize eigenvalues to make them represent percentages.
Eigen::VectorXf normalizedEigenValues = eig.eigenvalues() / eig.eigenvalues().sum();

// Get the two major eigenvectors and omit the others.
Eigen::MatrixXf evecs = eig.eigenvectors();
Eigen::MatrixXf pcaTransform = evecs.rightCols(2);

// Map the dataset into the new two-dimensional space.
traindata = traindata * pcaTransform;
```
The result of this code looks something like this:

To verify my results, I tried the same thing in WEKA: I applied the Normalize filter and then the Center filter, in that order. Then I applied the principal components filter and saved and plotted the output. The result is the following:

Technically, I should be doing the same thing, but the results are very different. Can anyone see where I made a mistake?