Numpy.eig and variance percentage in PCA


I can use either linalg.eig or linalg.svd to compute PCA. Each returns different principal components/eigenvectors and eigenvalues when given the same data (I am currently using the Iris dataset).
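(For reference, a minimal sketch of the two calls being compared; the 2x2 matrix here is made up just for illustration. One concrete difference: eig returns eigenvalues in no guaranteed order, while svd always returns singular values sorted in descending order.)

import numpy as np

# Hypothetical 2x2 correlation matrix, only to show the two call signatures.
C = np.array([[1.0, 0.6],
              [0.6, 1.0]])

w, v = np.linalg.eig(C)      # eigenvalues w (no order guarantee), eigenvectors in columns of v
u, s, vt = np.linalg.svd(C)  # left vectors u, singular values s (descending), right vectors vt

print(w)  # [1.6 0.4] here, but the order is not guaranteed in general
print(s)  # [1.6 0.4] -- always largest first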

Looking at this or any other PCA tutorial applied to the Iris dataset, I find the eigenvalues [2.9108 0.9212 0.1474 0.0206]. The eig method gives me a different set of eigenvalues/vectors to work with, which I would not mind, except that the tutorial's eigenvalues, once summed, equal the number of dimensions (4), so each one can be used to determine how much its component contributes to the total variance.

With the eigenvalues returned by linalg.eig, I cannot do this. For example, the returned values are [9206.53059607 314.10307292 12.03601935 3.53031167]. The proportion of variance in this case would be [0.96542969 0.03293797 0.00126214 0.0003702]. This other page states that "the proportion of variation explained by a component is its eigenvalue divided by the sum of the eigenvalues."
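(Applying that rule is just a division by the sum; a quick sketch with both sets of numbers from above. The mismatch in proportions suggests the two sets came from decomposing different matrices.)

import numpy as np

def explained_variance_ratio(eigenvalues):
    # Each component's share of total variance: eigenvalue / sum of eigenvalues.
    w = np.asarray(eigenvalues)
    return w / w.sum()

# Tutorial eigenvalues (they sum to 4, the number of dimensions):
print(explained_variance_ratio([2.9108, 0.9212, 0.1474, 0.0206]))
# -> [0.7277  0.2303  0.0369  0.0052]

# Eigenvalues I get back from linalg.eig:
print(explained_variance_ratio([9206.53059607, 314.10307292, 12.03601935, 3.53031167]))
# -> [0.9654  0.0329  0.0013  0.0004]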

Since the variance explained by each dimension should be constant (I think), these proportions are wrong. So, if I use the values returned by svd(), which are the values used in all the tutorials, I can get the correct percentage of variance for each dimension, but I wonder why the values returned by eig cannot be used in the same way.

I assume the returned results are still a valid way to project the variables, so is there a way to transform them to get the correct proportion of variance explained by each variable? In other words, can I use the eig method and still get the proportion of variance for each variable? Also, can the conversion be done on the eigenvalues alone, so that I can keep both the raw eigenvalues and the normalized ones?

Sorry for the long post, by the way. Here's a cookie (::) for making it this far, assuming you didn't just skip to this line.

4 answers

Following Doug's suggestion, here is the code I used:

import numpy as np
from numpy import array, corrcoef, linalg

def pca_eig(orig_data):
    data = array(orig_data)
    # Standardize each column, then take the correlation matrix
    data = (data - data.mean(axis=0)) / data.std(axis=0)
    C = corrcoef(data, rowvar=0)
    w, v = linalg.eig(C)  # eigenvalues w, eigenvectors in columns of v
    print("Using numpy.linalg.eig")
    print(w)
    print(v)

def pca_svd(orig_data):
    data = array(orig_data)
    data = (data - data.mean(axis=0)) / data.std(axis=0)
    C = corrcoef(data, rowvar=0)
    u, s, v = linalg.svd(C)  # singular values s, singular vectors in u and v
    print("Using numpy.linalg.svd")
    print(u)
    print(s)
    print(v)

The output:

Using numpy.linalg.eig
[ 2.91081808  0.92122093  0.14735328  0.02060771]
[[ 0.52237162 -0.37231836 -0.72101681  0.26199559]
 [-0.26335492 -0.92555649  0.24203288 -0.12413481]
 [ 0.58125401 -0.02109478  0.14089226 -0.80115427]
 [ 0.56561105 -0.06541577  0.6338014   0.52354627]]

Using numpy.linalg.svd
[[-0.52237162 -0.37231836  0.72101681  0.26199559]
 [ 0.26335492 -0.92555649 -0.24203288 -0.12413481]
 [-0.58125401 -0.02109478 -0.14089226 -0.80115427]
 [-0.56561105 -0.06541577 -0.6338014   0.52354627]]
[ 2.91081808  0.92122093  0.14735328  0.02060771]
[[-0.52237162  0.26335492 -0.58125401 -0.56561105]
 [-0.37231836 -0.92555649 -0.02109478 -0.06541577]
 [ 0.72101681 -0.24203288 -0.14089226 -0.6338014 ]
 [ 0.26199559 -0.12413481 -0.80115427  0.52354627]]

As you can see, both give the same eigenvalues; the eigenvectors agree up to sign.
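(A practical note on using eig this way, sketched with stand-in data: linalg.eig makes no ordering promise, so sort the eigenpairs before reading off components, then normalize to get the variance proportions. For a symmetric matrix like C, numpy.linalg.eigh is also an option and returns eigenvalues in ascending order.)

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))   # stand-in for the Iris measurements
data = (data - data.mean(axis=0)) / data.std(axis=0)

C = np.corrcoef(data, rowvar=0)
w, v = np.linalg.eig(C)

order = np.argsort(w)[::-1]        # eig gives no ordering guarantee
w, v = w[order], v[:, order]       # keep eigenvectors aligned with eigenvalues

print(w / w.sum())                 # proportion of variance per component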




This is really a statistics question rather than a programming one: what linalg.eig returns depends on which matrix you decompose. You would probably get a better answer on stats.stackexchange.com or math.stackexchange.com. :)


I would suggest using SVD (singular value decomposition) for PCA, because
1) it directly gives you the values and matrices that you need, and
2) it is robust.
See principal-component-analysis-in-python on SO for an example with (surprise) the Iris data. Running it gives

read iris.csv: (150, 4)
Center -= A.mean: [ 5.84  3.05  3.76  1.2 ]
Center /= A.std: [ 0.83  0.43  1.76  0.76]

SVD: A (150, 4) -> U (150, 4)  x  d diagonal  x  Vt (4, 4)
d^2: 437 138 22.1 3.09
% variance: [  72.77   95.8    99.48  100.  ]
PC 0 weights: [ 0.52 -0.26  0.58  0.57]
PC 1 weights: [-0.37 -0.93 -0.02 -0.07]

You can see that squaring the diagonal d from the SVD gives each component's share of the total variance; the "% variance" line above is the cumulative percentage explained by PC 0, PC 1, ...
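(A minimal self-contained version of that pipeline, assuming a samples-by-features array A; the function name is mine, and the random data stands in for Iris.)

import numpy as np

def pca_svd_scores(A):
    # Standardize columns, then PCA via SVD of the data matrix itself.
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    var = d**2 / (d**2).sum()          # per-PC fraction of total variance
    cum_pct = 100 * np.cumsum(var)     # cumulative %, cf. "% variance" above
    return U * d, Vt, cum_pct          # PC scores, PC weights (rows of Vt), %

A = np.random.default_rng(0).normal(size=(150, 4))
scores, weights, cum_pct = pca_svd_scores(A)
print(cum_pct)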

Does it help?


Source: https://habr.com/ru/post/1788356/

