Prcomp and ggbiplot: Invalid value for "rot"

I am trying to analyze my data using PCA, and I found this nice guide using prcomp and ggbiplot. My data are two types of samples with three biological replicates each (i.e. 6 rows) and about 20,000 genes (i.e. variables). Firstly, getting a PCA model with the code described in the guide does not work:

 > pca=prcomp(data,center=T,scale.=T)
 Error in prcomp.default(data, center = T, scale. = T) :
   cannot rescale a constant/zero column to unit variance

However, if I remove the scale. = T part, it works fine, and I get the model. Why is this, and is this the cause of the error below?

 > summary(pca)
 Importance of components:
                              PC1       PC2       PC3       PC4       PC5
 Standard deviation     4662.8657 3570.7164 2717.8351 1419.3137 819.15844
 Proportion of Variance    0.4879    0.2861    0.1658    0.0452   0.01506
 Cumulative Proportion     0.4879    0.7740    0.9397    0.9849   1.00000

Secondly, plotting the PCA. Even just using the basic call, I get an error message and an empty image:

 > ggbiplot(pca)
 Error: invalid 'rot' value

What does this mean and how can I fix it? Does this have anything to do with the (un)scaling when creating the PCA, or is it something else? I think it must be something with my data, because if I use the standard example code (below), I get a very nice PCA plot.

 > data(wine)
 > wine.pca=prcomp(wine,scale.=T)
 > print(ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, ellipse = TRUE, circle = TRUE))

[EDIT 1] I tried subsetting my data in two ways: 1) removing all columns where all rows are 0, and 2) removing all columns where any row equals 0. The first subset still gives me the scale error, but the one where I removed every column containing a 0 does not. Why is this? How does this affect my PCA?

In addition, I tried running the normal biplot command on both the original data (without scaling) and the subsets above, and it works in both cases. So is this something specific to ggbiplot?

[EDIT 2] I uploaded a subset of my data that gives me an error when I don't delete all zeros and works when I do this. I have not used gist before, but I think it is . Or this ...

1 answer

After transposing the data, I was able to replicate your error. The first error is the main problem. PCA seeks to maximize the variance of each component, so it is important that it does not just focus on a single variable that happens to have very high variance. The first error:

 Error in prcomp.default(tdf, center = T, scale. = T) :
   cannot rescale a constant/zero column to unit variance

This tells you that some of your variables have zero variance (i.e. no variability). Since PCA groups things by maximizing variance, there is no point in keeping these variables. They can easily be removed with the following call:

 df_f <- data[,apply(data, 2, var, na.rm=TRUE) != 0] 
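To see the effect of that filter in isolation, here is a minimal sketch on made-up data. The data frame `toy` and its column names are hypothetical and only stand in for the real expression table:

```r
# Toy data: 6 samples (rows) x 4 genes (columns); gene "g3" is constant.
set.seed(1)
toy <- data.frame(
  g1 = rnorm(6),
  g2 = rnorm(6),
  g3 = rep(0, 6),   # zero variance: cannot be rescaled to unit variance
  g4 = rnorm(6)
)

# With the constant column present, scale. = TRUE fails; capture the message.
res <- tryCatch(prcomp(toy, center = TRUE, scale. = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Per-column variance; columns whose variance is 0 are the offenders.
col_var <- apply(toy, 2, var, na.rm = TRUE)
print(names(col_var)[col_var == 0])

# Same filter as above, applied to the toy data.
toy_f <- toy[, col_var != 0]
pca_toy <- prcomp(toy_f, center = TRUE, scale. = TRUE)
print(ncol(pca_toy$rotation))
```

Note that the filter is on variance, not on zeros: a column that is constant but non-zero would also trigger the error. If only one column could survive the filter, adding `drop = FALSE` to the subset keeps the result a data frame.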

Once you apply this filter, the rest of the calls work as expected:

 pca=prcomp(df_f,center=T,scale.=T)
 ggbiplot(pca)
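As for why the unscaled model trips up ggbiplot while base biplot does not, one plausible explanation (an assumption about ggbiplot's internals, not something confirmed here) is that it derives each variable label's rotation angle from the ratio of its PC2 and PC1 loadings. A constant column has loadings of (near) zero on both, and 0/0 gives NaN. A rough sketch on hypothetical toy data:

```r
# Toy matrix: 6 samples x 3 genes; "g3" is constant (hypothetical data).
set.seed(1)
toy <- cbind(g1 = rnorm(6), g2 = rnorm(6), g3 = rep(0, 6))

# Without scaling, prcomp() accepts the constant column.
pca_unscaled <- prcomp(toy, center = TRUE)

# Loadings of the constant gene on PC1 and PC2: effectively zero.
load_g3 <- pca_unscaled$rotation["g3", 1:2]
print(load_g3)

# Assumed form of the label-angle computation: atan(PC2 loading / PC1 loading).
# When both loadings are exactly 0 this is atan(0 / 0), which is NaN,
# and a NaN text rotation is not a valid 'rot' value for plotting.
print((180 / pi) * atan(0 / 0))
```

Removing the zero-variance columns before calling prcomp avoids both problems at once.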

Source: https://habr.com/ru/post/1207212/