I have a data set of 9 samples (rows) with 51608 variables (columns), and I keep getting an error whenever I try to scale it:
This works fine
pca = prcomp(pca_data)
but
pca = prcomp(pca_data, scale = T)
gives
> Error in prcomp.default(pca_data, center = T, scale = T) : cannot rescale a constant/zero column to unit variance
Obviously, it's a little difficult to post a reproducible example. Any ideas what the deal is?
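For what it's worth, a tiny toy example (not my data) reproduces the exact message whenever a column is constant:
# hypothetical 9-row example with one constant column
set.seed(1)
toy = cbind(x = rnorm(9), const = rep(1, 9))
prcomp(toy, scale. = TRUE)
> Error in prcomp.default(toy, scale. = TRUE) : cannot rescale a constant/zero column to unit variance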
Searching for constant columns:
sapply(1:ncol(pca_data), function(x){
  unique(pca_data[, x]) %>% length
}) %>% table
Output:
.
    2     3     4     5     6     7     8     9 
 3892  4189  2124  1783  1622  2078  5179 30741 
So there are no constant columns. Same thing with NA -
is.na(pca_data) %>% sum
> [1] 0
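One more check that seems worth running, since (as far as I can tell from prcomp.default) the error is raised when a column's scale factor, i.e. its standard deviation, comes out as exactly zero. A sketch, assuming pca_data is a numeric matrix:
# column-wise standard deviations; an exactly zero sd is what prcomp refuses to rescale
column_sds = apply(pca_data, 2, sd)
sum(column_sds == 0)
min(column_sds)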
This works great:
pca_data = scale(pca_data)
But then both still give the same error:
pca = prcomp(pca_data)
pca = prcomp(pca_data, center = F, scale = F)
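One thing that might explain why even the pre-scaled matrix fails: base scale() does not complain about a zero-variance column, it silently divides by a zero sd and leaves NaNs behind, and the divisors it used are kept in an attribute. A sketch, assuming pca_data is the matrix returned by scale() above:
# NaNs produced by dividing a centered column by a zero sd
sum(is.nan(pca_data))
# the per-column divisors scale() used; zeros here mark the offending columns
sum(attr(pca_data, "scaled:scale") == 0)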
So why can't I get a scaled version of this data? OK, let's make 100% sure that nothing is constant:
pca_data = pca_data + rnorm(nrow(pca_data) * ncol(pca_data))
Same errors. Numeric data?
sapply(1:nrow(pca_data), function(row){
  sapply(1:ncol(pca_data), function(column){
    !is.numeric(pca_data[row, column])
  })
}) %>% sum
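A quicker way to run the same check, as a sketch (assuming pca_data is a matrix or data frame):
# every column should report TRUE
all(sapply(as.data.frame(pca_data), is.numeric))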
Still the same errors. I'm out of ideas.
Edit: some more detail, and a hack that at least works around it.
Further along, this data is still not easy to work with, e.g. hierarchical clustering gives:
Error in hclust(d, method = "ward.D") : NaN dissimilarity value in intermediate results.
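A quick sanity check on the dissimilarities themselves (a sketch; d stands for whatever dissimilarity object gets handed to hclust, e.g. dist(pca_data)):
# non-finite entries (NaN/Inf) here would propagate into hclust's intermediate results
sum(!is.finite(d))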
Trimming values below a certain cutoff (e.g. 1) to zero had no effect. What finally worked was dropping all columns that had more than a certain number of zeros in them. It worked with a cutoff of # zeros <= 6, but 7+ gave errors. I don't know whether this means it's a general problem or whether it just happened to drop the problematic columns. Either way, I'd still be happy to hear if anyone has an idea why, because this should work fine as long as no variable is all zeros (or constant in some other way).
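For reference, the column-dropping hack described above as a sketch (pca_data_filtered is just an illustrative name; 6 is the cutoff that happened to work here):
# keep only the columns that have at most 6 zero entries
pca_data_filtered = pca_data[, colSums(pca_data == 0) <= 6]
pca = prcomp(pca_data_filtered, scale = T)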