I have a large dataset with 100 variables and 3000 observations. I want to find the variables (columns) that are strongly correlated or redundant, so that I can remove them and reduce the dimensionality of the data frame. I tried the code below, but it only computes the correlation between each column and one fixed column (column 2), and I always get warnings:
for (i in 1:ncol(predicteurs)) {
  correlations <- cor(predicteurs[, i], predicteurs[, 2])
  names(correlations[which.max(abs(correlations))])
}
Warning messages:
1: In cor(predicteurs[, i], predicteurs[, 2]) :
the standard deviation is zero
2: In cor(predicteurs[, i], predicteurs[, 2]) :
the standard deviation is zero
Can anybody help me?
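For context, here is a minimal sketch of the kind of result I am after, using base R only. The warning "the standard deviation is zero" suggests that at least one column is constant, so the sketch drops such columns first and then computes the whole correlation matrix in a single cor() call instead of a loop. The toy data, the helper names (constant_cols, kept, pairs, reduced) and the 0.9 threshold are all assumptions made for illustration; they do not come from my real data.

```r
# Toy data standing in for `predicteurs` (hypothetical values).
set.seed(1)
predicteurs <- data.frame(a = rnorm(50), b = rnorm(50))
predicteurs$c <- predicteurs$a * 2 + rnorm(50, sd = 0.01)  # nearly a copy of a
predicteurs$k <- 5                                         # constant column

# 1. Find and drop zero-variance columns; cor() warns about these
#    with "the standard deviation is zero".
sds <- sapply(predicteurs, sd, na.rm = TRUE)
constant_cols <- names(sds)[is.na(sds) | sds == 0]
kept <- predicteurs[, !(names(predicteurs) %in% constant_cols), drop = FALSE]

# 2. Compute the full correlation matrix in one call.
cor_mat <- cor(kept, use = "pairwise.complete.obs")

# 3. List each pair once (upper triangle) whose absolute correlation
#    exceeds the threshold. The value 0.9 is an arbitrary assumption.
threshold <- 0.9
high <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(cor_mat)[high[, "row"]],
                    var2 = colnames(cor_mat)[high[, "col"]],
                    r    = cor_mat[high])
print(pairs)

# 4. One simple policy: drop the second variable of every flagged pair.
reduced <- kept[, !(names(kept) %in% unique(pairs$var2)), drop = FALSE]
```

Dropping the second variable of each pair is only one possible policy. As far as I know, findCorrelation() in the caret package takes a correlation matrix and a cutoff and suggests columns to remove, which may be a better fit than the hand-rolled step 4.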