Regress each column in a data frame on a vector in R

I want to regress each column in a dataset on a vector, and then return the column with the highest R-squared value. For example, I have a vector HAPPY <- c(3,2,2,3,1,3,1,3) and I have a dataset:

 HEALTH CONINC MARITAL SATJOB1 MARITAL2 HAPPY
      3    441       5       1        2     3
      1   1764       5       1        2     2
      2   3087       5       1        2     2
      3   3087       5       1        2     3
      1   3969       2       1        5     1
      1   3969       5       1        2     3
      2   4852       5       1        2     2
      3   5734       3       1        3     3

I want to regress each column in the dataset on HAPPY, and then return the column with the highest R-squared. Example: lm(HEALTH ~ HAPPY); if HEALTH had the highest R-squared value, then return HEALTH.

I tried using apply, but can't figure out how to return the regression with the highest R-squared. Any suggestions?

+6
3 answers

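This will do what you want if your data.frame is called 'd' and HAPPY is a vector in your workspace. If 'd' also contains a HAPPY column, drop it first, since a column identical to HAPPY fits perfectly (R-squared of 1) and would always win.

 # R-squared of each column of d regressed on HAPPY
 r2s <- apply(d, 2, function(x) summary(lm(x ~ HAPPY))$r.squared)
 # name of the column with the highest R-squared
 names(d)[which.max(r2s)]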

You can learn how to extract components of a model or, in this case, of a model summary, using the str() command. It gives you information that helps you access the components of any complex object.
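As a rough sketch of that inspection step, the snippet below fits one model and looks inside its summary. It uses the HAPPY vector and the HEALTH column from the question; the names fit and s are only illustrative.

```r
# Sketch: inspect a model summary to find where R-squared is stored.
HAPPY  <- c(3, 2, 2, 3, 1, 3, 1, 3)  # vector from the question
HEALTH <- c(3, 1, 2, 3, 1, 1, 2, 3)  # HEALTH column from the question's data

fit <- lm(HEALTH ~ HAPPY)
s <- summary(fit)

str(s, max.level = 1)  # one entry per component; look for "r.squared"
names(s)               # component names only
s$r.squared            # extract the value once you know its name
```

Once the component name is known, the same `$r.squared` access can be used inside apply or sapply.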

+4

I would break it into two steps:

1) Compute the R-squared for each model

2) Determine which one is the highest

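 # example data: three random columns plus the happy vector
 mydf <- data.frame(aa = rpois(8, 4), bb = rpois(8, 2), cc = rbinom(8, 1, .5),
                    happy = c(3, 2, 2, 3, 1, 3, 1, 3))
 # step 1: R-squared of every column except the last (happy) regressed on happy
 myRes <- sapply(mydf[-ncol(mydf)], function(x) {
   mylm <- lm(x ~ mydf$happy)
   theR2 <- summary(mylm)$r.squared
   return(theR2)
 })
 # step 2: name(s) of the column(s) with the highest R-squared
 names(myRes[which(myRes == max(myRes))])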

This assumes that happy is a column in your data.frame (here, the last one).
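The same two steps can be sketched on the data from the question. This is only an illustration under a few assumptions: HAPPY is taken as the separate vector given in the question, the names d_ok, varies, r2 and best are made up here, and constant columns such as SATJOB1 are dropped first because their R-squared is undefined and summary() may report an unreliable value for them.

```r
# Data from the question; HAPPY is kept as a separate vector.
HAPPY <- c(3, 2, 2, 3, 1, 3, 1, 3)
d <- data.frame(
  HEALTH   = c(3, 1, 2, 3, 1, 1, 2, 3),
  CONINC   = c(441, 1764, 3087, 3087, 3969, 3969, 4852, 5734),
  MARITAL  = c(5, 5, 5, 5, 2, 5, 5, 3),
  SATJOB1  = c(1, 1, 1, 1, 1, 1, 1, 1),
  MARITAL2 = c(2, 2, 2, 2, 5, 2, 2, 3)
)

# Assumption: drop constant columns, whose R-squared is undefined.
varies <- sapply(d, function(x) length(unique(x)) > 1)
d_ok <- d[varies]

# Step 1: R-squared of each remaining column regressed on HAPPY.
r2 <- sapply(d_ok, function(x) summary(lm(x ~ HAPPY))$r.squared)

# Step 2: name of the column with the highest R-squared.
best <- names(r2)[which.max(r2)]
best
```

which.max() returns only the first maximum, so ties are resolved in column order; the which(myRes == max(myRes)) form above returns all tied columns instead.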

+5

Here is a solution using the colwise() function from the plyr package.

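 library(plyr)
 df = data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
 # R-squared of the regression of df$a on a single column x
 Rsq = function(x) summary(lm(df$a ~ x))$r.squared
 # apply Rsq to columns b, c and d
 Rsqall = colwise(Rsq)(df[, 2:4])
 Rsqall
 # name of the column with the highest R-squared
 names(Rsqall)[which.max(Rsqall)]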
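Note that this answer regresses df$a on each column rather than each column on df$a. With a single predictor and an intercept, R-squared equals the squared Pearson correlation, so the direction does not change the ranking. A small check, with an arbitrary seed and illustrative variable names:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- runif(10)
y <- runif(10)

r2_y_on_x <- summary(lm(y ~ x))$r.squared  # y regressed on x
r2_x_on_y <- summary(lm(x ~ y))$r.squared  # x regressed on y
r2_cor    <- cor(x, y)^2                   # squared Pearson correlation

# Both comparisons should agree up to floating-point tolerance.
all.equal(r2_y_on_x, r2_x_on_y)
all.equal(r2_y_on_x, r2_cor)
```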
+1

Source: https://habr.com/ru/post/913699/

