Using substitute() with aov in R

I tried to run an ANOVA on different sets of data and didn't quite know how to do it. I searched around and found this useful: http://www.ats.ucla.edu/stat/r/pages/looping_strings.htm

    hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
    names(hsb2)
    varlist <- names(hsb2)[8:11]

    models <- lapply(varlist, function(x) {
        lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
    })

My understanding is that the code above creates a function that calls lm() and applies it to each variable in varlist, fitting a separate linear regression for each one.

So I thought using aov instead of lm would work for me like this:

 aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2) 

However, I got this error:

 Error in terms.default(formula, "Error", data = data) : no terms component nor attribute 

I have no idea where the error comes from. Please, help!

+5
3 answers

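The problem is that substitute() returns an unevaluated expression (a call), not a formula. lm() tolerates that, but aov() passes it straight to terms(), which fails as shown in your error message. I think @thelatemail's suggestion of

    lm(as.formula(paste("read ~", x)), data = hsb2)

is a good one. Alternatively, you can evaluate the expression to get a formula: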

    models <- lapply(varlist, function(x) {
        aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
    })

Which one to prefer depends on what you want to do with the list of models afterwards. Doing

    models <- lapply(varlist, function(x) {
        eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
    })

gives a cleaner call component for each result, because the actual variable name appears in the stored call.
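To make the difference between the unevaluated call and the formula concrete, here is a minimal sketch. It assumes the built-in mtcars data as a stand-in for hsb2 (the CSV URL may not be reachable), with mpg playing the role of read and wt as a hypothetical predictor.

```r
# Stand-in data: mtcars ships with R; mpg plays the role of `read`.
x <- "wt"  # hypothetical predictor name, standing in for an element of varlist

# substitute() builds an unevaluated call to `~`, not a formula object.
expr <- substitute(mpg ~ i, list(i = as.name(x)))
class(expr)

# Evaluating the call runs `~`, which produces a real formula.
form <- eval(expr)
class(form)

# aov() accepts the evaluated formula.
fit <- aov(form, data = mtcars)

# Passing the raw call is expected to fail the same way as in the question;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(aov(expr, data = mtcars),
                error = function(e) conditionMessage(e))
```

Comparing class(expr) with class(form) is a quick check whenever a modelling function complains about its formula argument.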

+5

This should do it. The varlist vector is passed element by element to the function, which picks out the matching column. lm() then sees only a two-column data frame, and the "read" column is the dependent variable each time. No need for substitute():

    models <- sapply(varlist, function(x) {
        lm(read ~ ., data = hsb2[, c("read", x)])
    }, simplify = FALSE)

    > summary(models[[1]])  # The first model. Note the use of "[["

    Call:
    lm(formula = read ~ ., data = hsb2[, c("read", x)])

    Residuals:
         Min       1Q   Median       3Q      Max
    -19.8565  -5.8976  -0.8565   5.5801  24.2703

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 18.16215    3.30716   5.492 1.21e-07 ***
    write        0.64553    0.06168  10.465  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 8.248 on 198 degrees of freedom
    Multiple R-squared:  0.3561,  Adjusted R-squared:  0.3529
    F-statistic: 109.5 on 1 and 198 DF,  p-value: < 2.2e-16

For all models:

 lapply(models, summary) 
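
Since the question was about aov rather than lm, here is a sketch of the same two-column trick with aov, plus one way to pull the p-values out of the summaries. It assumes the built-in mtcars data in place of hsb2; the predictor names in varlist are picked only for illustration.

```r
# Stand-in for hsb2: mtcars ships with R; mpg plays the role of `read`.
varlist <- c("wt", "hp", "qsec")  # hypothetical predictors for illustration

# Each call sees a two-column data frame, so `.` expands to one predictor.
models <- sapply(varlist, function(x) {
  aov(mpg ~ ., data = mtcars[, c("mpg", x)])
}, simplify = FALSE)

# One ANOVA table per predictor.
tables <- lapply(models, summary)

# summary() of an aov fit is a list whose first element is the ANOVA table;
# its first row belongs to the predictor, the second to the residuals.
pvals <- sapply(tables, function(s) s[[1]][["Pr(>F)"]][1])
pvals
```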
+5

akrun borrowed my answer the other night; now I'm (partially) borrowing his.

do.call puts the variables into the call shown in the output, so that it reads properly. Here's a general function for simple regression.

    doModel <- function(col1, col2, data = hsb2, FUNC = "lm") {
        form <- as.formula(paste(col1, "~", col2))
        do.call(FUNC, list(form, substitute(data)))
    }

    lapply(varlist, doModel, col1 = "read")
    # [[1]]
    #
    # Call:
    # lm(formula = read ~ write, data = hsb2)
    #
    # Coefficients:
    # (Intercept)        write
    #     18.1622       0.6455
    #
    #
    # [[2]]
    #
    # Call:
    # lm(formula = read ~ math, data = hsb2)
    #
    # Coefficients:
    # (Intercept)         math
    #     14.0725       0.7248
    #
    # ...
    # ...
    # ...

Note: as pointed out in a comment,

 sapply(varlist, doModel, col1 = "read", simplify = FALSE) 

keeps the names in the list, which also allows subsetting with list$name.
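
A short sketch of the naming difference, assuming the built-in mtcars data in place of hsb2 and a hypothetical helper fitOne() written only for this illustration:

```r
# Stand-in data: mtcars ships with R; mpg plays the role of `read`.
varlist <- c("wt", "hp")  # hypothetical predictors for illustration

# Hypothetical helper: fit a one-way aov of mpg on a single named column.
fitOne <- function(x) aov(as.formula(paste("mpg ~", x)), data = mtcars)

# lapply() over a character vector returns an unnamed list ...
unnamed <- lapply(varlist, fitOne)
names(unnamed)

# ... while sapply(..., simplify = FALSE) names each element after its input.
named <- sapply(varlist, fitOne, simplify = FALSE)
names(named)

# Named elements can then be picked out with `$` or [["..."]].
summary(named$wt)
```

If you prefer to stay with lapply, wrapping the result in setNames(result, varlist) gives the same named list.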

+4

Source: https://habr.com/ru/post/1203187/

