R: Extracting full cases / included cases from a linear model or formula variables

After running m1 <- lm(f1, data=DT) I want to save the included observations (something like obs <- complete.cases(m1), but that actually works) so that I can run a second regression on those same observations: m2 <- lm(f2, data=DT[obs]).

Alternatively, I would like to get observations that are complete for a given set of variables, as defined by the formula object. Consider this R-like pseudocode:

 f1 <- as.formula("y ~ x1 + x2 + x3")
 f2 <- as.formula("y ~ x1 + x2")
 obs <- complete.cases(DT[, list(all.vars(f1))])
 m2 <- lm(f2, data=DT[obs])

How can I do this? In the first case, lm is already doing the work implicitly; how can I extract the result? In the second, all.vars returns a character vector; how do I turn it into an unquoted list of columns that DT (a data.table) understands?

2 answers

From data.table v1.9.5, na.omit has an argument cols.

 na.omit(DT, cols = all.vars(f)) 
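To tie this back to the question, here is a minimal sketch of the whole workflow. It assumes the data.table package is installed and uses the built-in mtcars data with a few artificially injected NAs; the formulas f1 and f2 are illustrative stand-ins, not the asker's actual models. Selecting columns with a character vector via with = FALSE is one way to avoid needing an unquoted list.

```r
library(data.table)

# Illustrative data: mtcars with some missing values injected into disp.
DT <- as.data.table(mtcars)
set.seed(1)
DT[sample(nrow(DT), 6), disp := NA_real_]

# Hypothetical formulas standing in for the asker's f1 and f2.
f1 <- mpg ~ disp + hp + wt
f2 <- mpg ~ disp + hp

# Logical index of rows that are complete in every variable of f1.
# with = FALSE makes data.table treat the character vector as column names.
obs <- complete.cases(DT[, all.vars(f1), with = FALSE])

m1 <- lm(f1, data = DT)        # lm drops the incomplete rows itself
m2 <- lm(f2, data = DT[obs])   # second model on the same observations

# Equivalent subset using the cols argument of na.omit.
DT_complete <- na.omit(DT, cols = all.vars(f1))
```

Both m1 and m2 should then be fitted on the same rows, which can be checked with nobs(m1) == nobs(m2).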

Assuming the na.action of your lm() call is the default na.omit, why not just call na.omit on the source data?

 # create some missing values
 mtcars$disp <- ifelse(runif(nrow(mtcars)) > 0.8, NA, mtcars$disp)
 # fit model
 m1 <- lm(mpg ~ disp, data = mtcars)
 na.omit(mtcars[ , c("mpg", "disp")])

Check out the help file for na.omit for alternatives.
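For the first part of the question (recovering what lm already did), one possible approach is to read the dropped rows back from the fitted model with na.action(). The sketch below uses only base R and assumes the default na.action = na.omit; the data copy dat and the model formulas are illustrative.

```r
# Work on a copy of the built-in mtcars data so the original stays untouched.
dat <- mtcars
set.seed(1)                               # reproducible NA positions
dat$disp[sample(nrow(dat), 6)] <- NA

m1 <- lm(mpg ~ disp + hp, data = dat)

# Positions of the rows lm() dropped; NULL when nothing was dropped.
dropped <- na.action(m1)

# Logical vector marking the rows lm() actually used.
obs <- rep(TRUE, nrow(dat))
if (!is.null(dropped)) obs[as.integer(dropped)] <- FALSE

# Refit a smaller model on exactly the same observations.
m2 <- lm(mpg ~ hp, data = dat[obs, ])
```

If the data has meaningful row names, rownames(model.frame(m1)) gives the retained rows by name instead of by position.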


Source: https://habr.com/ru/post/983478/

