Instead of sampling from the result of model.frame, you can sample from na.omit(get_all_vars(myformula, Salaries)). This contains the raw (untransformed) variables used in the formula, with incomplete rows dropped. Your example then becomes:
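    myformula <- log(salary) ~ yrs.service + yrs.since.phd
    mfit <- lm(formula = myformula, data = Salaries)
    # raw rows that lm actually used: untransformed columns, NA rows dropped
    complete <- na.omit(get_all_vars(myformula, Salaries))
    # take n from the NA-free data, not from Salaries, so no NA rows are sampled
    n <- nrow(complete)
    newdata <- complete[sample(1:n, size = n, replace = TRUE), ]
    mfit2 <- update(mfit, data = newdata)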
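If you need many bootstrap replicates rather than a single resample, the same idea can be wrapped in replicate(). This is a minimal sketch under stated assumptions: it builds a small simulated data frame (hypothetical values) in place of Salaries so it runs without carData. The names dat, complete, boot_once and B are chosen for illustration only.

```r
# Simulated data standing in for Salaries (hypothetical values)
set.seed(1)
dat <- data.frame(salary = exp(rnorm(50, 11, 0.3)),
                  yrs.service = rpois(50, 10),
                  yrs.since.phd = rpois(50, 15))
dat$salary[c(3, 7)] <- NA  # a couple of missing responses

myformula <- log(salary) ~ yrs.service + yrs.since.phd
mfit <- lm(myformula, data = dat)

# Raw (untransformed) rows that lm actually used: 48 complete rows here
complete <- na.omit(get_all_vars(myformula, dat))

# One bootstrap replicate: resample rows of `complete`, refit, return coefficients
boot_once <- function() {
  idx <- sample(nrow(complete), replace = TRUE)
  coef(update(mfit, data = complete[idx, ]))
}

B <- 200
boot_coefs <- replicate(B, boot_once())  # 3 x B matrix, one column per replicate
apply(boot_coefs, 1, sd)                 # bootstrap standard errors per coefficient
```

update() evaluates the modified lm call in the calling function's environment, so the resampled data frame built inside boot_once is picked up without any extra plumbing.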
The following simple example confirms that, for a formula without transformations, model.frame(formula, df) and na.omit(get_all_vars(formula, df)) select the same raw data from the data frame:
    df <- data.frame(w = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))
    # put one NA in each column
    df[1, 1] <- NA
    df[2, 2] <- NA
    df[3, 3] <- NA
    df[4, 4] <- NA
    # both drop rows 1, 2 and 4; the NA in row 3 is in y, which the formula does not use
    identical(data.frame(na.omit(get_all_vars(z ~ w + x, df))),
              data.frame(model.frame(z ~ w + x, df)))
    # [1] TRUE
Note that I wrapped the results of na.omit(get_all_vars(...)) and model.frame(...) in data.frame() to strip extra attributes (such as terms and na.action) before comparing them. Of course, model.frame does extra work, such as evaluating log(salary) in your example. But if all you need is to resample the original data, na.omit(get_all_vars(...)) works fine, and you can then pass the new data frame to lm or update.
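To see that difference concretely, here is a small sketch on a made-up data frame (the values are hypothetical). With a transformed response, model.frame() returns a column named log(salary) holding the transformed values, while get_all_vars() returns the raw salary column; both drop the same incomplete rows.

```r
# Made-up data standing in for Salaries (hypothetical values)
df <- data.frame(salary = c(50000, NA, 70000, 65000),
                 yrs.service = c(1, 5, NA, 10),
                 yrs.since.phd = c(3, 8, 12, 15))
f <- log(salary) ~ yrs.service + yrs.since.phd

mf  <- model.frame(f, df)            # transformed response, NA rows dropped
raw <- na.omit(get_all_vars(f, df))  # raw columns, NA rows dropped

names(mf)      # "log(salary)" "yrs.service" "yrs.since.phd"
names(raw)     # "salary" "yrs.service" "yrs.since.phd"
rownames(mf)   # "1" "4"  (rows 2 and 3 contain NAs)
all.equal(mf[["log(salary)"]], log(raw$salary))  # TRUE
```

Because raw keeps salary on its original scale, refitting with lm(f, data = raw) or update() re-applies log() exactly once.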