Say I have a model in which supp is a factor variable:
lm(len ~ dose + supp, data = ToothGrowth)
but I want to use a different base level for the factor. I can specify this directly in the formula:
lm(len ~ dose + relevel(supp, "VC"), data = ToothGrowth)
and the output will be:
Call:
lm(formula = len ~ dose + relevel(supp, "VC"), data = ToothGrowth)

Coefficients:
          (Intercept)                   dose  relevel(supp, "VC")OJ
                5.573                  9.764                  3.700
It is very convenient to apply transformations directly in the formula rather than create intermediate data sets or modify existing ones. This matters, for example, when you use scale to standardize variables, where missing values in the other variables included in the final model have to be taken into account. Often, however, the resulting coefficient names become quite ugly.
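To make the naming problem concrete, here is a minimal sketch using the same ToothGrowth data. The extra scale(dose) term is only there for illustration and is not part of the model above.

```r
# Transformations written inline end up verbatim in the coefficient names.
fit <- lm(len ~ scale(dose) + relevel(supp, "VC"), data = ToothGrowth)

# Each name is the deparsed expression, plus the factor level for relevel().
coef_names <- names(coef(fit))
print(coef_names)
```

The names are simply the expressions as typed, which is what makes the output hard to read once several transformations are combined.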
My question is: is it possible to specify the name of the variable that results from an expression in the formula? Something like
lm(len ~ dose + (OJ = relevel(supp, "VC")), data = ToothGrowth)
(which does not work).
EDIT: Although the solution proposed by G. Grothendieck is nice, it actually generates the wrong result. The following example shows this:
# Create some data:
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
The problem is that when x2 is scaled, the calculation uses observations that are not included in the final model, since x1 has gaps (missing values).
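A minimal sketch of the effect, assuming the gaps are NA values in x1. The seed and the two rows set to NA are my own choices for illustration; the example above does not say which rows are missing.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
df$x1[c(2, 5)] <- NA  # assumed gaps in x1

# scale(x2) is evaluated on all 10 rows before incomplete rows are dropped.
fit <- lm(y ~ x1 + scale(x2), data = df)
keep <- !is.na(df$x1)

in_model    <- as.numeric(model.frame(fit)[["scale(x2)"]])  # values the fit used
full_scaled <- as.numeric(scale(df$x2))[keep]               # scaled on all rows
cc_scaled   <- as.numeric(scale(df$x2[keep]))               # scaled on used rows only

isTRUE(all.equal(in_model, full_scaled))  # same values: all rows were used
isTRUE(all.equal(in_model, cc_scaled))    # not the complete-case scaling
```

So the scaled x2 that enters the fit is centered and scaled with all ten observations, even though only eight of them are used to estimate the model.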
So, the question for me remains: is there a way for the formula interface to handle this case, without the annoying intermediate step of using an additional formula and extracting a model frame, which can then be "transformed"?
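For reference, the intermediate step I would like to avoid looks roughly like this. It is only a sketch: the NA positions, the seed, and the column name x2_std are hypothetical choices of mine.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(x1 = runif(10), x2 = runif(10))
df <- transform(df, y = x1 + x2 + rnorm(10))
df$x1[c(2, 5)] <- NA  # assumed gaps in x1

# Step 1: an additional formula with the raw variables; with the default
# na.action (na.omit), model.frame() drops the incomplete rows.
mf <- model.frame(y ~ x1 + x2, data = df)

# Step 2: transform within the rows that are actually used, choosing the name.
mf <- transform(mf, x2_std = as.numeric(scale(x2)))

# Step 3: fit on the prepared frame; the coefficient gets the clean name.
fit <- lm(y ~ x1 + x2_std, data = mf)
names(coef(fit))
```

This gives the right scaling and a readable name, but at the cost of exactly the extra formula and intermediate data frame that the question is about.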
I hope the question is clear.