dHeight is logical. Within the model it is coerced to a factor, and its levels are ordered lexicographically (i.e., FALSE before TRUE).
As noted in @Hongooi's answer, you cannot estimate all 4 parameters, so R fits the terms in the order they appear (FALSE before TRUE).
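To see the level ordering for yourself, here is a minimal sketch. The vector `x` is a made-up example and not part of the question's data.

```r
# A logical is treated like a two-level factor; FALSE sorts before TRUE.
x <- c(TRUE, FALSE, TRUE, FALSE)   # hypothetical example vector
lv <- levels(factor(x))
print(lv)    # "FALSE" "TRUE"

# With an intercept in the model, the first level (FALSE) is the reference,
# so only a TRUE column shows up in the design matrix.
cols <- colnames(model.matrix(~ x))
print(cols)  # "(Intercept)" "xTRUE"
```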
If you want to force R to fit the TRUE term, you can fit the model with !dHeight instead:
lm(formula = Volume ~ Girth + cGirth:!dHeight, data = trees)
Note that !dHeightFALSE is equivalent to dHeightTRUE.
You will also notice that in this simple case you are simply changing the sign of the coefficient, so it doesn't really matter which model you fit.
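A rough check of the sign flip, as a sketch only. The definitions of `cGirth` and `dHeight` below are my assumptions (they come from the question, which is not reproduced here): I take `cGirth` to be centred `Girth` and use a hypothetical cut-off of 76 for `Height`. The sign flip relies on `cGirth` being `Girth` minus a constant.

```r
# `trees` ships with base R; the two derived columns are assumptions.
d <- trees
d$cGirth  <- d$Girth - mean(d$Girth)  # assumed: centred Girth
d$dHeight <- d$Height > 76            # assumed: hypothetical cut-off

fit_false <- lm(Volume ~ Girth + cGirth:dHeight,  data = d)
fit_true  <- lm(Volume ~ Girth + cGirth:!dHeight, data = d)

# In each fit the third coefficient is the one interaction term
# that actually gets estimated (the fourth is NA).
b_false <- unname(coef(fit_false)[3])
b_true  <- unname(coef(fit_true)[3])

# Same magnitude, opposite sign, and identical fitted values.
print(all.equal(b_false, -b_true))
print(all.equal(fitted(fit_false), fitted(fit_true)))
```

Both fits describe the same model, so the choice only affects how the coefficients are labelled and read.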
EDIT: a better approach
R can recognize that cGirth and Girth are collinear, so we can fit the following, remembering that a/b expands to a + a:b
lm(formula = Volume ~ Girth + cGirth/dHeight, data = trees)

Coefficients:
       (Intercept)               Girth              cGirth  cGirth:dHeightTRUE
           -27.198               4.251                  NA               1.286
This gives coefficients with easily interpretable names, and R sensibly does not return a coefficient for cGirth (it is reported as NA).
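If you want to confirm how the `/` operator expands, you can ask `terms()` directly. This is just an inspection sketch; it needs no data because the expansion is purely symbolic.

```r
# Expand the formula symbolically and list the resulting terms.
f <- Volume ~ Girth + cGirth/dHeight
labs <- attr(terms(f), "term.labels")
print(labs)  # "Girth" "cGirth" "cGirth:dHeight"
```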
R can tell that Girth and cGirth are collinear when they are both "main effect" or stand-alone terms.
There is no way R could tell, when you specify Girth + cGirth:dHeight, that cGirth and Girth are collinear and that, given dHeight is logical, you want cGirth:dHeightTRUE to be the coefficient that is fitted (you could write your own formula parser to do that if you wanted).
Another approach that is consistent with the desired model, and has no collinear terms, is to use
lm(formula = Volume ~ Girth + I(cGirth*dHeight), data = trees)
which coerces dHeight to numeric (TRUE becomes 1, FALSE becomes 0).
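A small illustration of that coercion, using made-up vectors (`cG` and `dH` are hypothetical stand-ins, not the question's columns):

```r
cG <- c(-1.5, 0.5, 2.0)       # hypothetical centred values
dH <- c(TRUE, FALSE, TRUE)    # hypothetical logical

# Arithmetic turns TRUE/FALSE into 1/0, so the product is zero
# wherever dH is FALSE.
prod_term <- cG * dH
print(prod_term)  # -1.5  0.0  2.0

# Wrapped in I(), the product enters the model as one numeric column.
cols <- colnames(model.matrix(~ I(cG * dH)))
print(cols)       # "(Intercept)" "I(cG * dH)"
```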
EDIT: to belabour the point
When you fit ~Girth + Girth:dHeight
what you are saying is that there is a main effect for Girth plus adjustments for dHeight. R treats the first level of a factor as the reference level. The slope for dHeight == FALSE is just the coefficient for Girth, and then you have the adjustment for dHeight == TRUE (Girth:dHeightTRUE).
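You can check which columns R builds for this formula with `model.matrix()`. As before, the definition of `dHeight` is my assumption (a hypothetical cut-off of 76), since the question's definition is not shown here.

```r
d <- trees
d$dHeight <- d$Height > 76   # assumed: hypothetical cut-off

# Girth is present as a main effect, so the interaction only adds
# an adjustment column for the non-reference level (TRUE).
cols <- colnames(model.matrix(~ Girth + Girth:dHeight, data = d))
print(cols)  # "(Intercept)" "Girth" "Girth:dHeightTRUE"
```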
When you fit ~Girth + cGirth:dHeight, R does not have a mind-reading parser that can tell that, because cGirth and Girth are collinear, the second level of dHeight should now be treated as the reference level when the interaction is fitted.
Imagine you had a variable that was completely unrelated to Girth, e.g.,
set.seed(1)
trees$cG <- runif(nrow(trees))
Then, when you fit Girth + cG:dHeight, you get 4 estimated parameters:
lm(formula = Volume ~ Girth + cG:dHeight, data = trees)

Call:
lm(formula = Volume ~ Girth + cG:dHeight, data = trees)

Coefficients:
    (Intercept)            Girth  cG:dHeightFALSE   cG:dHeightTRUE
      -31.79645          4.79435         -5.92168          0.09578
Which is sensible.
When R processes Girth + cGirth:dHeight, it expands the terms (in order of the levels of the factor) to 1 + Girth + cGirth:dHeightFALSE + cGirth:dHeightTRUE, works out that it cannot estimate all 4 parameters, and estimates the first 3.
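The expansion and the rank deficiency can be inspected directly. This is a sketch under my own assumptions about the question's columns: `cGirth` as centred `Girth` and `dHeight` as `Height > 76` (a hypothetical cut-off).

```r
d <- trees
d$cGirth  <- d$Girth - mean(d$Girth)  # assumed: centred Girth
d$dHeight <- d$Height > 76            # assumed: hypothetical cut-off

# Four columns are built, one interaction column per level of dHeight.
X <- model.matrix(~ Girth + cGirth:dHeight, data = d)
print(colnames(X))

# Only three of them are linearly independent.
rk <- qr(X)$rank
print(rk)

# lm() keeps the first three and reports NA for the last one.
fit <- lm(Volume ~ Girth + cGirth:dHeight, data = d)
aliased <- names(which(is.na(coef(fit))))
print(aliased)
```

The two interaction columns sum to cGirth, which is itself a linear combination of the intercept and Girth, and that is why the fourth parameter cannot be estimated.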