Expanding factor interactions in a formula

I have many formulas (of class formula or Formula) of the form y ~ a*b, where a and b are factors.

I need to write a function that takes such a formula and returns a formula with all the terms of the interaction "spelled out". Here is an example:

fac1 <- factor(c('a', 'a', 'b', 'b'))
fac2 <- factor(c('c', 'd', 'c', 'd'))
BigFormula(formula(x ~ fac1*fac2))

where BigFormula returns formula(x ~ a + b + c + d + a:c + a:d + b:c + b:d).

Is there an easy way to do this?

(Context: I run many commands of the form anova(mod1, mod2), where mod2 is nested in mod1 and the right-hand sides of both models contain terms like fac1*fac2. The goal is to compute F statistics. The problem is that anova treats fac1*fac2 as three variables, although it usually represents more than three. (In the code above, for example, fac1*fac2 represents eight variables.) As a result, anova undercounts the number of constraints in the nested model and overstates my degrees of freedom.)

4 answers

How about the following solution? I use a more extreme example of a complex interaction.

f = formula(y ~ a * b * c * d * e)

To spell out the interaction terms, we extract the term labels from the value returned by terms.formula():

terms = attr(terms.formula(f), "term.labels")

which gives:

> terms
 [1] "a"         "b"         "c"         "d"         "e"         "a:b"       "a:c"
 [8] "b:c"       "a:d"       "b:d"       "c:d"       "a:e"       "b:e"       "c:e"
[15] "d:e"       "a:b:c"     "a:b:d"     "a:c:d"     "b:c:d"     "a:b:e"     "a:c:e"
[22] "b:c:e"     "a:d:e"     "b:d:e"     "c:d:e"     "a:b:c:d"   "a:b:c:e"   "a:b:d:e"
[29] "a:c:d:e"   "b:c:d:e"   "a:b:c:d:e"

And then we can convert it to a formula:

f = as.formula(sprintf("y ~ %s", paste(terms, collapse="+")))

> f
y ~ a + b + c + d + e + a:b + a:c + b:c + a:d + b:d + c:d + a:e +
    b:e + c:e + d:e + a:b:c + a:b:d + a:c:d + b:c:d + a:b:e +
    a:c:e + b:c:e + a:d:e + b:d:e + c:d:e + a:b:c:d + a:b:c:e +
    a:b:d:e + a:c:d:e + b:c:d:e + a:b:c:d:e
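
The two steps above can be wrapped into a single function. The following is only a sketch: the name expand_formula is hypothetical, and it uses reformulate() from the stats package instead of sprintf() and as.formula() so that the response and the intercept setting are carried over. Note that it expands the formula in terms of the variables, not the individual factor levels.

```r
# Hypothetical helper (name not from the thread): rewrite a formula so that
# every term is listed explicitly. Assumes the formula has at least one term
# and does not use `.`.
expand_formula <- function(f) {
  tt <- terms(f)
  labels <- attr(tt, "term.labels")
  # Keep the left-hand side only if the formula has one
  response <- if (attr(tt, "response") > 0) f[[2]] else NULL
  out <- reformulate(labels, response = response,
                     intercept = attr(tt, "intercept") == 1)
  # Preserve the environment of the original formula
  environment(out) <- environment(f)
  out
}

expand_formula(y ~ a * b)
```

With y ~ a * b as input, this should return y ~ a + b + a:b.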

Take a look at the help for formula (?formula); there may be things there that will work for you.

For example, the formula y ~ (a + b + c + d)^2 will give you all the main effects and all two-way interactions, and the formula y ~ (a + b) * (c + d) gives the expansion that you show above. You can also subtract terms, so y ~ a*b*c - a:b:c will not include the three-way interaction.
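
One way to check what these operators expand to is to inspect the term labels that terms() computes. This is a small sketch; the helper name labels_of is made up for illustration.

```r
# Hypothetical helper: list the expanded terms of a formula
labels_of <- function(f) attr(terms(f), "term.labels")

# Main effects plus all two-way interactions of a, b, c, d
labels_of(y ~ (a + b + c + d)^2)

# Main effects plus only the "cross" interactions a:c, a:d, b:c, b:d
labels_of(y ~ (a + b) * (c + d))

# Full three-way expansion with the a:b:c term removed
labels_of(y ~ a * b * c - a:b:c)
```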


I have yet to learn all the formula tricks, but if I needed explicit formulas, I would use sapply along with paste:

# the factors
fac1 <- factor(c('a', 'a', 'b', 'b'))
fac2 <- factor(c('c', 'd', 'c', 'd'))

# create all the interaction terms
out <- sapply(levels(fac1), function(ii) {
  sapply(levels(fac2), function(jj) {
    paste0(ii, ":", jj)
  })
})

# along with the single terms
terms <- c(levels(fac1), levels(fac2), as.vector(out))

# and create the rhs of the formula
rhs <- paste0(terms, collapse=" + ")

# finally add the lhs
f <- paste0("x ~ ", rhs)

As a result, we get:

> f
[1] "x ~ a + b + c + d + a:c + a:d + b:c + b:d"
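
Note that the result here is a character string, not a formula object. A minimal sketch of the conversion with as.formula() follows; the name f_obj is made up for illustration. The level names a, b, c and d are treated as variable names in the resulting formula, so fitting a model with it assumes the data contain columns with those names (for example, indicator columns).

```r
# The string produced above
f <- "x ~ a + b + c + d + a:c + a:d + b:c + b:d"

# Convert it to a formula object
f_obj <- as.formula(f)
class(f_obj)

# Variables the formula refers to: the response and the four level names
all.vars(f_obj)
```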

We had a similar problem, though a somewhat easier one: our formula had 50 variables, and we had to change them very often. Our solution was to write them out in a loop to an external file from inside the R script, building the actual formula, and then simply read this txt file back and paste it in. As far as I remember, this could be done in a nested loop to produce more formulas, with the file then read line by line. In general, it is always useful to use both R scripts and bash.
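
A rough sketch of that workflow, entirely within R, might look like the following. The variable names v1 to v5, the response y and the temporary file are all assumptions made for illustration; the answer does not say how the file was actually laid out.

```r
# Hypothetical variable names standing in for the real ones
vars <- paste0("v", 1:5)

# Build the formula text and write it to an external text file
path <- tempfile(fileext = ".txt")
writeLines(paste("y ~", paste(vars, collapse = " + ")), path)

# Later: read the line back and turn it into a formula
f <- as.formula(readLines(path, n = 1))
f
```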


Source: https://habr.com/ru/post/921036/

