R - How to contrast code factors and retain meaningful labels in the output summary

OK, once and for all, how do you (emphasis on you, because I'm sure there is more than one way to do this) contrast code factors (treatment, sum, helmert, etc.) and retain meaningful labels (so that you can make meaningful interpretations of the effects) in the glm function?

I understand that I can use levels() to figure out which factor level is the reference, but this gets tedious once I start including factors with 5 or 10 levels and their interactions.

Here is a quick two-factor example of what I mean:

    outcome   <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1)
    firstvar  <- c("A", "B", "C", "C", "B", "B", "A", "A", "C", "A", "C", "B")
    secondvar <- c("D", "D", "E", "F", "F", "E", "D", "E", "F", "F", "D", "E")
    # data.frame() keeps outcome numeric; cbind() would coerce every column to character
    df <- data.frame(outcome, firstvar, secondvar)
    df$firstvar  <- as.factor(df$firstvar)
    df$secondvar <- as.factor(df$secondvar)

    # not coded manually (the default appears to be dummy or treatment coding)
    # gives meaningful factor labels in the summary output
    summary(glm(outcome ~ firstvar * secondvar, data = df, family = "binomial"))

    # effects coded
    # does not give meaningful factor labels
    contrasts(df$firstvar)  <- contr.sum(3)
    contrasts(df$secondvar) <- contr.sum(3)
    summary(glm(outcome ~ firstvar * secondvar, data = df, family = "binomial"))

    # dummy coded
    contrasts(df$firstvar)  <- contr.treatment(3)
    contrasts(df$secondvar) <- contr.treatment(3)
    summary(glm(outcome ~ firstvar * secondvar, data = df, family = "binomial"))

Any suggestions would be appreciated. This problem has bothered me from time to time, and I'm sure there is a simple(ish) solution.

1 answer

Well, the simple answer (at least for contr.treatment) is that you should pass the factor levels to the function, not just the number of levels. In most cases, this sets the level names correctly. For instance:

    contr.treatment(levels(df$firstvar))
    #   B C
    # A 0 0
    # B 1 0
    # C 0 1

R then uses the column names as labels/suffixes for the coefficients in the regression summary. However, even when you pass labels, contr.sum does not set column names. Here we can create our own wrapper.

    named.contr.sum <- function(x, ...) {
        if (is.factor(x)) {
            # use the level names so the rows of the matrix are labelled
            x <- levels(x)
        } else if (is.numeric(x) && length(x) == 1L) {
            stop("cannot create names with integer value. Pass factor levels")
        }
        x <- contr.sum(x, ...)
        # label each column "<level coded +1>-<level coded -1>"
        colnames(x) <- apply(x, 2, function(x)
            paste(names(x[x > 0]), names(x[x < 0]), sep = "-")
        )
        x
    }

Here we basically call contr.sum and just add the column names to the result (plus some error checking). You can run this with:

    named.contr.sum(levels(df$firstvar))
    #   A-C B-C
    # A   1   0
    # B   0   1
    # C  -1  -1
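As a quick check that the wrapper only relabels the matrix, the sketch below compares its result with plain contr.sum. The wrapper is repeated so the snippet runs on its own; the names lev, plain and named are only illustrative.

```r
# Wrapper from the answer, repeated so this snippet is self-contained.
named.contr.sum <- function(x, ...) {
  if (is.factor(x)) {
    x <- levels(x)
  } else if (is.numeric(x) && length(x) == 1L) {
    stop("cannot create names with integer value. Pass factor levels")
  }
  x <- contr.sum(x, ...)
  colnames(x) <- apply(x, 2, function(x) {
    paste(names(x[x > 0]), names(x[x < 0]), sep = "-")
  })
  x
}

lev   <- c("A", "B", "C")      # illustrative level names
plain <- contr.sum(lev)        # base R: row names only, no column names
named <- named.contr.sum(lev)  # same numbers plus column names

is.null(colnames(plain))                 # TRUE
colnames(named)                          # "A-C" "B-C"
identical(unname(named), unname(plain))  # TRUE
```

Because only the dimnames differ, the fitted model should be the same either way; only the coefficient labels change.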

I decided to use "A-C" and "B-C" as the labels, but you can change this in the code if you want. Then running

    contrasts(df$firstvar)  <- named.contr.sum(levels(df$firstvar))
    contrasts(df$secondvar) <- named.contr.sum(levels(df$secondvar))
    summary(glm(outcome ~ firstvar * secondvar, data = df, family = "binomial"))

gives you

    Call:
    glm(formula = outcome ~ firstvar * secondvar, family = "binomial", data = df)

    Coefficients:
                               Estimate Std. Error z value Pr(>|z|)
    (Intercept)              -6.855e+00  5.023e+03  -0.001    0.999
    firstvarA-C              -6.855e+00  6.965e+03  -0.001    0.999
    firstvarB-C               6.855e+00  6.965e+03   0.001    0.999
    secondvarD-F             -6.855e+00  6.965e+03  -0.001    0.999
    secondvarE-F             -6.855e+00  6.965e+03  -0.001    0.999
    firstvarA-C:secondvarD-F  2.057e+01  8.473e+03   0.002    0.998
    firstvarB-C:secondvarD-F -1.371e+01  1.033e+04  -0.001    0.999
    firstvarA-C:secondvarE-F  7.072e-10  1.033e+04   0.000    1.000
    firstvarB-C:secondvarE-F  6.855e+00  8.473e+03   0.001    0.999
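The question also mentions Helmert coding, which the wrapper above does not cover. One possible generalization, offered only as a sketch: a hypothetical helper named.contr (not part of base R) that accepts any contrast function returning a matrix with the level names as row names. It labels each column with the positively weighted levels, a hyphen, then the negatively weighted levels. The result is checked with model.matrix and its contrasts.arg argument, so the data frame itself is not modified. contr.treatment already sets column names, so the helper is not intended for it.

```r
# Hypothetical helper (an assumption for illustration, not a base R function).
# Labels each contrast column "<levels with weight > 0>-<levels with weight < 0>".
named.contr <- function(x, contr.fun = contr.sum, ...) {
  if (is.factor(x)) x <- levels(x)
  if (length(x) < 2L) {
    stop("pass a factor or a vector of at least two level names")
  }
  m <- contr.fun(x, ...)
  colnames(m) <- apply(m, 2, function(col) {
    paste(paste(names(col)[col > 0], collapse = ","),
          paste(names(col)[col < 0], collapse = ","),
          sep = "-")
  })
  m
}

# Same first factor as in the question.
df <- data.frame(
  firstvar = factor(c("A", "B", "C", "C", "B", "B", "A", "A", "C", "A", "C", "B"))
)

helm <- named.contr(df$firstvar, contr.helmert)
colnames(helm)
# "B-A"   "C-A,B"

# Supply the named matrix through contrasts.arg instead of contrasts(df$...) <-
X <- model.matrix(~ firstvar, data = df, contrasts.arg = list(firstvar = helm))
colnames(X)
# "(Intercept)"   "firstvarB-A"   "firstvarC-A,B"
```

With the default contr.fun = contr.sum the helper produces the same "A-C" and "B-C" labels as named.contr.sum. The comma separator for multi-level groups is an arbitrary choice and can be changed in the inner paste call.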

Source: https://habr.com/ru/post/971609/

