Dummy variables for logistic regression in R

I perform logistic regression on three factors, which are all binary.

My data:

    table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                          Gender = factor(c("Men", "Women")),
                          Priorconv = factor(c("N", "P")))
    table1 <- data.frame(table1,
                         Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                         No = c(1, 9, 3, 2, 6, 34, 6, 3))

and the model:

    fit4 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
                data = table1, family = binomial)
    summary(fit4)

R seems to code Priorconv = P as 1 and Crime = Shoplifting as 1. As a result, the interaction term equals 1 only when both of them are 1. I would like to try different codings for the interaction term; for example, I would like to see what happens when the prior conviction is P and the crime is not shoplifting.

Is there a way to get R to assign the 1s and 0s to different categories? This would greatly simplify my analysis.

Thanks.

2 answers

You already get all four combinations of the two categorical variables in your regression. You can see this as follows.

Here is the output of your regression:

    Call: glm(formula = cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
        family = binomial, data = table1)

    Coefficients:
                                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                   1.9062     0.3231   5.899 3.66e-09 ***
    PriorconvP                   -1.3582     0.3835  -3.542 0.000398 ***
    CrimeShoplifting              0.9842     0.6069   1.622 0.104863
    PriorconvP:CrimeShoplifting  -0.5513     0.7249  -0.761 0.446942

So for Priorconv the reference category (the one whose dummy value is 0) is N, and for Crime the reference category is Other. Here is how to interpret the regression results for each of the four possibilities, where log(p/(1-p)) is the log odds of a Yes outcome:
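A quick way to check which level is the reference is contrasts(), which prints the 0/1 dummy coding R will use. A minimal sketch, rebuilding just the two factors from the question's data:

```r
# Inspect the dummy coding R assigns to each factor level.
# The reference level is the row of zeros; by default R orders
# levels alphabetically, so "N" and "Other Theft Acts" come first.
Priorconv <- factor(c("N", "P"))
Crime <- factor(c("Shoplifting", "Other Theft Acts"))

contrasts(Priorconv)
##   P
## N 0
## P 1
contrasts(Crime)
##                  Shoplifting
## Other Theft Acts           0
## Shoplifting                1
```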

    1. PriorConv = N and Crime = Other. Both dummies are zero, so the regression is just the intercept: log(p/(1-p)) = 1.90
    2. PriorConv = P and Crime = Other. The Priorconv dummy equals 1 and the Crime dummy is still zero: log(p/(1-p)) = 1.90 - 1.36
    3. PriorConv = N and Crime = Shoplifting. The Priorconv dummy is 0 and the Crime dummy is now 1: log(p/(1-p)) = 1.90 + 0.98
    4. PriorConv = P and Crime = Shoplifting. Both dummies are 1: log(p/(1-p)) = 1.90 - 1.36 + 0.98 - 0.55
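The four log-odds cases above can be turned into probabilities with plogis(), the inverse logit. A sketch using the (rounded) coefficients from the summary output:

```r
# Convert each of the four log-odds cases to a probability of Yes.
# Coefficients are copied (rounded) from the regression summary above.
b0  <-  1.9062  # (Intercept)
bP  <- -1.3582  # PriorconvP
bS  <-  0.9842  # CrimeShoplifting
bPS <- -0.5513  # PriorconvP:CrimeShoplifting

p_N_other <- plogis(b0)                 # PriorConv = N, Crime = Other
p_P_other <- plogis(b0 + bP)            # PriorConv = P, Crime = Other
p_N_shop  <- plogis(b0 + bS)            # PriorConv = N, Crime = Shoplifting
p_P_shop  <- plogis(b0 + bP + bS + bPS) # PriorConv = P, Crime = Shoplifting

round(c(p_N_other, p_P_other, p_N_shop, p_P_shop), 3)
## 0.871 0.634 0.947 0.727
```

These match the predicted probabilities shown further down in the answer.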

You can change the order of the levels of the two predictor variables, but that will only change which combination of variables falls into each of the four cases above.

Update: regarding the effect of factor ordering on the regression coefficients. Changing the reference level will change the coefficients, because the coefficients are contrasts between different combinations of categories, but it will not change the predicted probabilities of a Yes or No outcome. (Regression modeling would not be very robust if you could change the predictions just by changing the reference category.) Note, for example, that the predicted probabilities are the same even if we switch the reference category of Priorconv:

    m1 = glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
             data = table1, family = binomial)
    predict(m1, type = "response")
    ##         1         2         3         4         5         6         7         8
    ## 0.9473684 0.8705882 0.9473684 0.8705882 0.7272727 0.6336634 0.7272727 0.6336634

    table2 = table1
    table2$Priorconv = relevel(table2$Priorconv, ref = "P")
    m2 = glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
             data = table2, family = binomial)
    predict(m2, type = "response")
    ##         1         2         3         4         5         6         7         8
    ## 0.9473684 0.8705882 0.9473684 0.8705882 0.7272727 0.6336634 0.7272727 0.6336634

I agree with the interpretation provided by @eipi10. You can also use relevel() to change the reference level before fitting the model:

    levels(table1$Priorconv)
    ## [1] "N" "P"
    table1$Priorconv <- relevel(table1$Priorconv, ref = "P")
    levels(table1$Priorconv)
    ## [1] "P" "N"
    m <- glm(cbind(Yes, No) ~ Priorconv*Crime, data = table1, family = binomial)
    summary(m)

Note that I changed the glm() formula argument to Priorconv*Crime, which is a more compact way of writing Priorconv + Crime + Priorconv:Crime.
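The two formula spellings fit the identical model, which can be checked directly. A sketch, rebuilding the question's data:

```r
# Rebuild the data from the question.
table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No  = c(1, 9, 3, 2, 6, 34, 6, 3))

# Priorconv*Crime expands to main effects plus the interaction,
# so both fits produce the same coefficients.
m_long  <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
               data = table1, family = binomial)
m_short <- glm(cbind(Yes, No) ~ Priorconv * Crime,
               data = table1, family = binomial)

all.equal(coef(m_long), coef(m_short))
## [1] TRUE
```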


Source: https://habr.com/ru/post/987024/

