Why does logistic regression still work when # failures are negative?

I run a binomial glm in R and have some cases where the number of failures is negative (due to a measurement error in the data). I expected glm to fail in these cases, since log(# successes / # failures) is undefined. To my surprise, glm runs and returns estimates of the regression coefficients. I do not understand why it works, or how to interpret the results.

For instance:

succ=c(3,0,1,4,2,4,4,7,15,4);
fail=c(1016,1506,1285,1152,868,610,432,211,129,-4);
x_age=c(42.5,47.5,52.5,57.5,62.5,67.5,72.5,77.5,82.5,87.5);

glm(cbind(succ,fail) ~ x_age, family=binomial);

Call:  glm(formula = cbind(succ, fail) ~ x_age, family = binomial)

Coefficients:
(Intercept)        x_age  
     -14.15         0.14  

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:      105 
Residual Deviance: 17.7         AIC: 47.3
1 answer

Basically, I don’t think that the authors of the package expected a negative number of failures or successes as input. That doesn't make sense, and you shouldn't do that.

Here is what happens inside R. With a two-column response `cbind(succ, fail)`, the first column is taken as successes and the second as failures, and `binomial()$initialize` first computes the row totals:

n <- y[, 1] + y[, 2]

and then replaces `y` by the observed proportion of successes:

y <- ifelse(n == 0, 0, y[, 1]/n)

Write $s_i$ for the number of successes and $f_i$ for the number of failures in row $i$. Then:

if $s_i + f_i \neq 0$, then $y_i = \frac{s_i}{s_i + f_i}$;

if $s_i + f_i = 0$, then $y_i = 0$.

And look: your bad row has 4 successes and -4 failures, so $s_i + f_i = 0$ and it is silently mapped to $y_i = 0$!
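You can reproduce this first step outside of glm (a sketch of the same arithmetic, not a call into R's internals):

```r
succ <- c(3, 0, 1, 4, 2, 4, 4, 7, 15, 4)
fail <- c(1016, 1506, 1285, 1152, 868, 610, 432, 211, 129, -4)

# Same arithmetic as binomial()$initialize:
n <- succ + fail                  # row totals
y <- ifelse(n == 0, 0, succ / n)  # observed proportions

n[10]  # 0 -- the bad row sums to zero
y[10]  # 0 -- and is silently mapped to proportion 0
```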

The starting values `mustart` for the fitted probabilities are then computed as:

mustart <- (n * y + 0.5)/(n + 1)

With $s_i$ and $f_i$ as before, this means the following. If $s_i + f_i \neq 0$:

$$\mu^{start}_i = \frac{s_i + 0.5}{s_i + f_i + 1}$$

and if $s_i + f_i = 0$:

$$\mu^{start}_i = \frac{1}{2}$$
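Again tracing the computation by hand, with the same `succ` and `fail` vectors from the question (a sketch, not R's internal code path):

```r
succ <- c(3, 0, 1, 4, 2, 4, 4, 7, 15, 4)
fail <- c(1016, 1506, 1285, 1152, 868, 610, 432, 211, 129, -4)
n <- succ + fail
y <- ifelse(n == 0, 0, succ / n)

# The same formula binomial()$initialize uses for starting values:
mustart <- (n * y + 0.5) / (n + 1)

mustart[10]     # 0.5 -- a perfectly legal starting value
range(mustart)  # every value stays strictly inside (0, 1)
```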

With the logit link, the only requirement is that `mustart` lies in the open interval (0, 1), and every row of your data satisfies it. You would have gotten an error with, say, 4 successes and -2 failures, where $s_i + f_i \neq 0$ and $\mu^{start}_i = 4.5/3 = 1.5$ falls outside (0, 1), tripping this check in R's C code:

if (x < 0 || x > 1)
    error(_("Value %g out of range (0, 1)"), x);

So glm runs here purely by accident: your bad row happens to sum to zero, R treats it as an observation with zero binomial weight, and nothing trips the internal checks. The estimates should not be trusted; fix the measurement error in the data rather than relying on R to catch it.
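If you want such data rejected up front instead of silently fitted, a minimal guard before calling glm might look like this (a sketch; `check_counts` is a hypothetical helper, not part of R):

```r
# Reject negative success/failure counts before they reach glm()
check_counts <- function(succ, fail) {
  bad <- which(succ < 0 | fail < 0)
  if (length(bad))
    stop("negative counts in rows: ", paste(bad, collapse = ", "))
  invisible(TRUE)
}

# With the question's data this stops with "negative counts in rows: 10":
# check_counts(succ, fail)
```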


Source: https://habr.com/ru/post/1659781/

