Offset does not work in binomial GLM

I am trying to fit a logistic regression using glm(family = 'binomial').

Here is the model:

 model<-glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp, offset=(log(1/off)), data=mydata, family='binomial') 

mydata has 76820 observations. The response variable (f_ocur) is binary (0/1).
This data is a sample of a larger data set, so the idea of setting the offset is to take into account that the data used here are only a sample of the real data to be analyzed.

For some reason, the offset does not work. When I run this model I get a result, but when I run the same model without the offset, I get exactly the same result. I expected a different result, but there is no difference.
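One common reason for this, assuming off has the same value in every row (the question does not say): in a model with an intercept, a constant offset is simply absorbed by the intercept. Slopes, standard errors and fitted probabilities stay the same, and only the intercept moves by log(1/off). The sketch below uses simulated data; x, y and the constant off = 10 are hypothetical.

```r
# Simulated data; x, y and off are hypothetical stand-ins
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.5 * x))
off <- rep(10, n)   # assumption: the same value for every row

m_off   <- glm(y ~ x, offset = log(1/off), family = binomial)
m_plain <- glm(y ~ x, family = binomial)

# The slope for x is the same in both fits
coef(m_off)["x"]
coef(m_plain)["x"]

# The intercepts differ by the constant offset, log(1/10)
coef(m_plain)["(Intercept)"] - coef(m_off)["(Intercept)"]

# Fitted probabilities are identical
all.equal(fitted(m_off), fitted(m_plain))
```

If off varies from row to row, the offset is no longer absorbed by the intercept and the estimates do change.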

Am I doing something wrong? Should the offset be part of the linear predictor, i.e. inside the formula? e.g.:

 model <- glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp+offset(log(1/off)), data=mydata, family='binomial') 
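For fitting, the two forms are equivalent: passing offset = log(1/off) as an argument to glm() and writing + offset(log(1/off)) in the formula produce the same model. A minimal check on simulated data, where the data frame d and its columns are hypothetical stand-ins for mydata:

```r
set.seed(2)
n <- 500
# Hypothetical data with an off value that varies by row
d <- data.frame(x = rnorm(n), off = runif(n, 1, 5))
d$y <- rbinom(n, 1, plogis(0.3 * d$x + log(1 / d$off)))

# Offset given as an argument (looked up in `d`)
m_arg  <- glm(y ~ x, offset = log(1/off), data = d, family = binomial)
# Offset given as a term in the formula
m_form <- glm(y ~ x + offset(log(1/off)), data = d, family = binomial)

# Same coefficients either way
all.equal(coef(m_arg), coef(m_form))
```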

Once the model is ready, I would like to use it with new data. The new data will be used to validate this model and has the same columns. My idea is to use:

 validate <- predict(model, newdata=data2, type='response') 

So my question is: does predict() take into account the offset used to fit the model? If not, what should I do to get correct probabilities for the new data?
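When the offset is written as an offset() term in the formula, predict() evaluates that term on newdata, so data2 needs an off column. A sketch on hypothetical simulated data that checks predict() against the inverse logit computed by hand:

```r
set.seed(3)
d <- data.frame(x = rnorm(500), off = runif(500, 1, 5))
d$y <- rbinom(500, 1, plogis(0.3 * d$x + log(1 / d$off)))
m <- glm(y ~ x + offset(log(1/off)), data = d, family = binomial)

# Hypothetical validation rows; they must include `off`
new <- data.frame(x = c(-1, 0, 1), off = c(2, 2, 4))
p <- predict(m, newdata = new, type = "response")

# By hand: inverse logit of (intercept + slope * x + offset)
b <- coef(m)
p_manual <- plogis(drop(cbind(1, new$x) %*% b) + log(1 / new$off))

all.equal(unname(p), p_manual)
```

If the goal is probabilities without the sampling adjustment, one option is to set off to 1 in the new data, since log(1/1) = 0; whether that is the right target depends on what off represents.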

2 answers

I think a log offset is used with the Poisson family. With a binomial response you should not use a log offset this way. Check out this link: https://stats.stackexchange.com/questions/25415/using-offset-in-binomial-model-to-account-for-increased-numbers-of-patients
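For comparison, this is the situation a log offset is designed for: if a Poisson count has mean rate * exposure, then log(mean) = log(rate) + log(exposure), so log(exposure) enters the linear predictor with a fixed coefficient of 1 and exp(intercept) is the rate per unit exposure. A minimal sketch with a made-up rate of 0.5 and made-up exposures:

```r
set.seed(4)
n <- 2000
exposure <- runif(n, 1, 10)                   # hypothetical exposure per unit
counts <- rpois(n, lambda = 0.5 * exposure)   # hypothetical true rate 0.5

fit <- glm(counts ~ 1, offset = log(exposure), family = poisson)

# Estimated rate per unit exposure; for this intercept-only model it
# equals sum(counts) / sum(exposure)
exp(coef(fit))
```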

Reading your question, I assume your main question is why the offset makes no difference.

Borrowing a sentence from @Ben Bolker's RPubs post (https://rpubs.com/bbolker/logregexp): "A very common situation in ecology (and elsewhere) is a survival/binary outcome where individuals (each measured once) differ in their exposure. The classic approach to this problem is to use a complementary log-log link."

On this basis, I would suggest that the code you are looking for could be:

 model <- glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp, data=mydata, family = binomial(link = cloglog),offset=log(1/off)) 
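Why the complementary log-log link pairs with a log-exposure offset: if each individual has a constant hazard h over an exposure expo, then P(event) = 1 - exp(-h * expo), and cloglog(P) = log(-log(1 - P)) = log(h) + log(expo). The sketch below simulates that setting (h, expo and n are made-up assumptions) and recovers h from the intercept. It uses offset = log(expo), the log of the exposure itself; log(1/off) reverses the sign, so which form fits depends on what off measures.

```r
set.seed(5)
n <- 20000
h <- 0.3                                # hypothetical constant hazard
expo <- runif(n, 0.5, 3)                # hypothetical exposure per individual
y <- rbinom(n, 1, 1 - exp(-h * expo))   # 1 = event occurred during exposure

fit <- glm(y ~ 1, offset = log(expo), family = binomial(link = "cloglog"))

# exp(intercept) estimates h
exp(coef(fit))
```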

Below is a small example showing not only that the results with and without the offset differ, but also that under AICc model selection the better model is ranked higher, even though "time" is confounded with "site".

 library(AICcmodavg)
 set.seed(1)
 time <- c(rep(1,50),rep(2,50))
 site <- c(rep("site 1",50),rep("site 2",50))
 surv <- c(rbinom(50,1,prob=0.7),rbinom(50,1,prob=0.7^2))
 my.data <- data.frame(surv, site, time)
 # set up AICc model list
 Cand.models <- list( )
 Cand.models[[1]] <- glm(surv ~ 1, data=my.data, family = binomial(link = cloglog), offset=log(1/time))
 Cand.models[[2]] <- glm(surv ~ 1, data=my.data, family = binomial(link = cloglog))
 Cand.models[[3]] <- glm(surv ~ site , data=my.data, family = binomial(link = cloglog), offset=log(1/time))
 # create a vector of names to trace back models in set
 Modnames <- paste("mod", 1:length(Cand.models), sep = " ")
 # generate AICc table
 aictab(cand.set = Cand.models, modnames = Modnames, sort = TRUE)

Source: https://habr.com/ru/post/1444171/