Manually build a logistic regression model for forecasting in R

I am trying to test a logistic regression model (for example, 3 coefficients for 3 predictor variables, X1, X2, X3) on a dataset. I know how to test a model after creating a model object with, for example,

mymodel <- glm(Outcome ~ X1 + X2 + X3, family = binomial, data = trainDat)

and then generate predictions on the test data:

 prob <- predict(mymodel, type = "response", newdata = test)

But now I want to create a logistic model from coefficients and an intercept that I already have, and then test this model on the data.

Basically, I don’t understand how to create “mymodel” without running glm.

Context for the question: I fitted a logistic regression using offsets, for example:

 mymodel <- glm(Outcome ~ offset(C1 * X1) + offset(C2 * X2) + X3, family = binomial, data = trainDat) 

As a result, the mymodel object contains only two fitted coefficients: the intercept (I) and C3 (for the variable X3).
Now I need to test the full model (i.e., I + C1 * X1 + C2 * X2 + C3 * X3) on the test dataset, but I don't know how to get the full model, since the output of mymodel has only the intercept and C3. So I think my more general question is: "How do you manually create a logistic regression model object?"
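(For reference, since the model is logistic, the full-model probability could in principle be computed by hand with plogis(). A minimal sketch, with made-up placeholder values for I, C1, C2, C3 and a toy test set:)

```r
# Sketch only: I, C1, C2, C3 below are made-up placeholders for the
# fitted/published coefficient values.
I  <- -0.5
C1 <-  1.0
C2 <-  0.3
C3 <- -0.15

# Toy test data standing in for the real test set
test <- data.frame(X1 = c(0.2, 0.8), X2 = c(1.0, 2.0), X3 = c(5, 3))

# Linear predictor, then inverse logit: plogis(x) = 1 / (1 + exp(-x))
eta  <- I + C1 * test$X1 + C2 * test$X2 + C3 * test$X3
prob <- plogis(eta)
prob
```

This works for quick checks, but it does not give a model object that plays nicely with predict(), which is what the answer below addresses.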

Thank you for your patience.

1 answer

I could not find a simple built-in function for this. The predict function contains code that depends on components of a fitted model (for example, the model's rank). However, we can write a function that builds a fake glm object you can use with predict. Here is my first attempt at such a function:

makeglm <- function(formula, family, data = NULL, ...) {
    dots <- list(...)
    out <- list()
    tt <- terms(formula, data = data)
    if (!is.null(data)) {
        mf <- model.frame(tt, data)
        vn <- sapply(attr(tt, "variables")[-1], deparse)
        if ((yvar <- attr(tt, "response")) > 0)
            vn <- vn[-yvar]
        xlvl <- lapply(data[vn], function(x)
            if (is.factor(x)) levels(x)
            else if (is.character(x)) levels(as.factor(x))
            else NULL)
        attr(out, "xlevels") <- xlvl[!vapply(xlvl, is.null, NA)]
        attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
    }
    out$terms <- tt
    coef <- numeric(0)
    stopifnot(length(dots) > 1 & !is.null(names(dots)))
    for (i in seq_along(dots)) {
        if ((n <- names(dots)[i]) != "") {
            v <- dots[[i]]
            if (!is.null(names(v))) {
                coef[paste0(n, names(v))] <- v
            } else {
                stopifnot(length(v) == 1)
                coef[n] <- v
            }
        } else {
            coef["(Intercept)"] <- dots[[i]]
        }
    }
    out$coefficients <- coef
    out$rank <- length(coef)
    out$qr <- list(pivot = seq_len(out$rank))
    out$family <-
        if (class(family) == "family") {
            family
        } else if (class(family) == "function") {
            family()
        } else {
            stop(paste("invalid family class:", class(family)))
        }
    out$deviance <- 1
    out$null.deviance <- 1
    out$aic <- 1
    class(out) <- c("glm", "lm")
    out
}

So this function creates an object carrying all the components that predict and print expect to find on a glm object. Now we can test it. First, some test data:

set.seed(15)
dd <- data.frame(
    X1 = runif(50),
    X2 = factor(sample(letters[1:4], 50, replace = TRUE)),
    X3 = rpois(50, 5),
    Outcome = sample(0:1, 50, replace = TRUE)
)

And we can fit a standard binomial model with

 mymodel <- glm(Outcome ~ X1 + X2 + X3, data = dd, family = binomial)

which gives

Call:  glm(formula = Outcome ~ X1 + X2 + X3, family = binomial, data = dd)

Coefficients:
(Intercept)           X1          X2b          X2c          X2d           X3
    -0.4411       0.8853       1.8384       0.9455       1.5059      -0.1818

Degrees of Freedom: 49 Total (i.e. Null);  44 Residual
Null Deviance:      68.03
Residual Deviance: 62.67        AIC: 74.67

Now let's say we wanted to try, on the same data, a model whose coefficients we read in a publication. Here is how we use the makeglm function:

newmodel <- makeglm(Outcome ~ X1 + X2 + X3, binomial, data = dd,
    -.5, X1 = 1, X2 = c(b = 1.5, c = 1, d = 1.5), X3 = -.15)

The first parameter is the model formula. It defines the response and covariates in the same way as with glm. Then you specify the family, as you would with glm(). You also need to pass a data frame so that R can infer the correct data type for each of the variables involved; this also identifies all factor variables and their levels. It can be new data coded the same way as the original data frame, or the original itself.
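For instance (my illustration, not part of the original answer), the coefficient names R builds for a factor are the factor name pasted onto each non-baseline level, which is why X2 above needs entries named b, c, and d. The column names of model.matrix() show exactly which names predict() will look up in the coefficient vector:

```r
# Sketch: inspect the coefficient names R generates for a factor.
# A small deterministic frame so all four levels a-d are present.
dd <- data.frame(
    X1 = runif(8),
    X2 = factor(rep(letters[1:4], 2)),
    X3 = rpois(8, 5)
)

# Level "a" is the baseline, absorbed into the intercept.
colnames(model.matrix(~ X1 + X2 + X3, data = dd))
# "(Intercept)" "X1" "X2b" "X2c" "X2d" "X3"
```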

Then we specify the coefficients to use in our model. Coefficients are matched by parameter name. An unnamed parameter is used as the intercept. If you have a factor, you need to give coefficients for all non-baseline levels via a named vector. Here I simply rounded the fitted estimates to "nice" numbers.

Now we can use our newmodel with predict.

predict(mymodel, type = "response")
#         1         2         3         4         5
# 0.4866398 0.3553439 0.6564668 0.7819917 0.3008108

predict(newmodel, newdata = dd, type = "response")
#         1         2         3         4         5
# 0.5503572 0.4121811 0.7143200 0.7942776 0.3245525

Here I call predict on the original fitted model and on the new model with my specified coefficients, using the same data. We can see that the probability estimates changed a bit.
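As a cross-check (my addition, under the assumption that the coefficient names line up with the design-matrix columns), response predictions from such a hand-built binomial model are just plogis() applied to the linear predictor, so they can be reproduced without makeglm at all:

```r
# Sketch: reproduce predict(newmodel, type = "response") directly from
# a named coefficient vector, using the same data-generating code as
# above (Outcome omitted since it plays no role in prediction).
set.seed(15)
dd <- data.frame(
    X1 = runif(50),
    X2 = factor(sample(letters[1:4], 50, replace = TRUE)),
    X3 = rpois(50, 5)
)

beta <- c("(Intercept)" = -0.5, X1 = 1,
          X2b = 1.5, X2c = 1, X2d = 1.5, X3 = -0.15)

X    <- model.matrix(~ X1 + X2 + X3, data = dd)   # columns named as in beta
prob <- plogis(drop(X %*% beta[colnames(X)]))
head(prob, 5)
```

Note that exact agreement with the numbers shown above also depends on the R version, since the default behavior of sample() changed in R 3.6.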

Note that I have not fully tested this function, so use it at your own risk. It does not do as much error checking as it probably should. Maybe someone else knows a better way.


Source: https://habr.com/ru/post/970551/

