Using modelr::add_predictions for glm

I am trying to compute logistic regression predictions for a dataset using the tidyverse and modelr packages. Clearly I am doing something wrong in add_predictions, because I do not get the "response" of the logistic function, as I would if I used the predict function from stats. This should be simple, but I cannot figure it out, and multiple searches have yielded little.

    library(tidyverse)
    library(modelr)
    options(na.action = na.warn)
    library(ISLR)

    d <- as_tibble(ISLR::Default)
    model <- glm(default ~ balance, data = d, family = binomial)

    grid <- d %>%
      data_grid(balance) %>%
      add_predictions(model)

    ggplot(d, aes(x = balance)) +
      geom_point(aes(y = default)) +
      geom_line(data = grid, aes(y = pred))
1 answer

The type parameter of predict.glm defaults to "link", which add_predictions does not change, nor does it give you any way to switch to the almost certainly desired "response". (There is a GitHub issue about this; add your reproducible example to it if you like.) That said, it is not hard to use predict directly within the tidyverse via dplyr::mutate.
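To make the "link" versus "response" distinction concrete, here is a minimal sketch in base R. It uses the built-in mtcars data rather than the Default data from the question (an assumption made only so the example is self-contained), and the names fit, new_data, link_pred and resp_pred are illustrative.

```r
# Minimal sketch on the built-in mtcars data (an assumption for illustration;
# this is not the Default data from the question).
# Logistic regression of transmission type (am, coded 0/1) on weight (wt).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

new_data <- data.frame(wt = c(2, 3, 4))

# Default behaviour: predictions on the link (log-odds) scale,
# identical to passing type = "link".
link_pred <- predict(fit, newdata = new_data)

# Predictions on the response scale, i.e. probabilities.
resp_pred <- predict(fit, newdata = new_data, type = "response")

# For the logit link, the inverse link is the logistic function plogis(),
# so transforming the link-scale values should reproduce the probabilities.
all.equal(plogis(link_pred), resp_pred)
```

The link-scale values are unbounded log-odds, which is why a line drawn from them does not sit between 0 and 1. Later modelr releases appear to document a type argument on add_predictions; if your installed version has it, add_predictions(model, type = "response") should be equivalent, but check the documentation for your version first.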

Also note that ggplot coerces default (a factor) to numeric in order to plot the line, which is fine, except that "No" and "Yes" are replaced by 1 and 2, while the probabilities returned by predict lie between 0 and 1. Explicitly coercing to numeric and subtracting one fixes the plot, although an additional call to scale_y_continuous is needed to fix the labels.

    library(tidyverse)
    library(modelr)

    d <- as_tibble(ISLR::Default)
    model <- glm(default ~ balance, data = d, family = binomial)

    grid <- d %>%
      data_grid(balance) %>%
      mutate(pred = predict(model, newdata = ., type = 'response'))

    ggplot(d, aes(x = balance)) +
      geom_point(aes(y = as.numeric(default) - 1)) +
      geom_line(data = grid, aes(y = pred)) +
      scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))

Also note that if all you need is a plot, geom_smooth can compute the predictions for you directly:

    ggplot(d, aes(balance, as.numeric(default) - 1)) +
      geom_point() +
      geom_smooth(method = 'glm', method.args = list(family = 'binomial')) +
      scale_y_continuous('default', breaks = 0:1, labels = levels(d$default))


Source: https://habr.com/ru/post/1014950/

