qqnorm and qqline in ggplot2

Question

Say I have a linear model LM whose residuals I want to show in a QQ plot. Normally I would use base R graphics:

    qqnorm(residuals(LM), ylab="Residuals")
    qqline(residuals(LM))

I can figure out how to get the qqnorm part of the plot, but I can't seem to manage the qqline:

    ggplot(LM, aes(sample=.resid)) + stat_qq()

I suspect I'm missing something pretty basic, but it seems like there ought to be an easy way of doing this.

EDIT: Many thanks for the solution below. I've modified the code (very slightly) to extract the information from the linear model, so that the plot works like the convenience plot in the base R graphics package.

    ggQQ <- function(LM) # argument: a linear model
    {
        y <- quantile(LM$resid[!is.na(LM$resid)], c(0.25, 0.75))
        x <- qnorm(c(0.25, 0.75))
        slope <- diff(y)/diff(x)
        int <- y[1L] - slope * x[1L]
        p <- ggplot(LM, aes(sample=.resid)) +
            stat_qq(alpha = 0.5) +
            geom_abline(slope = slope, intercept = int, color="blue")

        return(p)
    }

+46

r ggplot2

Peter Dec 05 2018-10-12T00:

5 answers

You can also add confidence intervals / confidence bands with this function (parts of the code are copied from car:::qqPlot):

    gg_qq <- function(x, distribution = "norm", ..., line.estimate = NULL,
                      conf = 0.95, labels = names(x)){
        # look up the quantile and density functions, e.g. qnorm and dnorm
        q.function <- eval(parse(text = paste0("q", distribution)))
        d.function <- eval(parse(text = paste0("d", distribution)))
        x <- na.omit(x)
        ord <- order(x)
        n <- length(x)
        P <- ppoints(length(x))
        df <- data.frame(ord.x = x[ord], z = q.function(P, ...))

        if(is.null(line.estimate)){
            # default: line through the first and third quartiles
            Qx <- quantile(df$ord.x, c(0.25, 0.75))
            Qz <- q.function(c(0.25, 0.75), ...)
            b <- diff(Qx)/diff(Qz)
            coef <- c(Qx[1] - b * Qz[1], b)
        } else {
            # ord.x and z live in df, so the fitting function needs data = df
            coef <- coef(line.estimate(ord.x ~ z, data = df))
        }

        # pointwise confidence band around the reference line
        zz <- qnorm(1 - (1 - conf)/2)
        SE <- (coef[2]/d.function(df$z, ...)) * sqrt(P * (1 - P)/n)
        fit.value <- coef[1] + coef[2] * df$z
        df$upper <- fit.value + zz * SE
        df$lower <- fit.value - zz * SE

        if(!is.null(labels)){
            # label only the points that fall outside the band
            df$label <- ifelse(df$ord.x > df$upper | df$ord.x < df$lower, labels[ord], "")
        }

        p <- ggplot(df, aes(x=z, y=ord.x)) +
            geom_point() +
            geom_abline(intercept = coef[1], slope = coef[2]) +
            geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.2)
        if(!is.null(labels)) p <- p + geom_text(aes(label = label))
        print(p)
        coef
    }

Example:

    # data() loads the data set by side effect; its return value is only the name
    data(Animals2, package = "robustbase")
    mod.lm <- lm(log(Animals2$brain) ~ log(Animals2$body))
    x <- rstudent(mod.lm)
    gg_qq(x)
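For anyone wondering where the band in gg_qq comes from, here is a minimal base-R sketch of the same calculation for the normal case only. The sample, the seed and the names `band` and `outside` are my own assumptions for illustration; only the formulas follow the function above.

```r
# Base-R sketch of the pointwise band computed inside gg_qq() (normal case only).
set.seed(1)                        # arbitrary seed, for reproducibility
x <- sort(rnorm(50))               # hypothetical sample, already ordered
n <- length(x)
P <- ppoints(n)                    # plotting positions
z <- qnorm(P)                      # theoretical quantiles

# reference line through the first and third quartiles
b <- unname(diff(quantile(x, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75))))
a <- unname(quantile(x, 0.25) - b * qnorm(0.25))

# approximate standard error of each order statistic around the line
SE <- (b / dnorm(z)) * sqrt(P * (1 - P) / n)
zz <- qnorm(1 - (1 - 0.95) / 2)    # normal critical value for a 95% band

band <- data.frame(z = z,
                   lower = a + b * z - zz * SE,
                   upper = a + b * z + zz * SE)

# points outside the band are the ones gg_qq() would label
outside <- x < band$lower | x > band$upper
```

The band is widest in the tails because the density in the denominator gets small there, which is why a few stray points at the extremes often still fall inside it.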

+19

Rentrop Nov 28 '14 at

The standard QQ diagnostic for linear models plots quantiles of the standardized residuals against the theoretical quantiles of N(0,1). @Peter's ggQQ function plots the raw residuals. The snippet below amends that and adds a few cosmetic changes to make the plot look more like what you get from plot(lm(...)).

    ggQQ = function(lm) {
        # extract standardized residuals from the fit
        d <- data.frame(std.resid = rstandard(lm))
        # calculate the line through the first and third quartiles
        y <- quantile(d$std.resid[!is.na(d$std.resid)], c(0.25, 0.75))
        x <- qnorm(c(0.25, 0.75))
        slope <- diff(y)/diff(x)
        int <- y[1L] - slope * x[1L]

        p <- ggplot(data=d, aes(sample=std.resid)) +
            stat_qq(shape=1, size=3) +           # open circles
            labs(title="Normal QQ",              # plot title
                 x="Theoretical Quantiles",      # x-axis label
                 y="Standardized Residuals") +   # y-axis label
            geom_abline(slope = slope, intercept = int, linetype="dashed")  # dashed reference line
        return(p)
    }

Usage example:

    # sample data (y = x + N(0,1), x in [1,100])
    df <- data.frame(cbind(x=c(1:100),y=c(1:100+rnorm(100))))
    ggQQ(lm(y~x,data=df))
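To make the raw vs. standardized distinction concrete, here is a small base-R sketch. The data, seed and error scale are assumptions chosen for illustration; `rstandard()`, `hatvalues()` and `sigma()` are the usual stats functions.

```r
set.seed(42)                              # arbitrary seed
df <- data.frame(x = 1:100)
df$y <- 3 * df$x + rnorm(100, sd = 5)     # hypothetical data with error sd = 5
fit <- lm(y ~ x, data = df)

raw <- residuals(fit)
std <- rstandard(fit)

# rstandard() divides each residual by its estimated standard deviation,
# which depends on the residual scale and on the leverage of the point
manual <- raw / (sigma(fit) * sqrt(1 - hatvalues(fit)))
all.equal(unname(std), unname(manual))

sd(raw)   # on the scale of the response
sd(std)   # close to 1, so comparable with N(0,1) quantiles
```

With raw residuals the points still fall near a straight line if the errors are normal, but the slope reflects the error scale, so the plot is harder to compare against the N(0,1) reference.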

+10

jlhoward Nov 14 '13 at 22:51

Since version 2.0, ggplot2 has a well-documented interface for extension, so we can now easily write a new stat for the qqline by itself (which I have done for the first time, so improvements are welcome):

    qq.line <- function(data, qf, na.rm) {
        # from stackoverflow.com/a/4357932/1346276
        q.sample <- quantile(data, c(0.25, 0.75), na.rm = na.rm)
        q.theory <- qf(c(0.25, 0.75))
        slope <- diff(q.sample) / diff(q.theory)
        intercept <- q.sample[1] - slope * q.theory[1]

        list(slope = slope, intercept = intercept)
    }

    StatQQLine <- ggproto("StatQQLine", Stat,
        # http://docs.ggplot2.org/current/vignettes/extending-ggplot2.html
        # https://github.com/hadley/ggplot2/blob/master/R/stat-qq.r

        required_aes = c('sample'),

        compute_group = function(data, scales,
                                 distribution = stats::qnorm,
                                 dparams = list(),
                                 na.rm = FALSE) {
            qf <- function(p) do.call(distribution, c(list(p = p), dparams))

            n <- length(data$sample)
            theoretical <- qf(stats::ppoints(n))
            qq <- qq.line(data$sample, qf = qf, na.rm = na.rm)
            line <- qq$intercept + theoretical * qq$slope

            data.frame(x = theoretical, y = line)
        }
    )

    stat_qqline <- function(mapping = NULL, data = NULL, geom = "line",
                            position = "identity", ...,
                            distribution = stats::qnorm,
                            dparams = list(),
                            na.rm = FALSE,
                            show.legend = NA,
                            inherit.aes = TRUE) {
        layer(stat = StatQQLine, data = data, mapping = mapping, geom = geom,
              position = position, show.legend = show.legend, inherit.aes = inherit.aes,
              params = list(distribution = distribution,
                            dparams = dparams,
                            na.rm = na.rm, ...))
    }

This also generalizes over the distribution (exactly as stat_qq does) and can be used as follows:

    > test.data <- data.frame(sample=rnorm(100, 10, 2)) # normal distribution
    > test.data.2 <- data.frame(sample=rt(100, df=2))   # t distribution
    > ggplot(test.data, aes(sample=sample)) + stat_qq() + stat_qqline()
    > ggplot(test.data.2, aes(sample=sample)) + stat_qq(distribution=qt, dparams=list(df=2)) +
    +   stat_qqline(distribution=qt, dparams=list(df=2))

(Unfortunately, since the qqline is on a separate layer, I couldn't find a way to "reuse" the distribution parameters, but that should only be a minor problem.)
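As far as I know, later ggplot2 releases (3.0.0 onward) ship a built-in stat_qq_line() that does the same job, so the custom stat may no longer be needed there. The sketch below assumes such a version is installed. The list `qq.args` is a hypothetical helper of mine, just one way to keep the distribution settings in a single place and hand them to both layers.

```r
library(ggplot2)  # assumes ggplot2 >= 3.0.0, where stat_qq_line() is available

set.seed(1)       # arbitrary seed
test.data.2 <- data.frame(sample = rt(100, df = 2))   # t-distributed sample

# hypothetical helper: distribution settings shared by both layers
qq.args <- list(distribution = qt, dparams = list(df = 2))

p <- ggplot(test.data.2, aes(sample = sample)) +
    do.call(stat_qq, qq.args) +        # points
    do.call(stat_qq_line, qq.args)     # reference line through the quartiles
```

Printing `p` draws the plot. The same `do.call()` trick should also work with the hand-written stat_qqline() above, since it takes the same `distribution` and `dparams` arguments.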

+10

phg Feb 23 '16 at

Why not the following?

Given some vector, say,

    myresiduals <- rnorm(100) ^ 2

    ggplot(data=as.data.frame(qqnorm( myresiduals , plot=F)), mapping=aes(x=x, y=y)) +
        geom_point() +
        geom_smooth(method="lm", se=FALSE)

But it seems strange that we have to use a traditional graphics function to prop up ggplot2.

Can't we get the same effect somehow by starting with the vector for which we want the quantile plot and then applying the appropriate "stat" and "geom" functions in ggplot2?

Does Hadley Wickham monitor these posts? Maybe he can show us a better way.
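One caveat worth checking with this approach: geom_smooth(method="lm") fits a least-squares line through all the QQ points, which is not the line that qqline() draws through the first and third quartiles. A base-R sketch of the difference follows; the seed and the names `ls.fit` and `q.line` are my own, chosen for illustration.

```r
set.seed(7)                                   # arbitrary seed
myresiduals <- rnorm(100)^2                   # skewed sample, as in the answer
qq <- qqnorm(myresiduals, plot.it = FALSE)    # list with x (theoretical) and y (sample)

# intercept and slope of the least-squares line, as geom_smooth(method = "lm") would fit
ls.fit <- coef(lm(y ~ x, data = as.data.frame(qq)))

# intercept and slope of the quartile line, as qqline() computes by default
q.y <- quantile(myresiduals, c(0.25, 0.75))
q.x <- qnorm(c(0.25, 0.75))
q.slope <- unname(diff(q.y) / diff(q.x))
q.line <- c(unname(q.y[1L]) - q.slope * q.x[1L], q.slope)

rbind(least.squares = unname(ls.fit), quartile.line = q.line)
```

For a skewed sample like this one the two lines differ noticeably, because the least-squares fit is pulled toward the long tail while the quartile line only looks at the middle half of the data.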

+9

Jacob Wegelin Feb 08 2018-11-28T00:

Aaron · Accepted Answer · 2010-12-05 08:08

The following code will give you the plot you want. The ggplot package doesn't seem to contain code for calculating the parameters of the qqline, so I don't know if it's possible to achieve such a plot in a (comprehensible) one-liner.

    qqplot.data <- function (vec) # argument: vector of numbers
    {
        # following four lines from base R qqline()
        y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
        x <- qnorm(c(0.25, 0.75))
        slope <- diff(y)/diff(x)
        int <- y[1L] - slope * x[1L]

        d <- data.frame(resids = vec)

        ggplot(d, aes(sample = resids)) + stat_qq() + geom_abline(slope = slope, intercept = int)
    }
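A quick base-R check of the four qqline lines may help show what they compute: the line is the one passing through the sample quartiles at the theoretical quartiles. The input vector below is a hypothetical example of mine, with one missing value to exercise the NA handling.

```r
set.seed(123)                                  # arbitrary seed
vec <- c(rnorm(30, mean = 10, sd = 2), NA)     # hypothetical data with a missing value

y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
x <- qnorm(c(0.25, 0.75))
slope <- unname(diff(y) / diff(x))
int <- unname(y[1L] - slope * x[1L])

# the line reproduces both sample quartiles at the theoretical quartiles
all.equal(int + slope * x, unname(y))

# for roughly normal data these are rough estimates of the mean and sd
c(intercept = int, slope = slope)
```

Because only the quartiles enter the calculation, a few extreme values in the tails do not move the line, which is the usual argument for preferring it to a least-squares fit on a QQ plot.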
