Fine-tune stat_ellipse() in ggplot2

Question


I want to create a scatter plot of a two-dimensional normal distribution with a 95% "exact" confidence ellipse.

    library(mvtnorm)
    library(ggplot2)
    set.seed(1)
    n <- 1e3
    c95 <- qchisq(.95, df=2)
    rho <- 0.8  # correlation
    Sigma <- matrix(c(1, rho, rho, 1), 2, 2)  # covariance matrix

I generated 1000 observations from a bivariate normal distribution with mean zero and covariance matrix Sigma:

    x <- rmvnorm(n, mean=c(0, 0), Sigma)
    z <- p95 <- rep(NA, n)
    for(i in 1:n){
      z[i] <- x[i, ] %*% solve(Sigma, x[i, ])  # squared Mahalanobis distance
      p95[i] <- (z[i] < c95)                   # TRUE if inside the exact 95% ellipse
    }
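The same quantities can be computed without an explicit loop. This is a minimal sketch, assuming base R's mahalanobis() (which returns squared distances) is an acceptable replacement for the loop. The data are simulated with a Cholesky factor rather than rmvnorm() only to keep the snippet free of extra packages, so the numbers will differ from those in the question.

```r
set.seed(1)
n <- 1e3
rho <- 0.8
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
c95 <- qchisq(.95, df = 2)

# Stand-in for rmvnorm(): with U = chol(Sigma), t(U) %*% U equals Sigma,
# so the rows of x have covariance Sigma
x <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)

# Squared Mahalanobis distance of every row from the true mean (0, 0)
z <- mahalanobis(x, center = c(0, 0), cov = Sigma)
p95 <- z < c95

mean(p95)  # share of points inside the exact ellipse, expected near 0.95
```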

We can easily overlay a 95% confidence ellipse on the scatter plot of the generated data using stat_ellipse. The result looks satisfactory until you notice that a few red dots lie inside the confidence ellipse. I think this discrepancy comes from the estimation of some parameters and disappears as the sample size increases.
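The guess that the discrepancy shrinks with sample size can be checked by looking at how far the estimated covariance matrix is from Sigma for growing n. A rough sketch, where cov_error is a hypothetical helper and the data are again simulated with a Cholesky factor instead of rmvnorm():

```r
set.seed(1)
rho <- 0.8
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)

# Hypothetical helper: largest absolute entry-wise error of the
# sample covariance for one simulated sample of size n
cov_error <- function(n) {
  x <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)
  max(abs(cov.wt(x)$cov - Sigma))
}

errors <- sapply(c(1e2, 1e3, 1e4, 1e5), cov_error)
errors  # tends to shrink roughly like 1 / sqrt(n)
```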

    data <- data.frame(x, z, p95)
    p <- ggplot(data, aes(X1, X2)) + geom_point(aes(colour = p95))
    p + stat_ellipse(type = "norm")

Fig. 1

Is there a way to fine-tune stat_ellipse() so that it displays an "exact" confidence ellipse, as shown in the figure below, which was created using the hand-made ellips function?

    ellips <- function(center = c(0,0), c=c95, rho=-0.8, npoints = 100){
      t <- seq(0, 2*pi, len=npoints)
      Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
      a <- sqrt(c*eigen(Sigma)$values[2])  # semi-axis from the smaller eigenvalue
      b <- sqrt(c*eigen(Sigma)$values[1])  # semi-axis from the larger eigenvalue
      x <- center[1] + a*cos(t)
      y <- center[2] + b*sin(t)
      X <- cbind(x, y)
      R <- eigen(Sigma)$vectors
      data.frame(X%*%R)  # rotate onto the eigenvector axes; columns become X1, X2
    }
    dat <- ellips(center=c(0, 0), c=c95, rho, npoints=100)
    p + geom_path(data=dat, aes(x=X1, y=X2), colour='blue')
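For a covariance matrix that does not have equal variances, one possible variant builds the ellipse from the Cholesky factor of Sigma instead of its eigen-decomposition. This is only a sketch: exact_ellipse and its arguments are hypothetical names, not part of ggplot2.

```r
# Hypothetical helper: boundary of the region
# (p - center)' Sigma^{-1} (p - center) <= qchisq(level, 2)
exact_ellipse <- function(center = c(0, 0), Sigma, level = 0.95, npoints = 100) {
  t <- seq(0, 2 * pi, length.out = npoints)
  circle <- cbind(cos(t), sin(t))          # unit circle, one point per row
  radius <- sqrt(qchisq(level, df = 2))
  pts <- radius * circle %*% chol(Sigma)   # map the circle onto the ellipse
  data.frame(X1 = center[1] + pts[, 1], X2 = center[2] + pts[, 2])
}

Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
dat <- exact_ellipse(Sigma = Sigma)
# With the plot object p from the question:
# p + geom_path(data = dat, aes(x = X1, y = X2), colour = "blue")
```

Every generated point has squared Mahalanobis distance qchisq(level, 2) from the center, which is the same cutoff used to define p95.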


r ggplot2

Khashaa Dec 9 '14 at 15:07


1 answer

bdemarest · Accepted Answer · 2014-12-10T01:10:55+0000

This is not a complete answer, but it may help.

Having studied stat_ellipse with the following commands,

    stat_ellipse
    ls(ggplot2:::StatEllipse)
    ggplot2:::StatEllipse$calculate
    ggplot2:::calculate_ellipse
    ?cov.wt

it seems that cov.wt estimates the covariance matrix from the simulated data:

    cov.wt(data[, c(1, 2)])$cov
    #           X1        X2
    # X1 1.1120267 0.8593946
    # X2 0.8593946 1.0372208

    # True covariance matrix:
    Sigma
    #      [,1] [,2]
    # [1,]  1.0  0.8
    # [2,]  0.8  1.0

You could calculate your p95 values using the covariance matrix estimated from the data, so that the point colours agree with the ellipse drawn by stat_ellipse. Or just stick with your own well-designed ellipse-drawing code.
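A sketch of the first option follows. It assumes, from reading ggplot2:::calculate_ellipse, that type = "norm" uses the cov.wt center and covariance together with a squared radius of 2 * qf(level, 2, n - 1) rather than qchisq(level, 2). This may differ between ggplot2 versions, so it is worth checking against the installed source. The names est, z_hat, cut_hat and p95_hat are illustrative, and the data are simulated with a Cholesky factor instead of rmvnorm().

```r
set.seed(1)
n <- 1e3
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
x <- matrix(rnorm(2 * n), n, 2) %*% chol(Sigma)  # stand-in for rmvnorm()

est <- cov.wt(x)  # sample mean ($center) and sample covariance ($cov)

# Squared distances measured with the estimated parameters
z_hat <- mahalanobis(x, center = est$center, cov = est$cov)

# Assumed cutoff, as read from ggplot2:::calculate_ellipse for type = "norm"
cut_hat <- 2 * qf(0.95, 2, n - 1)
p95_hat <- z_hat < cut_hat

data <- data.frame(x, p95_hat)  # columns X1, X2, p95_hat
# ggplot(data, aes(X1, X2)) + geom_point(aes(colour = p95_hat)) +
#   stat_ellipse(type = "norm")
```

With this colouring the points are classified relative to the estimated ellipse, not the exact one, so the plot is self-consistent but no longer shows the true 95% region.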
