Observation in a two-dimensional ellipse

Question

Observation in a two-dimensional ellipse

enter image description here Am I trying to find the probability that a point is inside an ellipse? For example, if I planned two-dimensional data (x, y) for 300 datasets in 95% of the ellipsoid region, how can I calculate how many times out of 300 my points fall inside the ellipse?

Here is the code I'm using

library(MASS) seed<-1234 x<-NULL k<-1 Sigma2 <- matrix(c(.72,.57,.57,.46),2,2) Sigma2 rho <- Sigma2[1,2]/sqrt(Sigma2[1,1]*Sigma2[2,2]) rho eta1<-replicate(300,mvrnorm(k, mu=c(-1.59,-2.44), Sigma2)) library(car) dataEllipse(eta1[1,],eta1[2,], levels=c(0.05, 0.95))

Thank you for your help.

+4

r

user1560215 Jul 12 '13 at 10:56

source share

1 answer

Hong ooi · Accepted Answer · 2013-07-13T09:10:12+0000

I do not understand why people jump on the OP. In the context, this is clearly a programming question: it is about obtaining the empirical frequency of data points within a given ellipse, not theoretical probability. The OP even posted code and a graph showing what they were trying to get.

Perhaps they do not fully understand statistical theory behind a 95% ellipse, but they did not ask about it. In addition, creating graphs and calculating frequencies like this is a great way to handle theory.

In any case, here is the code that answers the narrowly defined question of how to count the points inside the ellipse obtained through the normal distribution (which underlies dataEllipse ). The idea is to convert your data into a unit circle through the main components, and then get the points in a certain radius of the origin.

 within.ellipse <- function(x, y, plot.ellipse=TRUE) { if(missing(y) && is.matrix(x) && ncol(x) == 2) { y <- x[,2] x <- x[,1] } if(plot.ellipse) dataEllipse(x, y, levels=0.95) d <- scale(prcomp(cbind(x, y), scale.=TRUE)$x) rad <- sqrt(2 * qf(.95, 2, nrow(d) - 1)) mean(sqrt(d[,1]^2 + d[,2]^2) < rad) }

It was also noted that a data ellipse of 95% contains, by definition, 95% of the data. This, of course, is not so, at least for the ellipses of the normal theory. If your distribution is particularly poor, the coverage frequency may not even converge to the expected level as the sample size increases. Consider a generalized Pareto distribution, for example:

 library(evd) # for rgpd # generalised pareto has no variance for shape > 0.5 z <- sapply(1:1000, function(...) within.ellipse(rgpd(100, shape=5), rgpd(100, shape=5), FALSE)) mean(z) [[1] 0.97451 z <- sapply(1:1000, function(...) within.ellipse(rgpd(10000, shape=5), rgpd(10000, shape=5), FALSE)) mean(z) [1] 0.9995808

Observation in a two-dimensional ellipse

More articles: