Test whether data is normally distributed in R

Can someone please help me populate the following function in R:

    # data is a single vector of decimal values
    normally.distributed <- function(data) {
      if (data is normal)   # pseudocode condition to be filled in
        return(TRUE)
      else
        return(FALSE)
    }
+48
Tags: r, normal-distribution
7 answers

Normality tests do not do what most people think they do. Shapiro's test, Anderson-Darling, and others are null hypothesis tests AGAINST the assumption of normality. They should not be used to decide whether to use normal-theory statistical procedures; in fact they are of virtually no value to the data analyst. Under what conditions are we interested in rejecting the null hypothesis that the data are normally distributed? I have never come across a situation where a normality test is the right thing to do. When the sample size is small, even big departures from normality go undetected, and when the sample size is large, even the smallest deviation from normality will lead to a rejected null.

For example:

    > set.seed(100)
    > x <- rbinom(15, 5, .6)
    > shapiro.test(x)

            Shapiro-Wilk normality test

    data:  x
    W = 0.8816, p-value = 0.0502

    > x <- rlnorm(20, 0, .4)
    > shapiro.test(x)

            Shapiro-Wilk normality test

    data:  x
    W = 0.9405, p-value = 0.2453

So in both of these cases (binomial and lognormal variates) the p-value is > 0.05, which causes a failure to reject the null (that the data are normal). Does this mean we get to conclude that the data are normal? (Hint: the answer is no.) Failure to reject is not the same as accepting. This is hypothesis testing 101.

But what about large sample sizes? Take a case where the distribution is very nearly normal.

    > library(nortest)
    > x <- rt(500000, 200)
    > ad.test(x)

            Anderson-Darling normality test

    data:  x
    A = 1.1003, p-value = 0.006975

    > qqnorm(x)

[Figure: normal Q-Q plot of x]

Here we used a t-distribution with 200 degrees of freedom. The Q-Q plot shows that the distribution is closer to normal than any distribution you are likely to see in the real world, but the test rejects normality with a very high degree of confidence.

Does the significant test against normality mean that we should not use normal-theory statistics in this case? (Another hint: the answer is no. :))

+161

I would also highly recommend the SnowsPenultimateNormalityTest in the TeachingDemos package. The function's documentation is far more useful to you than the test itself. Read it thoroughly before using the test.
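A minimal sketch of calling it (assuming the TeachingDemos package is installed; the point of this function is the discussion in its help page rather than its output):

```r
# Requires the TeachingDemos package: install.packages("TeachingDemos")
library(TeachingDemos)

set.seed(1)
x <- rnorm(50)

# Read ?SnowsPenultimateNormalityTest first: the help page explains why
# "are these data exactly normal?" is usually the wrong question to ask.
SnowsPenultimateNormalityTest(x)
```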

+21

SnowsPenultimateNormalityTest certainly has its merits, but you may also want to look at qqnorm().

    X <- rlnorm(100)
    qqnorm(X)
    qqnorm(rnorm(100))
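A small addition to this approach, using base R only: qqline() draws a reference line through the quartiles of the sample, which makes departures from normality much easier to judge by eye.

```r
set.seed(1)

x <- rlnorm(100)        # clearly non-normal data
qqnorm(x, main = "Lognormal sample")
qqline(x, col = "red")  # points bow away from the line in the tail

y <- rnorm(100)         # normal data for comparison
qqnorm(y, main = "Normal sample")
qqline(y, col = "red")  # points stay close to the line
```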
+12

Consider using the shapiro.test function, which performs the Shapiro-Wilk test for normality. I have been happy with it.
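If you do still want a function shaped like the one in the question, a minimal sketch built on shapiro.test might look like the following. The default alpha of 0.05 is an arbitrary convention, and, as the accepted answer stresses, failing to reject does not prove normality:

```r
# Sketch only: TRUE means shapiro.test failed to reject normality at alpha.
# Absence of evidence against normality is NOT evidence of normality.
# Note: shapiro.test requires 3 <= length(data) <= 5000.
normally.distributed <- function(data, alpha = 0.05) {
  shapiro.test(data)$p.value > alpha
}

set.seed(123)
normally.distributed(rnorm(100))  # usually TRUE for normal data
normally.distributed(rexp(100))   # usually FALSE for strongly skewed data
```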

+4

    library(DNE)
    x <- rnorm(1000, 0, 1)
    is.norm(x, 10, 0.05)

+2
Nov 16 '14 at 3:23

The Anderson-Darling test is also useful:

    library(nortest)
    ad.test(data)
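For a self-contained run (assumes the nortest package is installed; the vector here is simulated stand-in data, so substitute your own):

```r
library(nortest)  # install.packages("nortest") if needed

set.seed(42)
data <- rnorm(500)  # stand-in for your own numeric vector

result <- ad.test(data)
result$statistic    # the A statistic
result$p.value      # a small p-value is evidence against normality
```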
+1
Oct. 16 '11 at 19:07

In addition to the answers above: whenever you run a hypothesis test, you always have some chance of rejecting the null hypothesis when it is in fact true.

See the following R code:

    p <- function(n) {
      x <- rnorm(n, 0, 1)
      s <- shapiro.test(x)
      s$p.value
    }
    rep1 <- replicate(1000, p(5))
    rep2 <- replicate(1000, p(100))
    plot(density(rep1))
    lines(density(rep2), col = "blue")
    abline(v = 0.05, lty = 3)

The graph shows that whether your sample size is small or large, in about 5% of cases you reject the null hypothesis even though it is true (a type I error).
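To put a number on that 5%, you can compute the empirical rejection rate directly from the simulated p-values (a self-contained rerun of the simulation above):

```r
set.seed(1)
p <- function(n) shapiro.test(rnorm(n, 0, 1))$p.value

rep1 <- replicate(1000, p(5))    # small samples
rep2 <- replicate(1000, p(100))  # large samples

# Under the null, p-values are roughly uniform, so about 5% fall below 0.05
# regardless of sample size: that is the type I error rate.
mean(rep1 < 0.05)
mean(rep2 < 0.05)
```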

0
Oct 11 '16 at 3:32


