Normality tests don't do what most people think they do. Shapiro's test, Anderson-Darling, and others are null hypothesis tests AGAINST the assumption of normality. They should not be used to decide whether to use normal theory statistical procedures. In fact, they are of virtually no value to the data analyst. Under what conditions are we interested in rejecting the null hypothesis that the data are normally distributed? I have never come across a situation where a normality test is the right thing to do. When the sample size is small, even big departures from normality are not detected, and when the sample size is large, even the smallest deviation from normality will lead to a rejected null.
For example:
> set.seed(100)
> x <- rbinom(15,5,.6)
> shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.8816, p-value = 0.0502

> x <- rlnorm(20,0,.4)
> shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.9405, p-value = 0.2453
So in both of these cases (binomial and lognormal variates) the p-value is > 0.05, which leads to a failure to reject the null (that the data are normal). Does this mean we should conclude that the data are normal? (Hint: the answer is no.) Failure to reject is not the same as accepting. This is hypothesis testing 101.
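(An illustrative aside of mine, not part of the original answer: the non-rejection above is just a matter of power. Rerunning the same generators with a larger, arbitrarily chosen sample size makes the tests reject decisively.)

> # Same non-normal generators, larger samples (shapiro.test accepts at most 5000 observations)
> set.seed(100)
> shapiro.test(rbinom(5000, 5, .6))   # discrete binomial: expect a very small p-value
> shapiro.test(rlnorm(5000, 0, .4))   # skewed lognormal: expect a very small p-value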
But what about large sample sizes? Let's take a case where the distribution is very nearly normal.
> library(nortest)
> x <- rt(500000,200)
> ad.test(x)

        Anderson-Darling normality test

data:  x
A = 1.1003, p-value = 0.006975

> qqnorm(x)
[Normal Q-Q plot of x]
Here we are using a t-distribution with 200 degrees of freedom. The Q-Q plot shows that the distribution is closer to normal than any distribution you are likely to see in the real world, but the test rejects normality with a very high degree of confidence.
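(A small sketch I'm adding to quantify this, not part of the original answer: comparing t(200) quantiles and tail probabilities with those of the standard normal shows how minor the deviation being flagged actually is. The probability grid is an arbitrary choice.)

> # How different is t(200) from the standard normal?
> p <- seq(0.001, 0.999, by = 0.001)
> max(abs(qt(p, df = 200) - qnorm(p)))   # largest quantile difference on this grid
> max(abs(pt(qnorm(p), df = 200) - p))   # largest CDF difference at these points

Both numbers should come out very small, which is the point: the deviation the Anderson-Darling test detects here has essentially no practical consequence for normal theory procedures.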
Does the significant test against normality mean that we should not use normal theory statistics in this case? (Another hint: the answer is no. :))
Ian Fellows