68% confidence interval for one draw of the normal distribution with mean mu and std sigma
stats.norm.interval(0.68, loc=mu, scale=sigma)
The 68% confidence interval for the average value of N draws from the normal distribution with the average value of mu and std sigma is
stats.norm.interval(0.68, loc=mu, scale=sigma/sqrt(N))
Intuitively, these formulas make sense, because if you hold up a jar of jelly beans and ask a large number of people to guess the number of jelly beans, each person can be turned off a lot - the same deviation is std sigma , but the average value of the guesswork will do a wonderful job of estimating the actual number, and this is reflected by the standard deviation of the average reduction in the coefficient 1/sqrt(N) .
If one draw has a variance of sigma**2 , then according to the Bienaymรฉ formula , the sum of N uncorrelated draws has a variance of N*sigma**2 .
The average value is equal to the sum divided by N. When you multiply a random variable (such as the sum) by a constant, the variance is multiplied by the square of the constant. it
Var(cX) = c**2 * Var(X)
Thus, the variance of the mean is
(variance of the sum)/N**2 = N * sigma**2 / N**2 = sigma**2 / N
and therefore the standard deviation of the mean (which is the square root of the variance) is
sigma/sqrt(N).
This is the beginning of sqrt(N) in the denominator.
Here is an example of code based on Tom code that demonstrates the above statements:
import numpy as np from scipy import stats N = 10000 a = np.random.normal(0, 1, N) mean, sigma = a.mean(), a.std(ddof=1) conf_int_a = stats.norm.interval(0.68, loc=mean, scale=sigma) print('{:0.2%} of the single draws are in conf_int_a' .format(((a >= conf_int_a[0]) & (a < conf_int_a[1])).sum() / float(N))) M = 1000 b = np.random.normal(0, 1, (N, M)).mean(axis=1) conf_int_b = stats.norm.interval(0.68, loc=0, scale=1 / np.sqrt(M)) print('{:0.2%} of the means are in conf_int_b' .format(((b >= conf_int_b[0]) & (b < conf_int_b[1])).sum() / float(N)))
prints
68.03% of the single draws are in conf_int_a 67.78% of the means are in conf_int_b
Remember that if you define conf_int_b with ratings for mean and sigma based on pattern a , the average value may not fall in conf_int_b with the desired frequency.
If you take a sample from the distribution and calculate the average value of the sample and the deviation std,
mean, sigma = a.mean(), a.std()
be careful to note that there is no guarantee that they are equal to the average value and standard deviations, and that we accept the population is usually distributed - this is not automatic money!
If you take a sample and want to estimate the average and standard population deviations, you should use
mean, sigma = a.mean(), a.std(ddof=1)
since this value for sigma is an unbiased estimate for the standard deviation of the population.