Calculation of confidence interval from sample data

I have sample data for which I would like to calculate a confidence interval, assuming a normal distribution.

I found and installed the numpy and scipy packages and got numpy to return the mean and standard deviation (numpy.mean(data), with data being a list). Any advice on obtaining a confidence interval for the sample would be greatly appreciated.

+43
python numpy statistics
Feb 22 '13 at 21:29
3 answers
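    import numpy as np
    import scipy.stats

    def mean_confidence_interval(data, confidence=0.95):
        a = 1.0 * np.array(data)
        n = len(a)
        # sample mean and standard error of the mean
        m, se = np.mean(a), scipy.stats.sem(a)
        # half-width: t critical value (n-1 degrees of freedom) times the standard error
        h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
        return m, m - h, m + h

You can calculate it like this. The function returns the mean followed by the lower and upper bounds of the interval.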

+79
Feb 22 '13 at 10:18

Here's a shortened version of shasan's code that computes the 95% confidence interval of the mean of array a:

    import numpy as np, scipy.stats as st

    st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a))

But using StatsModels' tconfint_mean is arguably even nicer:

    import statsmodels.stats.api as sms

    sms.DescrStatsW(a).tconfint_mean()

The underlying assumptions for both are that the sample (array a) was drawn independently from a normal distribution with an unknown standard deviation (see MathWorld or Wikipedia).

For a large sample size n, the sample mean is normally distributed, and its confidence interval can be calculated using st.norm.interval() (as suggested in Jaime's comment). But the above solutions are also correct for small n, where st.norm.interval() gives confidence intervals that are too narrow (i.e., "fake confidence"). See my answer to a similar question for more details (and one of Russ's comments here).

Here is an example where the correct options give (essentially) identical confidence intervals:

    In [9]: a = range(10,14)

    In [10]: mean_confidence_interval(a)
    Out[10]: (11.5, 9.4457397432391215, 13.554260256760879)

    In [11]: st.t.interval(0.95, len(a)-1, loc=np.mean(a), scale=st.sem(a))
    Out[11]: (9.4457397432391215, 13.554260256760879)

    In [12]: sms.DescrStatsW(a).tconfint_mean()
    Out[12]: (9.4457397432391197, 13.55426025676088)

And finally, the wrong result using st.norm.interval():

    In [13]: st.norm.interval(0.95, loc=np.mean(a), scale=st.sem(a))
    Out[13]: (10.23484868811834, 12.76515131188166)
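
The difference between the two intervals comes entirely from the critical value that multiplies the standard error. As a rough illustration (the sample sizes below are arbitrary choices, not from the example above), this sketch compares the Student t critical value with the normal one as n grows:

```python
import scipy.stats as st

confidence = 0.95
q = (1 + confidence) / 2  # quantile used for a two-sided interval

# Normal critical value: independent of the sample size.
z_crit = st.norm.ppf(q)

# Student t critical values (n - 1 degrees of freedom) for a few
# illustrative sample sizes.
sample_sizes = [4, 10, 30, 100, 1000]
t_crits = {n: st.t.ppf(q, n - 1) for n in sample_sizes}

for n, t_crit in t_crits.items():
    print(f"n={n:5d}  t={t_crit:.4f}  z={z_crit:.4f}  ratio={t_crit / z_crit:.3f}")
```

For n = 4 the t critical value is about 3.18 against about 1.96 for the normal distribution, which accounts for the narrower interval in Out[13]. The ratio approaches 1 as n increases.
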
+39
Dec 26 '15 at 18:56

Start by looking up the z-value for your desired confidence level in a lookup table. The confidence interval is then mean +/- z*sigma, where sigma is the estimated standard deviation of your sample mean, given by sigma = s / sqrt(n), where s is the standard deviation computed from your sample data and n is your sample size.
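
A minimal sketch of these steps using only the Python standard library. Here statistics.NormalDist().inv_cdf stands in for the lookup table, and the helper name z_confidence_interval is made up for illustration. Keep in mind that this is the normal approximation, so for small samples it gives a narrower interval than the t-based one.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean (illustrative helper)."""
    n = len(data)
    m = mean(data)
    s = stdev(data)        # sample standard deviation (n - 1 denominator)
    sigma = s / sqrt(n)    # estimated standard deviation of the sample mean
    # z-value for a two-sided interval; replaces the lookup table
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return m - z * sigma, m + z * sigma

low, high = z_confidence_interval([10, 11, 12, 13])
print(low, high)
```

With the data 10, 11, 12, 13 this gives roughly (10.23, 12.77), the same interval that st.norm.interval() produces for that sample.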

+6
Feb 22 '13 at 22:15