Weighted Standard Deviation in NumPy

numpy.average() has a weights parameter, but numpy.std() does not. Does anyone have suggestions for a workaround?

+58
python numpy standard-deviation statsmodels
Mar 09 '10 at 23:53
5 answers

How about the following short "manual calculation"?

    import math
    import numpy

    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- Numpy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average)**2, weights=weights)
        return (average, math.sqrt(variance))
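A short usage sketch for the function above may help. The sample arrays and the variable names (avg_equal, heavy, and so on) are made up for illustration. With equal weights the result should match the plain numpy.mean() and numpy.std() (which uses ddof=0), and a heavy weight on one value should pull the average toward it.

```python
import math
import numpy as np

def weighted_avg_and_std(values, weights):
    """Return the weighted average and (population) standard deviation."""
    average = np.average(values, weights=weights)
    variance = np.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)

# Illustrative data, not taken from the question.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Equal weights: the same as the unweighted mean and std.
equal = np.ones_like(values)
avg_equal, std_equal = weighted_avg_and_std(values, equal)

# A large weight on the last value pulls the average toward 4.
heavy = np.array([1.0, 1.0, 1.0, 10.0])
avg_heavy, std_heavy = weighted_avg_and_std(values, heavy)

print(avg_equal, std_equal)
print(avg_heavy, std_heavy)
```

Note that this is the population form (dividing by the sum of the weights); it applies no bias correction.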
+93
Mar 10 '10 at 8:07

statsmodels has a class that makes it easy to calculate weighted statistics: statsmodels.stats.weightstats.DescrStatsW.

Assuming this dataset and these weights:

    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW

    array = np.array([1,2,1,2,1,2,1,3])
    weights = np.ones_like(array)
    weights[3] = 100

You initialize the class (note that you have to pass the correction factor, the delta degrees of freedom ddof, at this point):

 weighted_stats = DescrStatsW(array, weights=weights, ddof=0) 

Then you can calculate:

  • .mean, the weighted average:

     >>> weighted_stats.mean
     1.97196261682243

  • .std, the weighted standard deviation:

     >>> weighted_stats.std
     0.21434289609681711

  • .var, the weighted variance:

     >>> weighted_stats.var
     0.045942877107170932

  • .std_mean, the standard error of the weighted mean:

     >>> weighted_stats.std_mean
     0.020818822467555047

    In case you are interested in the relationship between the standard error and the standard deviation: the standard error (for ddof == 0) is calculated as the weighted standard deviation divided by the square root of the sum of the weights minus 1 (see the corresponding source for statsmodels version 0.9 on GitHub):

     standard_error = standard_deviation / sqrt(sum(weights) - 1) 
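As a cross-check, the numbers quoted above can be reproduced with plain NumPy, without statsmodels. This is only a sketch under the ddof == 0 assumption stated in the answer; the variable names (values, variance, std_mean) are chosen here for illustration.

```python
import numpy as np

# Same dataset and weights as in the statsmodels example.
values = np.array([1, 2, 1, 2, 1, 2, 1, 3])
weights = np.ones_like(values)
weights[3] = 100

# Weighted mean and population (ddof == 0) variance.
mean = np.average(values, weights=weights)
variance = np.average((values - mean) ** 2, weights=weights)
std = np.sqrt(variance)

# Standard error of the weighted mean, using the relationship quoted above.
std_mean = std / np.sqrt(weights.sum() - 1)

print(mean, std, variance, std_mean)
```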
+23
Apr 7 '16 at 0:57

There is no such function in numpy/scipy yet, but there is a ticket proposing this added functionality. Included there you will find Statistics.py, which implements weighted standard deviations.

+6
Mar 10 2018-10-10T00:

Here is a very good example proposed by gaborous:

    import pandas as pd
    import numpy as np

    # X is the dataset, as a Pandas' DataFrame
    # weights is assumed to hold one weight per row of X (NumPy array or Pandas' Series)
    # Computing the weighted sample mean (fast, efficient and precise)
    mean = np.ma.average(X, axis=0, weights=weights)

    # Convert to a Pandas' Series (it is just aesthetic and more
    # ergonomic; no difference in computed values)
    mean = pd.Series(mean, index=list(X.keys()))
    xm = X - mean  # xm = X diff to mean
    # fill NaN with 0 (a variance of 0 is just void anyway, but at least it
    # keeps the other covariance values computed correctly)
    xm = xm.fillna(0)
    # Compute the unbiased weighted sample covariance
    sigma2 = 1./(weights.sum()-1) * xm.mul(weights, axis=0).T.dot(xm)

Source: Correct equation for weighted unbiased sample covariance, URL (version: 2016-06-28)

+1
Nov 09 '17 at 16:20

Here is another option:

 np.sqrt(np.cov(values, aweights=weights)) 
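One thing to keep in mind with this option: by default np.cov() applies a bias correction (ddof=1), so its result will generally differ from the "manual calculation" in the first answer. Passing ddof=0 should give the same population-style value. A small sketch to compare the two, with illustrative data and variable names chosen here:

```python
import numpy as np

# Illustrative data; weights passed as floats.
values = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 3.0])
weights = np.array([1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0, 1.0])

# Default: bias-corrected estimate (ddof=1).
std_default = np.sqrt(np.cov(values, aweights=weights))

# ddof=0: normalizes by the sum of the weights.
std_population = np.sqrt(np.cov(values, aweights=weights, ddof=0))

# Manual calculation via np.average, for comparison.
average = np.average(values, weights=weights)
std_manual = np.sqrt(np.average((values - average) ** 2, weights=weights))

print(std_default, std_population, std_manual)
```

Which normalization is appropriate depends on what the weights represent, so it is worth checking the ddof, fweights and aweights descriptions in the np.cov() documentation before relying on either value.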
+1
04 Oct '18 at 21:15


