Weighted Standard Deviation in NumPy

Question


numpy.average() has a weights parameter, but numpy.std() does not. Does anyone have suggestions for a workaround?

+58

python numpy standard-deviation statsmodels

YGA Mar 09 '10 at 23:53


5 answers

statsmodels has a class that makes calculating weighted statistics easier: statsmodels.stats.weightstats.DescrStatsW.

Assuming this dataset and these weights:

    import numpy as np
    from statsmodels.stats.weightstats import DescrStatsW

    array = np.array([1, 2, 1, 2, 1, 2, 1, 3])
    weights = np.ones_like(array)
    weights[3] = 100

You initialize the class (note that the correction factor, the delta degrees of freedom ddof, must be passed at this point):

    weighted_stats = DescrStatsW(array, weights=weights, ddof=0)

Then you can calculate:

.mean, the weighted mean:

    >>> weighted_stats.mean
    1.97196261682243

.std, the weighted standard deviation:

    >>> weighted_stats.std
    0.21434289609681711

.var, the weighted variance:

    >>> weighted_stats.var
    0.045942877107170932

.std_mean, the standard error of the weighted mean:
```
>>> weighted_stats.std_mean
0.020818822467555047
```
In case you are interested in the relationship between the standard error and the standard deviation: the standard error (for ddof == 0) is calculated as the weighted standard deviation divided by the square root of the sum of the weights minus 1 (see the corresponding source for statsmodels version 0.9 on GitHub):
```
 standard_error = standard_deviation / sqrt(sum(weights) - 1) 
```
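
The relationship can be checked with plain NumPy, without statsmodels. The following is only a sketch: it recomputes the weighted mean, standard deviation, and standard error for the dataset above directly from the formula, and the variable names are illustrative rather than part of any library API.

```python
import numpy as np

# Same dataset and weights as in the example above.
array = np.array([1, 2, 1, 2, 1, 2, 1, 3])
weights = np.ones_like(array)
weights[3] = 100

# Weighted mean and weighted variance (ddof=0), computed by hand.
mean = np.average(array, weights=weights)
variance = np.average((array - mean) ** 2, weights=weights)
standard_deviation = np.sqrt(variance)

# Standard error of the weighted mean, using the formula quoted above.
standard_error = standard_deviation / np.sqrt(weights.sum() - 1)

print(mean, standard_deviation, standard_error)
```

The printed values should agree with the .mean, .std and .std_mean results shown earlier, up to floating-point rounding.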

+23

MSeifert Apr 7 '16 at 0:57


There is no such function in numpy/scipy yet, but there is a ticket proposing this functionality. Attached to it you will find Statistics.py, which implements weighted standard deviations.

+6

unutbu Mar 10 2018-10-10T00:


There is a very good example suggested by gaborous:

    import pandas as pd
    import numpy as np

    # X is the dataset, as a pandas DataFrame; weights holds one weight per row
    # Compute the weighted sample mean (fast, efficient and precise)
    mean = np.ma.average(X, axis=0, weights=weights)
    # Convert to a pandas Series (purely aesthetic and more ergonomic;
    # no difference in computed values)
    mean = pd.Series(mean, index=list(X.keys()))
    xm = X - mean  # xm = deviations of X from the mean
    # Fill NaN with 0 (a zero deviation contributes nothing to the variance,
    # and this keeps the other covariance values computed correctly)
    xm = xm.fillna(0)
    # Compute the unbiased weighted sample covariance
    sigma2 = 1. / (weights.sum() - 1) * xm.mul(weights, axis=0).T.dot(xm)

Source: "Correct equation for weighted unbiased sample covariance", URL (version: 2016-06-28)
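
As a rough sanity check, the same formula can be restated in plain NumPy and compared with np.cov. The sketch below assumes integer (frequency) weights, in which case the 1/(sum of weights - 1) normalization matches np.cov(..., fweights=...). The data and weights are made up for illustration.

```python
import numpy as np

# Hypothetical data: 5 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0],
              [5.0, 4.0]])
# Hypothetical integer (frequency) weights, one per observation.
w = np.array([1, 3, 2, 1, 4])

# Weighted mean of each column and deviations from it.
mean = np.average(X, axis=0, weights=w)
xm = X - mean

# Same formula as above: 1 / (sum(w) - 1) * xm^T * diag(w) * xm
sigma2 = (xm * w[:, None]).T @ xm / (w.sum() - 1)

# np.cov expects variables in rows, hence the transpose.
reference = np.cov(X.T, fweights=w)
print(np.allclose(sigma2, reference))
```

With frequency weights, this is equivalent to repeating each row w[i] times and taking the ordinary sample covariance. For other kinds of weights (for example reliability weights) a different normalization may be appropriate.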

+1

abah Nov 09 '17 at 16:20


Here is another option:

    np.sqrt(np.cov(values, aweights=weights))
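
One point worth checking with this approach: np.cov defaults to ddof=1, so the result generally differs from the ddof=0 "manual" weighted standard deviation unless ddof=0 is passed explicitly. A minimal sketch, with values and weights chosen only for illustration:

```python
import numpy as np

# Illustrative sample and weights (not prescribed by np.cov itself).
values = np.array([1, 2, 1, 2, 1, 2, 1, 3])
weights = np.ones_like(values)
weights[3] = 100

# Reference: weighted standard deviation with ddof=0, computed by hand.
average = np.average(values, weights=weights)
manual_std = np.sqrt(np.average((values - average) ** 2, weights=weights))

# np.cov with its default normalization (ddof=1) ...
std_default = np.sqrt(np.cov(values, aweights=weights))
# ... and with ddof=0, which uses the plain sum of weights as the divisor.
std_ddof0 = np.sqrt(np.cov(values, aweights=weights, ddof=0))

print(manual_std, std_default, std_ddof0)
```

Here std_ddof0 agrees with manual_std, while std_default is larger because of the bias correction applied for aweights.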

+1

Leo 04 Oct '18 at 21:15


Eric Lebigot · Accepted Answer · 2010-03-10 08:07

How about the following short “manual calculation”?

    import math

    import numpy


    def weighted_avg_and_std(values, weights):
        """
        Return the weighted average and standard deviation.

        values, weights -- NumPy ndarrays with the same shape.
        """
        average = numpy.average(values, weights=weights)
        # Fast and numerically precise:
        variance = numpy.average((values - average) ** 2, weights=weights)
        return (average, math.sqrt(variance))
