How can I calculate the standard deviation in Ruby?

I have several records with a given attribute, and I want to find the standard deviation of that attribute.

How can I do it?

+48
ruby standard-deviation
Oct 13 2018-11-11T00:
9 answers

    module Enumerable
      def sum
        self.inject(0) { |accum, i| accum + i }
      end

      def mean
        self.sum / self.length.to_f
      end

      # Sample variance: divides by (n - 1) rather than n.
      def sample_variance
        m = self.mean
        sum = self.inject(0) { |accum, i| accum + (i - m)**2 }
        sum / (self.length - 1).to_f
      end

      def standard_deviation
        return Math.sqrt(self.sample_variance)
      end
    end

Testing:

    a = [20, 23, 23, 24, 25, 22, 12, 21, 29]
    a.standard_deviation # => 4.594682917363407



17/01/2012: sample_variance fixed, thanks to Dave Sag.

+74
Oct. 13 2018-11-11T00:

It looks like Angela may have wanted an existing library. After playing with statsample, array-statisics, and several others, I recommend the descriptive_statistics gem if you are trying to avoid reinventing the wheel.

    gem install descriptive_statistics

    $ irb
    1.9.2 :001 > require 'descriptive_statistics'
     => true
    1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
     => [1, 2, 2.2, 2.3, 4, 5]
    1.9.2p290 :003 > samples.sum
     => 16.5
    1.9.2 :004 > samples.mean
     => 2.75
    1.9.2 :005 > samples.variance
     => 1.7924999999999998
    1.9.2 :006 > samples.standard_deviation
     => 1.3388427838995882

I can't speak to its statistical correctness or to your comfort with monkey-patching, but it is easy to use and easy to contribute to.
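
One thing worth checking before comparing results across answers: the variance printed in the session above (about 1.7925) is what you get when dividing by n, while the other answers in this thread divide by n - 1. The sketch below reproduces both numbers for the same data. The helper names are hypothetical and are not part of the descriptive_statistics gem; the behavior of other gem versions is not something this sketch can confirm.

```ruby
# Hypothetical helpers for illustration; not the gem's API.
def sum_of_squared_deviations(values)
  mean = values.inject(:+) / values.length.to_f
  values.inject(0.0) { |accum, v| accum + (v - mean)**2 }
end

# Population variance: divide by n.
def population_variance(values)
  sum_of_squared_deviations(values) / values.length
end

# Sample variance: divide by (n - 1).
def sample_variance(values)
  sum_of_squared_deviations(values) / (values.length - 1)
end

samples = [1, 2, 2.2, 2.3, 4, 5]
puts population_variance(samples) # close to 1.7925, as in the irb session
puts sample_variance(samples)     # close to 2.151
```

Which divisor is appropriate depends on whether the records are the whole population or a sample drawn from it.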

+31
Sep 06

The answer above is elegant but has a slight error. Not being much of a statistician, I sat down and read several websites in detail, and found that this one gives the most understandable explanation of how to derive a standard deviation: http://sonia.hubpages.com/hub/stddev

The error in the answer above is in the sample_variance method.

Here is my revised version, as well as a simple unit test that shows that it works.

in ./lib/enumerable/standard_deviation.rb

    #!/usr/bin/ruby
    module Enumerable
      def sum
        return self.inject(0) { |accum, i| accum + i }
      end

      def mean
        return self.sum / self.length.to_f
      end

      # Sample variance: squared deviations from the mean, divided by (n - 1).
      def sample_variance
        m = self.mean
        sum = self.inject(0) { |accum, i| accum + (i - m)**2 }
        return sum / (self.length - 1).to_f
      end

      def standard_deviation
        return Math.sqrt(self.sample_variance)
      end
    end

and in ./test, using numbers obtained from a simple spreadsheet:

[Image: screen snapshot of a Numbers spreadsheet with example data]

    #!/usr/bin/ruby
    require 'test/unit'
    require 'enumerable/standard_deviation'

    class StandardDeviationTest < Test::Unit::TestCase
      THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

      def test_sum
        expected = 16.5
        result = THE_NUMBERS.sum
        assert result == expected, "expected #{expected} but got #{result}"
      end

      def test_mean
        expected = 2.75
        result = THE_NUMBERS.mean
        assert result == expected, "expected #{expected} but got #{result}"
      end

      def test_sample_variance
        expected = 2.151
        result = THE_NUMBERS.sample_variance
        # Round before comparing: exact equality on floats is fragile.
        assert result.round(10) == expected, "expected #{expected} but got #{result}"
      end

      def test_standard_deviation
        expected = 1.4666287874
        result = THE_NUMBERS.standard_deviation
        assert result.round(10) == expected, "expected #{expected} but got #{result}"
      end
    end

+28
Nov 24 2018-11-11T00:

I'm not a big fan of adding methods to Enumerable, as there may be unwanted side effects. It also gives methods that only make sense for a collection of numbers to every class that includes Enumerable, which in most cases does not make sense.

While this is fine for tests, scripts, or small applications, it is risky in larger applications. Here is an alternative based on @tolitius's answer, which was already perfect. This is more for reference than anything else:

    module MyApp
      module Maths
        def self.sum(a)
          a.inject(0) { |accum, i| accum + i }
        end

        def self.mean(a)
          sum(a) / a.length.to_f
        end

        def self.sample_variance(a)
          m = mean(a)
          sum = a.inject(0) { |accum, i| accum + (i - m)**2 }
          sum / (a.length - 1).to_f
        end

        def self.standard_deviation(a)
          Math.sqrt(sample_variance(a))
        end
      end
    end

You then use it like this:

    2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
     => 1.5811388300841898
    2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
     => [20, 23, 23, 24, 25, 22, 12, 21, 29]
    2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
     => 4.594682917363407
    2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
     => 1.466628787389638

The behavior is the same, but it avoids the overhead and risks of adding methods to Enumerable.
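
If you still prefer the a.standard_deviation call style, Ruby refinements are another way to limit the reach of such a patch. This is a different technique from the module functions above, shown here only as a minimal sketch: NumericStats and Report are hypothetical names, and the methods are refined onto Array only, so other Enumerable classes are not affected.

```ruby
# Hypothetical refinement module; method names mirror the answers above.
module NumericStats
  refine Array do
    def mean
      inject(0) { |accum, i| accum + i } / length.to_f
    end

    def sample_variance
      m = mean
      inject(0) { |accum, i| accum + (i - m)**2 } / (length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(sample_variance)
    end
  end
end

module Report
  # The refinement is active only from here to the end of this module body.
  using NumericStats

  def self.spread(values)
    values.standard_deviation
  end
end

puts Report.spread([1, 2, 3, 4, 5])
# Outside Report the refinement is not active, and respond_to? ignores refinements.
puts [1, 2, 3].respond_to?(:standard_deviation)
```

The trade-off is that every file or module that wants the methods has to opt in with using, which is the point: the patch never leaks into code that did not ask for it.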

+8
Jan 15 '14 at 16:59

The calculations presented so far are not very efficient, because they require several passes through the array: at least two, and often three, since you usually want to report the mean alongside the standard deviation.

I know that Ruby is not the place to look for performance, but here is my implementation, which calculates the mean and standard deviation in a single pass over the list values:

    module Enumerable
      # Returns [mean, sample standard deviation], or nil for an empty collection.
      def avg_stddev
        return nil unless count > 0
        return [ first, 0 ] if count == 1

        sx = sx2 = 0
        each do |x|
          sx2 += x**2
          sx += x
        end

        [
          sx.to_f / count,
          Math.sqrt( # http://wijmo.com/docs/spreadjs/STDEV.html
            (sx2 - sx**2.0 / count) / (count - 1)
          )
        ]
      end
    end

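
One caveat with the sum-of-squares formula: subtracting two large, nearly equal totals can lose precision when the values are large relative to their spread. Welford's online algorithm is a commonly used single-pass alternative that updates the mean and the sum of squared deviations incrementally. The sketch below is that alternative, not the method from this answer, and running_mean_stddev is a hypothetical name.

```ruby
# Welford's online algorithm: one pass, no large intermediate sums.
# Returns [mean, sample standard deviation], or nil for an empty collection.
def running_mean_stddev(values)
  n = 0
  mean = 0.0
  m2 = 0.0 # running sum of squared deviations from the current mean

  values.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
  end

  return nil if n.zero?
  return [mean, 0.0] if n == 1

  [mean, Math.sqrt(m2 / (n - 1))]
end

mean, stddev = running_mean_stddev([20, 23, 23, 24, 25, 22, 12, 21, 29])
puts mean
puts stddev # about 4.5947 for this data
```

Because it only calls each, it also works on enumerables whose size is not known up front.
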
+2
Aug 6 '15 at 9:20

As a simple function, given a list of numbers:

    def standard_deviation(list)
      mean = list.inject(:+) / list.length.to_f
      var_sum = list.map { |n| (n - mean)**2 }.inject(:+).to_f
      sample_variance = var_sum / (list.length - 1)
      Math.sqrt(sample_variance)
    end

+1
Jul 29 '16 at 0:44

If your existing records are of type Integer or Rational, you may want to calculate the variance using Rational instead of Float to avoid round-off errors.

For example:

    def variance(list)
      mean = list.reduce(:+) / list.length.to_r
      sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
      sum_of_squared_differences / list.length
    end

(It would be wise to add special case handling for empty lists and other edge cases.)

The standard deviation is then the square root of the variance:

    def std_dev(list)
      Math.sqrt(variance(list))
    end

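
As a minimal sketch of the edge-case handling mentioned above, the version below returns nil for an empty list. That choice is an assumption; raising an ArgumentError would be equally reasonable. Note that, as in the code above, the divisor is the list length, so this is the population variance.

```ruby
# Population variance kept as an exact Rational for Integer/Rational input.
# Returning nil for an empty list is an assumption made for this sketch.
def variance(list)
  return nil if list.empty?

  mean = list.reduce(:+) / list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences / list.length
end

def std_dev(list)
  v = variance(list)
  v && Math.sqrt(v) # Math.sqrt converts the Rational to a Float here
end

p variance([1, 2, 3, 4]) # => (5/4), an exact Rational
p std_dev([1, 2, 3, 4])
p variance([])           # => nil
```

The value only becomes a Float at the final square root, so no rounding error accumulates while summing.
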
+1
Feb 06 '17 at 19:38

If you are using PostgreSQL, it provides the aggregate functions stddev_pop and stddev_samp (see the PostgreSQL aggregate functions documentation).

stddev (equivalent to stddev_samp) has been available since at least PostgreSQL 7.1; stddev_samp and stddev_pop have been available since 8.2.
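
A minimal sketch of how such a query could be assembled from Ruby follows. It only builds the SQL string; executing it requires a database connection, which is left out here. The helper names and the measurements/value table and column are hypothetical.

```ruby
# Builds the aggregate query only; table and column names are hypothetical.
def quote_identifier(name)
  # Double any embedded double quotes, then wrap the identifier in quotes.
  '"' + name.to_s.gsub('"', '""') + '"'
end

def stddev_query(table, column, kind: :sample)
  functions = { sample: 'stddev_samp', population: 'stddev_pop' }
  function = functions.fetch(kind) # raises KeyError for an unknown kind
  "SELECT #{function}(#{quote_identifier(column)}) FROM #{quote_identifier(table)}"
end

puts stddev_query(:measurements, :value)
# SELECT stddev_samp("value") FROM "measurements"
puts stddev_query(:measurements, :value, kind: :population)
# SELECT stddev_pop("value") FROM "measurements"
```

Letting the database compute the aggregate avoids loading every record into Ruby just to reduce it to one number.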

0
Feb 26 '15 at 21:12

Or how about:

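    class Stats
      # Readers added so the computed values can actually be retrieved.
      attr_reader :avg, :stdev

      def initialize(a)
        @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
        # Population standard deviation: divides by n, not (n - 1).
        @stdev = a.count > 0 ? (a.reduce(0) { |sum, v| sum + (@avg - v)**2 } / a.count)**0.5 : 0.0
      end
    end
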
0
Jun 02 '15 at 15:33