Standard deviation of a generic list?

I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is the relevant code, without going into too much detail:

    namespace ValveTesterInterface
    {
        public class ValveDataResults
        {
            private List<ValveData> m_ValveResults;

            public ValveDataResults()
            {
                if (m_ValveResults == null)
                {
                    m_ValveResults = new List<ValveData>();
                }
            }

            public void AddValveData(ValveData valve)
            {
                m_ValveResults.Add(valve);
            }

Here is the function that needs to calculate the standard deviation:

            public float LatchStdev()
            {
                float sumOfSqrs = 0;
                float meanValue = 0;
                foreach (ValveData value in m_ValveResults)
                {
                    meanValue += value.LatchTime;
                }
                meanValue = (meanValue / m_ValveResults.Count) * 0.02f;
                for (int i = 0; i <= m_ValveResults.Count; i++)
                {
                    sumOfSqrs += Math.Pow((m_ValveResults - meanValue), 2);
                }
                return Math.Sqrt(sumOfSqrs /(m_ValveResults.Count - 1));
            }
        }
    }

Ignore what's inside the LatchStdev() function, because I'm sure it's wrong. It's just my failed attempt to calculate the standard deviation. I know how to do this for a list of doubles, but not for a generic list of data objects. If anyone has experience with this, please help.

+44
math c# statistics standard-deviation
Jun 29 '10 at 14:35
4 answers

This article should help you. It creates a function that calculates the standard deviation of a sequence of double values. All you have to do is supply a sequence of the appropriate data elements.

Resulting Function:

    private double CalculateStdDev(IEnumerable<double> values)
    {
        double ret = 0;
        if (values.Count() > 0)
        {
            //Compute the Average
            double avg = values.Average();
            //Perform the Sum of (value-avg)^2
            double sum = values.Sum(d => Math.Pow(d - avg, 2));
            //Put it all together
            ret = Math.Sqrt((sum) / (values.Count()-1));
        }
        return ret;
    }

This is easy enough to adapt to any generic type, as long as we provide a selector for the value being computed. LINQ is great for this: the Select function lets you project a sequence of numeric values from your generic list of custom types, and the standard deviation can then be calculated on that sequence:

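    List<ValveData> list = ...
    // CalculateStdDev is an ordinary method, not an extension method,
    // so pass the projected sequence to it as an argument.
    var result = CalculateStdDev(list.Select(v => (double)v.SomeField));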
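For anyone who prefers the selector to be part of the call itself, here is a minimal sketch of the same idea written as a generic extension method. The names StdDevExtensions and SampleStdDev are made up for this illustration, and returning 0 for fewer than two elements is just a convention chosen here, not something taken from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class StdDevExtensions
{
    // Sample standard deviation (divides by n - 1) of the values picked out by `selector`.
    // Returns 0 for fewer than two elements, a convention assumed for this sketch.
    public static double SampleStdDev<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        // Materialize once so a lazy sequence is not enumerated repeatedly.
        double[] values = source.Select(selector).ToArray();
        if (values.Length < 2)
            return 0;

        double avg = values.Average();
        double sumOfSquares = values.Sum(v => (v - avg) * (v - avg));
        return Math.Sqrt(sumOfSquares / (values.Length - 1));
    }
}
```

With the asker's class, a call would presumably look like m_ValveResults.SampleStdDev(v => v.LatchTime), assuming LatchTime is a float or double property.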
+48
Jun 29 '10 at

The above example is slightly incorrect: it divides by zero (which yields NaN for doubles) if your data set contains exactly one value. The following code is somewhat simpler and returns the population standard deviation ( http://en.wikipedia.org/wiki/Standard_deviation ).

    using System;
    using System.Linq;
    using System.Collections.Generic;

    public static class Extend
    {
        public static double StandardDeviation(this IEnumerable<double> values)
        {
            double avg = values.Average();
            return Math.Sqrt(values.Average(v=>Math.Pow(v-avg,2)));
        }
    }
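To make the difference between this answer and the previous one explicit, the two formulas can be written side by side. The notation (N values x_i with mean \bar{x}) is chosen here for illustration and does not come from either answer.

```latex
% Population standard deviation (this answer): divide by N
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}

% Sample standard deviation (previous answer): divide by N - 1
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}

% Worked example for the values 2, 4, 4, 4, 5, 5, 7, 9:
% \bar{x} = 5, \quad \sum_{i}(x_i - \bar{x})^2 = 32
% \sigma = \sqrt{32/8} = 2, \qquad s = \sqrt{32/7} \approx 2.138
```

For N = 1 the first formula gives 0, while the second has a zero denominator, which is the case this answer is pointing at.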
+122
Jun 06 2018-11-12T00:

Although the accepted answer seems mathematically correct, it is wrong from a programming perspective: it enumerates the same sequence 4 times. This may be fine if the underlying object is a list or an array, but if the input is a filtered or aggregated LINQ expression, or if the data comes directly from a database or network stream, this will cause much lower performance.

I would highly recommend not reinventing the wheel and instead using one of the better open-source math libraries, Math.NET. We have been using this library at our company and are very happy with its performance.

PM> Install-Package MathNet.Numerics

    using MathNet.Numerics.Statistics;

    var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
    var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();

See http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html for more details.

Finally, for those who want the fastest possible result and can sacrifice some accuracy, read about the one-pass algorithm: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
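As a rough illustration of a one-pass calculation, here is a minimal sketch using Welford's update, which is one of several single-pass variants and not necessarily the exact formula the linked section leads with. It enumerates the input once, so it also suits streamed data. The class name OnePassStats and the choice to return NaN for fewer than two values are assumptions made for this example. Welford's update is generally considered numerically stable, so the accuracy trade-off mentioned above applies mainly to the naive running sum-of-squares formula.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the name is illustrative only.
public static class OnePassStats
{
    // Sample standard deviation via Welford's update: enumerates `values` exactly once.
    public static double SampleStdDev(IEnumerable<double> values)
    {
        long count = 0;
        double mean = 0;
        double m2 = 0; // running sum of squared deviations from the current mean

        foreach (double x in values)
        {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean); // second factor uses the updated mean
        }

        // Undefined for fewer than two values; NaN is the convention assumed here.
        return count < 2 ? double.NaN : Math.Sqrt(m2 / (count - 1));
    }
}
```

Dividing m2 by count instead of count - 1 would give the population standard deviation.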

+16
Apr 13 2018-12-12T00:

I see what you are doing, and I use something similar. It seems to me that you are not going far enough. I tend to encapsulate all data processing in a single class, so that I can cache the calculated values until the list changes. For example:

    using System.Collections.Generic;
    using System.Linq;

    public class StatProcessor
    {
        private List<double> _data = new List<double>(); // holds the current data
        private double _avg;    // the average is cached here
        private bool _avgValid; // flag: true while _avg matches the contents of _data

        // Calculates the average of the list, caches it in _avg, and sets _avgValid.
        private void CalcAvg()
        {
            _avg = _data.Average();
            _avgValid = true;
        }

        public double Average
        {
            get
            {
                if (!_avgValid) // if we don't HAVE to calculate the average, skip it
                    CalcAvg();  // if we do, calculate it, cache it, and set the flag
                return _avg;    // _avg is now guaranteed to be good, so return it
            }
        }

        // ...more stuff

        public void Add(double value)
        {
            _data.Add(value);  // add to the list
            _avgValid = false; // and reset the flag
        }
    }

You will notice that with this approach, only the first request for the average actually calculates it. After that, until we add something to the list (or remove or modify something, though those operations aren't shown), we can get the average for basically nothing.

In addition, since the mean is used in the standard deviation algorithm, calculating the standard deviation first gives us the mean for free, and calculating the mean first gives a small performance boost when calculating the standard deviation, provided we remember to check the flag.

Also, places like the average function, where you already loop over every value, are a good place to cache things like the min and max values. Of course, requests for that information must first check whether the values have been cached, and computing everything up front can make the first request relatively slow compared with just finding the max from the list, since it does the extra work of setting up all the relevant caches, not just the one you asked for.
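The idea of filling several caches in one go could look roughly like the sketch below. It is only an illustration under assumptions: the class name CachedStats, the single shared validity flag, and the choice of the sample (n - 1) standard deviation are all made up here and are not from the answer above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; names and conventions are illustrative only.
public class CachedStats
{
    private readonly List<double> _data = new List<double>();
    private bool _valid; // false whenever _data has changed since the last refresh
    private double _mean, _min, _max, _stdDev;

    public void Add(double value)
    {
        _data.Add(value);
        _valid = false; // invalidate every cached statistic at once
    }

    public double Mean   { get { Refresh(); return _mean; } }
    public double Min    { get { Refresh(); return _min; } }
    public double Max    { get { Refresh(); return _max; } }
    public double StdDev { get { Refresh(); return _stdDev; } }

    // Recomputes all cached values, but only if the data changed.
    private void Refresh()
    {
        if (_valid) return;
        if (_data.Count == 0)
            throw new InvalidOperationException("No data.");

        // First loop: sum, min and max together.
        double sum = 0;
        double min = double.PositiveInfinity;
        double max = double.NegativeInfinity;
        foreach (double x in _data)
        {
            sum += x;
            if (x < min) min = x;
            if (x > max) max = x;
        }
        double mean = sum / _data.Count;

        // Second loop: squared deviations from the mean.
        double sumOfSquares = 0;
        foreach (double x in _data)
            sumOfSquares += (x - mean) * (x - mean);

        _mean = mean;
        _min = min;
        _max = max;
        // Sample standard deviation; 0 for a single value by the convention assumed here.
        _stdDev = _data.Count > 1 ? Math.Sqrt(sumOfSquares / (_data.Count - 1)) : 0;
        _valid = true;
    }
}
```

The trade-off is the one described above: the first property read after a change pays for every statistic, and later reads cost almost nothing until the next Add.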

+1
Oct 29 '12 at 17:08