Best fit line algorithm

I am writing a small application in C# that uses the MSChart control to draw scatter plots of sets of X and Y data points. Some of the sets can be quite large (hundreds of data points).

I would like to ask whether there is a "standard" algorithm for constructing a line of best fit through the points. My plan is to divide the X values into a predetermined number of groups, for example 10 or 20, take the average X and the average of the corresponding Y values for each group, and draw a line through those averages. Is this a sound approach?

I searched for existing threads, but they all seem to solve this with existing applications such as Matlab.

Thanks,

2 answers

Use the linear least-squares algorithm:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class XYPoint
    {
        public int X;
        public double Y;
    }

    class Program
    {
        // Fits y = a*x + b by ordinary least squares and returns
        // the fitted point for every input X.
        public static List<XYPoint> GenerateLinearBestFit(List<XYPoint> points, out double a, out double b)
        {
            int numPoints = points.Count;
            double meanX = points.Average(point => point.X);
            double meanY = points.Average(point => point.Y);

            // Cast to double so that X * X cannot overflow int for large X.
            double sumXSquared = points.Sum(point => (double)point.X * point.X);
            double sumXY = points.Sum(point => point.X * point.Y);

            // Slope = covariance(x, y) / variance(x).
            a = (sumXY / numPoints - meanX * meanY) / (sumXSquared / numPoints - meanX * meanX);
            // Intercept: the fitted line passes through (meanX, meanY).
            b = meanY - a * meanX;

            // out parameters cannot be captured by a lambda, so copy them to locals.
            double a1 = a;
            double b1 = b;

            return points.Select(point => new XYPoint() { X = point.X, Y = a1 * point.X + b1 }).ToList();
        }

        static void Main(string[] args)
        {
            List<XYPoint> points = new List<XYPoint>()
            {
                new XYPoint() { X = 1, Y = 12 },
                new XYPoint() { X = 2, Y = 16 },
                new XYPoint() { X = 3, Y = 34 },
                new XYPoint() { X = 4, Y = 45 },
                new XYPoint() { X = 5, Y = 47 }
            };

            double a, b;
            List<XYPoint> bestFit = GenerateLinearBestFit(points, out a, out b);

            Console.WriteLine("y = {0:#.####}x {1:+#.####;-#.####}", a, b);

            for (int index = 0; index < points.Count; index++)
            {
                Console.WriteLine("X = {0}, Y = {1}, Fit = {2:#.###}",
                    points[index].X, points[index].Y, bestFit[index].Y);
            }
        }
    }
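
As a hand check of the arithmetic, the five sample points in Main can be worked through directly. This is only a sketch of the calculation the code performs; the labels "slope" and "intercept" are used here for the coefficients of x and the constant term of the fitted line.

```latex
\begin{aligned}
\bar{x} &= \tfrac{1+2+3+4+5}{5} = 3, \qquad
\bar{y} = \tfrac{12+16+34+45+47}{5} = 30.8 \\
\overline{xy} &= \tfrac{12+32+102+180+235}{5} = \tfrac{561}{5} = 112.2, \qquad
\overline{x^2} = \tfrac{1+4+9+16+25}{5} = 11 \\
\text{slope} &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
             = \frac{112.2 - 92.4}{11 - 9} = 9.9 \\
\text{intercept} &= \bar{y} - \text{slope}\cdot\bar{x} = 30.8 - 29.7 = 1.1
\end{aligned}
```

So the fitted line for this data is y = 9.9x + 1.1, giving fitted values of 11, 20.9, 30.8, 40.7 and 50.6 at X = 1 to 5.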

Yes. You will want to use Linear Regression, in particular Simple Linear Regression.

Essential Algorithm:

  • suppose there is a line of best fit, y = ax + b
  • for each of your points, you want to minimize its distance from this line (in simple linear regression this is the vertical distance, i.e. the difference between the point's Y value and the line's Y value at the same X).
  • calculate the distance of each point from the line and sum the distances (we usually sum the squares of the distances, which penalizes points further from the line more heavily)
  • find the values of a and b that minimize the resulting expression using basic calculus (there should be only one minimum)
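
The steps above can be written out as a short derivation. This is a sketch assuming n points (x_i, y_i) that do not all share the same x value; the symbol S for the sum of squared vertical distances and the bar notation for averages are introduced here for illustration.

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} \left( y_i - a x_i - b \right)^2 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} \left( y_i - a x_i - b \right) = 0
  \;\Longrightarrow\; b = \bar{y} - a\,\bar{x} \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a x_i - b \right) = 0
  \;\Longrightarrow\; \overline{xy} - a\,\overline{x^2} - b\,\bar{x} = 0 \\
\text{substituting } b: \quad
a &= \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
\end{aligned}
```

Because S is a convex quadratic in a and b, this stationary point is the single minimum, and the resulting line always passes through the point of averages (x̄, ȳ).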

The Wikipedia page on simple linear regression will provide you with everything you need.


Source: https://habr.com/ru/post/1440434/

