Getting r-squared value using curve_fit

I am new to both Python and its libraries, but I have managed to write a small program that works as intended. It takes a string, counts the occurrences of each letter, plots the counts on a graph, and then fits an equation to them and draws its curve. Now I would like to get the r-squared value of the fit.

The general idea is to compare different kinds of text, taken from articles at different levels, and see how strong the overall pattern is.

The code is just an extract, and I am a beginner, so an easy-to-understand answer would be great.

Code:

    import numpy as np
    import math
    import matplotlib.pyplot as plt
    from matplotlib.pylab import figure, show
    from scipy.optimize import curve_fit

    s="""det, og deres undersøgelse af hvor meget det bliver brugt viser, at der kun er seks plugins, som benyttes af mere end 5 % af Chrome-brugere. Problemet med teknologien er, at den ivivuilv rduyd iytf ouyf ouy yg oyuf yd iyt erzypu zhrpyh dfgopaehr poargi ah pargoh ertao gehorg aeophgrpaoghraprbpaenbtibaeriber en af hovedårsagerne til sikkerhedshuller, ustabilitet og deciderede nedbrud af browseren. Der vil ikke bve lukket for API'et ivivuilv rduyd iytf ouyf ouy yg oyuf yd iyt erzypu zhrpyh dfgopaehr poargi ah pargoh ertao gehorg aeophgrpaoghraprbpaenbtibaeriber en af hovedårsagerne til sikkerhedshuller, ustabilitet og deciderede nedbrud af browseren. Der vil ikke blive lukket for API'et på én gang, men det vil blive udfaset i løbet af et års tid. De mest populære plugins får lov at fungere i udfasningsperioden; Det drejer sig om: Silverlight (anvendt af 15 % af Chrome-brugere sidste måned), Unity (9,1 %), Google Earth (9,1 %), Java (8,9%), Google Talk (8,7 %) og Facebook Video (6,0 %). 
    Det er muligt at hvidliste andre plugins, men i slutningen af 2014 forventer udviklerne helt at lukke for brugen af dem."""

    fordel=[]
    alf=['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','æ','ø','å']
    i=1
    p=0
    fig = figure()
    ax1 = fig.add_subplot(1,2,0)

    for i in range(len(alf)):
        fordel.append(s.count(alf[i]))
        i=i+1

    fordel=sorted(fordel,key=int,reverse=True)
    yFit=fordel
    xFit=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]

    def func(x, a, b):
        return a * (b ** x)

    popt, pcov = curve_fit(func, xFit, yFit)
    t = np.arange(0.0, 30.0, 0.1)
    a=popt[0]
    b=popt[1]
    s = (a*b**t)
    ax1.plot(t,s)
    print(popt)
    yMax=math.ceil(fordel[0]+5)
    ax1.axis([0,30,0,yMax])

    for i in range(0,int(len(alf))*2,2):
        fordel.insert(i,p)
        p=p+1

    for i in range(0,int(len(fordel)/2)):
        ax1.scatter(fordel[0],fordel[1])
        fordel.pop(0)
        fordel.pop(0)

    plt.show()
    show()
1 answer

Calculating r_squared:

The r_squared value can be computed from the mean of the data (mean), the total sum of squares (ss_tot), and the residual sum of squares (ss_res). Each of them is defined as:

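mean: $$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

SStot: $$SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

SSres: $$SS_{res} = \sum_{i=1}^{n} (y_i - f_i)^2$$

rsquared: $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$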

where y_i is the observed value and f_i is the value of the fitted function at the point x_i. These definitions are taken from Wikipedia.
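As a quick sanity check of these definitions, here is a minimal NumPy sketch. The helper name r_squared and the sample values are made up for illustration and do not come from the question. It shows the two reference points worth remembering: a perfect fit gives 1, and a model that always predicts the mean gives 0.

```python
import numpy as np

def r_squared(y, f):
    """Coefficient of determination for observations y and model values f.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    ss_res = np.sum((y - f) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                     # perfect predictions -> 1.0
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean -> 0.0
```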

Using scipy.optimize.curve_fit():

  • You can get the fitted parameters (popt) from curve_fit() with

    popt, pcov = curve_fit(f, xdata, ydata)

  • You can get the residual sum of squares (ss_res) via

    residuals = ydata - f(xdata, *popt)
    ss_res = numpy.sum(residuals**2)

  • You can get the total sum of squares (ss_tot) via

    ss_tot = numpy.sum((ydata-numpy.mean(ydata))**2)

  • And finally, the r_squared value with

    r_squared = 1 - (ss_res / ss_tot)
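Put together, the steps above might look like the following sketch. It reuses the exponential model func from the question, but the data are synthetic (generated here with made-up parameters 50 and 0.85 plus random noise), and the initial guess p0 is an assumption added for this example; the question's code relies on curve_fit's default starting values.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    # Same exponential model as in the question: a * b**x
    return a * (b ** x)

# Synthetic data (an assumption for this sketch, not the letter counts
# from the question): an exponential decay plus a little noise.
rng = np.random.default_rng(0)
xdata = np.arange(29, dtype=float)
ydata = func(xdata, 50.0, 0.85) + rng.normal(scale=1.0, size=xdata.size)

# p0 is an initial guess chosen for this example only.
popt, pcov = curve_fit(func, xdata, ydata, p0=(ydata[0], 0.9))

# Note the * in *popt: it unpacks the fitted parameters into a and b.
residuals = ydata - func(xdata, *popt)
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((ydata - np.mean(ydata)) ** 2)  # total sum of squares
r_squared = 1 - (ss_res / ss_tot)

print("a, b =", popt)
print("r_squared =", r_squared)
```

To apply this to the question's data, converting the lists first, for example xdata = np.array(xFit, dtype=float) and ydata = np.array(yFit, dtype=float), keeps the array arithmetic unambiguous.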


Source: https://habr.com/ru/post/955273/

