Python: defining the linear part of the slope

I have several graphs that look like this:

[image: example graph with a roughly linear region between x ≈ 5.5 and x ≈ 8]

I am wondering what methods could be used to find the slope between about 5.5 and 8 on the x axis. Since there are several such graphs, is there a way to find the slope value automatically?

Any suggestions?

I am thinking of polyfit() or linear regression. The problem is that I'm not sure how to automatically find the range of values to fit over.
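For a single graph where the linear window is picked by eye, the slope itself is just a first-order fit over that window; a minimal sketch, assuming the data are already in NumPy arrays x and y and that the linear part runs from x = 5.5 to x = 8:

import numpy as np

# hypothetical data; replace with your own measured x and y arrays
x = np.linspace(0, 10, 200)
y = np.where(x < 5.5, 1.0, 1.0 + 2.0 * (x - 5.5))

# fit a straight line only over the window that looks linear (here 5.5..8)
mask = (x >= 5.5) & (x <= 8)
slope, intercept = np.polyfit(x[mask], y[mask], 1)
print("slope:", slope)

The hard part, addressed by the answers below, is finding that window automatically.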

+12
4 answers

A general way to find linear parts in a data set is to compute the second derivative of the function and look at where it is (close to) zero. There are several things to consider along the way:

  • How to calculate the second derivative of noisy data? One quick and simple method that can easily be adapted to different noise levels, data sizes and expected lengths of the linear patch is to convolve the data with a kernel equal to the second derivative of a Gaussian. The adjustable parameter is the width of the kernel.

  • What does "close to zero" mean in your context? To answer this question, you need to experiment with your data.

  • The results of this method can be used as input to the chi^2 method described in another answer, to identify candidate regions in the data set.

Here is some source code to get you started:

from matplotlib import pyplot as plt
import numpy as np

# create theoretical data
x_a = np.linspace(-8, 0, 60)
y_a = np.sin(x_a)
x_b = np.linspace(0, 4, 30)[1:]
y_b = x_b[:]
x_c = np.linspace(4, 6, 15)[1:]
y_c = np.sin((x_c - 4)/4*np.pi)/np.pi*4. + 4
x_d = np.linspace(6, 14, 120)[1:]
y_d = np.zeros(len(x_d)) + 4 + (4/np.pi)

x = np.concatenate((x_a, x_b, x_c, x_d))
y = np.concatenate((y_a, y_b, y_c, y_d))

# make noisy data from theoretical data
y_n = y + np.random.normal(0, 0.27, len(x))

# create convolution kernel for calculating
# the smoothed second order derivative
smooth_width = 59
x1 = np.linspace(-3, 3, smooth_width)
norm = np.sum(np.exp(-x1**2)) * (x1[1] - x1[0])  # ad hoc normalization
y1 = (4*x1**2 - 2) * np.exp(-x1**2) / smooth_width * 8  # norm*(x1[1]-x1[0])

# calculate second order deriv.
y_conv = np.convolve(y_n, y1, mode="same")

# plot data
plt.plot(x, y_conv, label="second deriv")
plt.plot(x, y_n, "o", label="noisy data")
plt.plot(x, y, label="theory")
plt.plot(x, x, "0.3", label="linear data")
plt.hlines([0], -10, 20)
plt.axvspan(0, 4, color="y", alpha=0.2)
plt.axvspan(6, 14, color="y", alpha=0.2)
plt.axhspan(-1, 1, color="b", alpha=0.2)
plt.vlines([0, 4, 6], -10, 10)
plt.xlim(-2.5, 12)
plt.ylim(-2.5, 6)
plt.legend(loc=0)
plt.show()

This is the result: [plot: noisy data, theoretical curve and smoothed second derivative, with the linear regions shaded in yellow and the near-zero band in blue]

smooth_width is the width of the convolution kernel. To experiment with different noise levels, change the value 0.27 in np.random.normal. Note that this technique does not work well near the borders of the data space.

As you can see, the "close to zero" requirement on the second derivative (blue band) holds quite well in the yellow regions, where the data is linear.
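To turn the "close to zero" test into an actual slope value, one option (not part of the original answer) is to threshold y_conv, take the longest contiguous run of points inside the band, and fit a line over it with np.polyfit. A rough sketch reusing x, y_n and y_conv from the code above, with the threshold chosen ad hoc:

import numpy as np

threshold = 1.0          # "close to zero" band; tune for your noise level
flat = np.abs(y_conv) < threshold

# find the longest contiguous run of near-zero second derivative
best_start, best_len = 0, 0
run_start = None
for i, is_flat in enumerate(np.append(flat, False)):
    if is_flat and run_start is None:
        run_start = i
    elif not is_flat and run_start is not None:
        if i - run_start > best_len:
            best_start, best_len = run_start, i - run_start
        run_start = None

segment = slice(best_start, best_start + best_len)
slope, intercept = np.polyfit(x[segment], y_n[segment], 1)
print("slope of the longest linear stretch:", slope)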

+22

You can use the Ramer-Douglas-Peucker algorithm to simplify your data to a smaller set of line segments. The algorithm lets you specify an epsilon such that each data point is no further than epsilon from some line segment. The slopes of the line segments give a rough estimate of the slope of the curve.

There is a Python implementation of the RDP algorithm.
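For example, with the rdp package from PyPI (one such implementation; the exact package and the epsilon value are assumptions here), the simplification and the per-segment slopes could look like this:

import numpy as np
from rdp import rdp          # pip install rdp

# hypothetical noisy data with a roughly linear stretch
x = np.linspace(0, 10, 300)
y = np.where(x < 5.5, np.sin(x), np.sin(5.5) + 2.0 * (x - 5.5))
y += 0.05 * np.random.randn(len(x))

points = np.column_stack((x, y))
simplified = rdp(points, epsilon=0.1)   # epsilon: max distance of any point to its segment

# slope of each remaining segment; long segments with a stable slope are the linear parts
dx = np.diff(simplified[:, 0])
dy = np.diff(simplified[:, 1])
print(np.column_stack((simplified[:-1, 0], simplified[1:, 0], dy / dx)))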

+3

This is just one possible solution: it brute-forces all segments of at least a given minimum length, fits a line to each, and keeps the segment with the smallest chi^2 value:

from matplotlib.pyplot import figure, show
from numpy import pi, sin, exp, linspace, polyfit

x = linspace(0, 2*pi, 20)
y = x + sin(x) + exp(-0.5*(x - 2)**2)

num_points = len(x)
min_fit_length = 5

chi = 0
chi_min = 10000
i_best = 0
j_best = 0

for i in range(len(x) - min_fit_length):
    for j in range(i + min_fit_length, len(x)):

        # fit a straight line to the segment [i, j) and measure its residual
        coefs = polyfit(x[i:j], y[i:j], 1)
        y_linear = x * coefs[0] + coefs[1]
        chi = 0
        for k in range(i, j):
            chi += (y_linear[k] - y[k])**2

        if chi < chi_min:
            i_best = i
            j_best = j
            chi_min = chi
            print(chi_min)

coefs = polyfit(x[i_best:j_best], y[i_best:j_best], 1)
y_linear = x[i_best:j_best] * coefs[0] + coefs[1]

fig = figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'ro')
ax.plot(x[i_best:j_best], y_linear, 'b-')
show()

[plot: data points with the best-fit linear segment highlighted]

I see this becoming problematic for large datasets though ...

+1

If your "model" of the data is that it is mostly a straight line, with some outliers or odd fragments at the ends, you can try RANSAC.

Pseudocode (very verbose, sorry) would be as follows:

 choose a small threshold distance D

 for N iterations:
     pick two random points from your data, a and b
     fit a straight line, L, to a and b
     count the inliers: data points within a distance D of the line L
     save the parameters of the line with the most inliers so far

 estimate the final line using ALL the inliers of the best line
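A concrete version of this, assuming scikit-learn is available (RANSACRegressor is not mentioned in the original answer, it is just one ready-made implementation), could look like this:

import numpy as np
from sklearn.linear_model import RANSACRegressor

# hypothetical data: a straight line plus noise and a few outliers
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.2, len(x))
y[::10] += 5.0                                   # inject some outliers

model = RANSACRegressor(residual_threshold=0.5)  # plays the role of the distance D above
model.fit(x.reshape(-1, 1), y)

print("slope:", model.estimator_.coef_[0])
print("intercept:", model.estimator_.intercept_)
print("inliers:", model.inlier_mask_.sum(), "of", len(x))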
0

Source: https://habr.com/ru/post/949231/
