Computing slopes in NumPy (or SciPy)

I am trying to find the fastest and most efficient way to calculate slopes using NumPy and SciPy. I have a data set of three Y variables and one X variable, and I need to calculate their individual slopes. For example, I can easily do this one row at a time, as shown below, but I was hoping there is a more efficient way. I also don't think linregress is the best way to go, because I don't need any of the auxiliary values like the intercept, standard error, etc. in my results. Any help is appreciated.

    import numpy as np
    from scipy import stats

    Y = np.array([
        [ 2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11, 4.21725000e+11,
          2.86698000e+11, 3.32909000e+11, 4.01480000e+11, 4.21215000e+11, 4.81202000e+11],
        [ 3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03, 4.65011423e+03,
          3.10707392e+03, 3.54692896e+03, 4.20656404e+03, 4.34233412e+03, 4.88462501e+03],
        [ 2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01, 3.13667639e+01,
          2.76377113e+01, 3.27846013e+01, 3.73223417e+01, 3.51249997e+01, 4.42563658e+01]])
    X = np.array([1990., 1991., 1992., 1993., 1994., 1995., 1996., 1997., 1998., 1999.])

    slope_0, intercept, r_value, p_value, std_err = stats.linregress(X, Y[0, :])
    slope_1, intercept, r_value, p_value, std_err = stats.linregress(X, Y[1, :])
    slope_2, intercept, r_value, p_value, std_err = stats.linregress(X, Y[2, :])
    slope_0 = slope_0 / Y[0, 0]
    slope_1 = slope_1 / Y[1, 0]
    slope_2 = slope_2 / Y[2, 0]

    b, a = np.polyfit(X, Y[1, :], 1)
    slope_1_a = b / Y[1, 0]
8 answers

Linear regression in one dimension is a vector calculation. This means we can combine the multiplications over the entire Y matrix and then vectorize the fits using the axis parameter in numpy. In your case that works out to the following:

    slopes = ((X*Y).mean(axis=1) - X.mean()*Y.mean(axis=1)) / ((X**2).mean() - (X.mean())**2)

You said you don't need the goodness-of-fit parameters, but most of them can be obtained in a similar way.
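For example, checking the formula on synthetic rows with known slopes (a quick sketch; the test arrays are mine):

    import numpy as np

    X = np.arange(1990., 2000.)      # 10 x-values, as in the question
    Y = np.vstack([ 2.0 * X + 1.0,   # three rows with known slopes 2, -3, 0.5
                   -3.0 * X + 7.0,
                    0.5 * X ])

    slopes = ((X*Y).mean(axis=1) - X.mean()*Y.mean(axis=1)) / ((X**2).mean() - (X.mean())**2)
    print(slopes)  # [ 2.  -3.   0.5]

Each row of Y gets its own slope in a single pass, with no Python-level loop.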


The fastest and most efficient way is to use the scipy function linregress, which calculates everything:

slope: slope of the regression line

intercept: intercept of the regression line

r-value: correlation coefficient

p-value: two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero

stderr: standard error of the estimate

And here is an example:

    from scipy.stats import linregress

    a = [15, 12, 8, 8, 7, 7, 7, 6, 5, 3]
    b = [10, 25, 17, 11, 13, 17, 20, 13, 9, 15]
    linregress(a, b)

will return:

    LinregressResult(slope=0.20833333333333337, intercept=13.375, rvalue=0.14499815458068521, pvalue=0.68940144811669501, stderr=0.50261704627083648)

P.S. The mathematical formula for the slope:

    slope = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x**2) - sum(x)**2)
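A quick numeric check of this formula against linregress, using the same a and b as above (my own verification snippet):

    import numpy as np
    from scipy.stats import linregress

    a = np.array([15, 12, 8, 8, 7, 7, 7, 6, 5, 3])
    b = np.array([10, 25, 17, 11, 13, 17, 20, 13, 9, 15])

    n = len(a)
    slope = (n * (a*b).sum() - a.sum() * b.sum()) / (n * (a**2).sum() - a.sum()**2)
    print(slope)                   # 0.20833333333333337
    print(linregress(a, b).slope)  # matches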


A formulation that is simpler than the accepted answer:

    import numpy as np

    x = np.linspace(0, 10, 11)
    y = np.linspace(0, 20, 11)
    y = np.c_[y, y, y]

    X = x - x.mean()
    Y = y - y.mean()
    slope = X.dot(Y) / X.dot(X)

The equation for the slope comes from the vector notation for the slope of a line obtained from simple regression.
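With the example above, slope evaluates to array([2., 2., 2.]): one slope per column of y.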


I did this using the np.diff() function:

    dx = np.diff(xvals)
    dy = np.diff(yvals)
    slopes = dy / dx

Note that this gives the point-to-point slopes between consecutive samples, not a single fitted slope.
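A minimal, self-contained illustration (the sample values are mine):

    import numpy as np

    xvals = np.array([0., 1., 2., 3.])
    yvals = np.array([0., 2., 4., 8.])

    print(np.diff(yvals) / np.diff(xvals))  # [2. 2. 4.]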


As said earlier, you can use scipy's linregress. Here is how to get only the slope out of it:

    from scipy.stats import linregress

    x = [1, 2, 3, 4, 5]
    y = [2, 3, 8, 9, 22]

    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    print(slope)

Keep in mind that going this way you are also computing extra values such as r_value and p_value, so it will take longer than calculating only the slope manually. However, linregress is pretty fast.
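One way to see the difference is to time both approaches (a rough sketch; the numbers depend on your machine and data size):

    import timeit
    import numpy as np
    from scipy.stats import linregress

    x = np.arange(100.)
    y = 3.0 * x + np.random.default_rng(0).normal(size=100)

    def manual_slope():
        # centered-x formulation of the least-squares slope
        xc = x - x.mean()
        return xc.dot(y - y.mean()) / xc.dot(xc)

    print(timeit.timeit(lambda: linregress(x, y), number=10000))
    print(timeit.timeit(manual_slope, number=10000))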

Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html


If X and Y are defined the same way as in your question, you can use:

    import numpy

    dY = (numpy.roll(Y, -1, axis=1) - Y)[:, :-1]
    dX = (numpy.roll(X, -1, axis=0) - X)[:-1]
    slopes = dY / dX

numpy.roll() aligns the next observation with the current one; you just need to remove the last column, which holds the meaningless difference between the last and first observations. Then you can calculate all the slopes at once, without scipy.

In your example, dX is always 1, so you can save even more time by just computing slopes = dY.
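To see what the roll is doing, here is a small demo (the sample arrays are mine):

    import numpy

    Y = numpy.array([[1., 2., 4., 8.]])
    X = numpy.array([0., 1., 2., 3.])

    print(numpy.roll(Y, -1, axis=1))                # [[2. 4. 8. 1.]] : next observation shifted in
    print((numpy.roll(Y, -1, axis=1) - Y)[:, :-1])  # [[1. 2. 4.]] : wrapped-around column dropped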


I built on the other answers and on the original regression formula to write a function that works for any tensor. It calculates the slopes of the data along a given axis. So, if you have arbitrary tensors X[i,j,k,l] and Y[i,j,k,l] and you want to know the slopes along the third axis for all other axes, you can call it with calcSlopes( X, Y, axis = 2 ).

    import numpy as np

    def calcSlopes( x = None, y = None, axis = -1 ):
        assert x is not None or y is not None

        # assume that a single data argument holds equally
        # spaced y-values (like in the numpy plot command)
        if y is None:
            y = x
            x = None

        # move the axis we want to calculate the slopes along to the front,
        # as is necessary for subtracting the means;
        # that axis 'vanishes' anyway, so we don't need to swap it back
        y = np.swapaxes( y, axis, 0 )
        if x is not None:
            x = np.swapaxes( x, axis, 0 )

        # https://en.wikipedia.org/wiki/Simple_linear_regression
        # beta = sum_i ( X_i - <X> ) ( Y_i - <Y> ) / ( sum_i ( X_i - <X> )^2 )
        if x is None:
            # the axis with the values to reduce must be trailing for
            # broadcast_to, therefore transpose
            x = np.broadcast_to( np.arange( y.shape[0] ), y.T.shape ).T
            x = x - ( x.shape[0] - 1 ) / 2.  # mean of (0,1,...,n-1) is (n-1)/2
        else:
            x = x - np.mean( x, axis = 0 )
        y = y - np.mean( y, axis = 0 )

        # beta = sum_i x_i y_i / sum_i x_i^2 (for centered x and y)
        slopes = np.sum( np.multiply( x, y ), axis = 0 ) / np.sum( x**2, axis = 0 )
        return slopes

It also has a trick for working with equally spaced data, when only y is given. For example:

    y = np.array( [ [ 1, 2, 3, 4 ],
                    [ 2, 4, 6, 8 ] ] )
    print( calcSlopes( y, axis = 0 ) )
    print( calcSlopes( y, axis = 1 ) )

    x = np.array( [ [ 0, 2, 4, 6 ],
                    [ 0, 4, 8, 12 ] ] )
    print( calcSlopes( x, y, axis = 1 ) )

Output:

    [1. 2. 3. 4.]
    [1. 2.]
    [0.5 0.5]
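As a sanity check (my own snippet, assuming calcSlopes from above is defined), the function agrees with np.polyfit on 2-D data:

    rng = np.random.default_rng(0)
    x = np.arange(10.)
    y = rng.normal(size=(3, 10))

    print( calcSlopes( np.broadcast_to( x, y.shape ), y, axis = 1 ) )
    print( np.polyfit( x, y.T, 1 )[0] )  # same three slopes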

This readable one-liner should be efficient enough, without bothering with scipy:

    slope = np.polyfit(X, Y, 1)[0]

Applied to your data, you end up with:

    import numpy as np

    Y = np.array([
        [ 2.62710000e+11, 3.14454000e+11, 3.63609000e+11, 4.03196000e+11, 4.21725000e+11,
          2.86698000e+11, 3.32909000e+11, 4.01480000e+11, 4.21215000e+11, 4.81202000e+11],
        [ 3.11612352e+03, 3.65968334e+03, 4.15442691e+03, 4.52470938e+03, 4.65011423e+03,
          3.10707392e+03, 3.54692896e+03, 4.20656404e+03, 4.34233412e+03, 4.88462501e+03],
        [ 2.21536396e+01, 2.59098311e+01, 2.97401268e+01, 3.04784552e+01, 3.13667639e+01,
          2.76377113e+01, 3.27846013e+01, 3.73223417e+01, 3.51249997e+01, 4.42563658e+01]]).T
    X = [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999]

    print(np.polyfit(X, Y, 1)[0])

The output is [1.54983152e+10 9.98749876e+01 1.84564349e+00]
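If you also want the normalized slopes from your question (each slope divided by the first value of its series), that is one more vectorized line (a sketch based on the transposed Y above):

    slopes = np.polyfit(X, Y, 1)[0]
    print(slopes / Y[0, :])  # divide each slope by that series' first observation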

