What is the difference between MATLAB / Octave corr and Python numpy.correlate?

I am trying to port a MATLAB / Octave program to Python using NumPy 1.8.0 and Python 2.7.3. I used this link as an aid in converting MATLAB functions to NumPy methods, with great success, until I got to the point where I wanted to calculate the correlation between two matrices.

The first matrix is 40000x25 floats, the second is 40000x1. In Octave, I use the corr(a,b) function and get a 25x1 matrix of floats. Attempting what looked like the corresponding function in NumPy (numpy.correlate(a,b)) throws an error:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/Library/Python/2.7/site-packages/numpy-1.8.0.dev_1a9aa5a_20130415-py2.7-macosx-10.8-intel.egg/numpy/core/numeric.py", line 751, in correlate
     return multiarray.correlate2(a,v,mode)
 ValueError: object too deep for desired array

I can make it work if I change the code to calculate the correlation for each column of a, for example:

 for i in range(25):
     c2[i] = numpy.correlate(a[:,i], b)

However, the values in the c2 array are different from the values I get from Octave. Octave returns a 25x1 matrix of floats, all less than 1. The values I get from NumPy are floats ranging from -270 to 900.

I tried to understand what both algorithms do under the hood, but failed miserably. Can anyone point out my logical failure?

1 answer

NumPy provides numpy.corrcoef, which calculates correlation coefficients, and that appears to be what you need. However, its interface is different from Octave / MATLAB corr.

First of all, by default the function treats rows as variables and columns as observations. To mimic the behavior of Octave / MATLAB, you can pass a flag (rowvar=0) that overrides this.
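To make the effect of that flag concrete, here is a minimal sketch on made-up random data (the array name `data` and its size are assumptions for illustration only). It just compares the shape of the result with and without the flag:

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed so the sketch is repeatable
data = rng.rand(100, 3)          # made-up data: 100 observations, 3 variables

# Default (rowvar=True): every ROW is treated as a variable,
# so the result is a 100x100 matrix.
by_rows = np.corrcoef(data)

# rowvar=False: every COLUMN is treated as a variable,
# matching the Octave / MATLAB convention, so the result is 3x3.
by_cols = np.corrcoef(data, rowvar=False)

print(by_rows.shape, by_cols.shape)
```

With the Octave-style layout (observations in rows), forgetting the flag silently produces a much larger matrix rather than an error.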

Also, according to this answer, the numpy.cov function (which I assume corrcoef uses internally) returns a 2x2 matrix when given two sequences, each element of which contains a specific covariance:

 cov(a,a)  cov(a,b)
 cov(a,b)  cov(b,b)

As he points out, the element [0][1] is what you want for cov(a,b). Thus, perhaps something like this will work:

 for i in range(25):
     c2[i] = numpy.corrcoef(a[:,i], b, rowvar=0)[0][1]
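For a self-contained version of that loop, the sketch below uses small random stand-ins for the two matrices (the data and the 1000-row size are assumptions, not the asker's values). It flattens b to one dimension first, since numpy.corrcoef is simplest to reason about when both inputs are 1-D:

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.rand(1000, 25)            # stand-in for the 40000x25 matrix
b = rng.rand(1000, 1)             # stand-in for the 40000x1 matrix

b_flat = b.ravel()                # turn the 1000x1 column into a 1-D array
c2 = np.empty((a.shape[1], 1))    # 25x1 result, same shape Octave reports
for i in range(a.shape[1]):
    # corrcoef returns a 2x2 matrix for two 1-D inputs;
    # element [0, 1] is the correlation between a[:, i] and b.
    c2[i, 0] = np.corrcoef(a[:, i], b_flat)[0, 1]

print(c2.shape)
```

Every entry of c2 should lie between -1 and 1, which is the range the question reports for Octave's output.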

For reference, here are some excerpts from the documentation of the two functions you tried. It seems like they do completely different things.

Octave:

- Function file: corr (x, y)

Calculate the matrix of correlation coefficients.

If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y.

  corr (x,y) = cov (x,y) / (std (x) * std (y)) 

If called with one argument, compute corr (x, x), the correlation between the columns of x.
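The formula in that excerpt can be sketched directly in NumPy without a loop. The helper below is hypothetical (it is not a NumPy function and has not been checked against Octave's implementation); it assumes both inputs are 2-D with observations in rows, and it uses the sample (n - 1) normalization for both the covariance and the standard deviations so the factors cancel consistently:

```python
import numpy as np

def corr(x, y):
    """Column-wise correlation of two 2-D arrays (hypothetical helper).

    x has shape (n, p), y has shape (n, q); the result has shape (p, q),
    where entry (i, j) correlates column i of x with column j of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xc = x - x.mean(axis=0)                  # center each column
    yc = y - y.mean(axis=0)
    cov_xy = xc.T.dot(yc) / (n - 1)          # cross-covariance, shape (p, q)
    std_x = x.std(axis=0, ddof=1)            # per-column sample std
    std_y = y.std(axis=0, ddof=1)
    return cov_xy / np.outer(std_x, std_y)   # cov(x, y) / (std(x) * std(y))

rng = np.random.RandomState(1)
x = rng.rand(500, 4)                         # made-up data
y = rng.rand(500, 2)
r = corr(x, y)
print(r.shape)
```

For a 40000x25 first argument and a 40000x1 second argument, this would return a 25x1 array, the shape described in the question.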

And Numpy:

numpy.correlate (a, v, mode = 'valid', old_behavior = False) [source]

Cross-correlation of two one-dimensional sequences.

This function computes the correlation as generally defined in signal processing texts:

 z[k] = sum_n a[n] * conj(v[n+k]) 

with a and v sequences being zero-padded where necessary and conj being the conjugate.
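This also suggests why the values in the question fall outside the range -1 to 1: for two equal-length 1-D inputs in the default 'valid' mode, numpy.correlate returns a single unnormalized sum of products. The sketch below uses small made-up vectors to show that, and that centering and scaling the inputs by hand recovers the usual correlation coefficient:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # made-up example vectors
y = np.array([2.0, 1.0, 0.0, 3.0])

# Equal-length inputs in 'valid' mode produce one value:
# the raw sum of products 1*2 + 2*1 + 3*0 + 4*3 = 16.
raw = np.correlate(x, y)[0]

# Subtracting the means and dividing by the vector norms
# turns the same sum of products into a correlation coefficient.
xc = x - x.mean()
yc = y - y.mean()
pearson = np.correlate(xc, yc)[0] / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(raw, pearson, np.corrcoef(x, y)[0, 1])
```

The raw value grows with the length and scale of the data, whereas the normalized value matches what numpy.corrcoef reports.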


Source: https://habr.com/ru/post/945618/

