I wrote code to calculate the correlation between the two Pandas series. Can you tell me what is wrong with my code?

Question


Below is the code:

import numpy as np
import pandas as pd

def correlation(x, y):
    std_x = (x - x.mean())/x.std(ddof = 0)
    std_y = (y - y.mean())/y.std(ddof = 0)
    return (std_x * std_y).mean

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
ca = correlation(a, b)
print(ca)

It does not return a correlation value. Instead, it returns what looks like a Series, with index 0, 1, 2, 3, 4 and values like -1.747504, -0.340844, -0.043282, -0.259691, -2.531987.

Please help me understand the problem behind this.
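For reference, the function above is meant to compute the Pearson correlation coefficient as the mean of the products of z-scores. With the population standard deviation (ddof=0), as used in the code, the formula it targets is:

```latex
r_{xy} = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sigma_x}\,\frac{y_i - \bar{y}}{\sigma_y},
\qquad
\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\sigma_y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

If the sample standard deviation (ddof=1) were used in both z-scores instead, the sum of the products would have to be divided by n - 1 rather than n to give the same value.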


python python-3.x pandas

python_noob Feb 02 '18 at 16:41


3 answers

You can also use scipy.stats to calculate the Pearson correlation. At the very least, this gives you a quick check that your algorithm is correct.

from scipy.stats import pearsonr
import pandas as pd

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

pearsonr(a, b)[0]  # -0.98466166762781315
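A minimal sketch of that cross-check, assuming SciPy and NumPy are installed: it compares the question's z-score function (with mean() actually called) against pearsonr and against numpy.corrcoef. It compares with a tolerance rather than exact equality, because the three computations may differ in the last floating-point digits.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def correlation(x, y):
    # z-scores using the population standard deviation (ddof=0)
    std_x = (x - x.mean()) / x.std(ddof=0)
    std_y = (y - y.mean()) / y.std(ddof=0)
    # mean() is called, so a number is returned rather than the method object
    return (std_x * std_y).mean()


a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

ours = correlation(a, b)
reference = pearsonr(a, b)[0]          # first element is the correlation coefficient
via_numpy = np.corrcoef(a, b)[0, 1]    # off-diagonal entry of the 2x2 correlation matrix

# Tolerance-based comparison instead of ==
print(np.isclose(ours, reference), np.isclose(ours, via_numpy))  # prints: True True
```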


jpp Feb 02 '18 at 16:58

pandas already has a built-in corr method for this, so you can simply do:

import pandas as pd

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

a.corr(b)

-0.98466166762781315
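One detail worth knowing when comparing Series.corr with a hand-written function or with NumPy: Series.corr first aligns the two Series on their index labels. The sketch below uses a reversed copy of b (b_reversed is just an illustrative name) to show that reordering the rows of one Series leaves a.corr unchanged, while np.corrcoef, which works purely by position, pairs up different values.

```python
import numpy as np
import pandas as pd

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

# Same labels and values as b, but stored in reverse order
b_reversed = b.iloc[::-1]

# Series.corr aligns on index labels, so row order does not matter
by_label = a.corr(b_reversed)

# NumPy ignores labels: here the value pairs really are reversed
by_position = np.corrcoef(a.to_numpy(), b_reversed.to_numpy())[0, 1]

print(by_label, by_position)
```

With aligned labels the result stays at about -0.9847, while the positional pairing gives a positive coefficient.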

You can also apply corr to a DataFrame, which computes the pairwise correlations between all of its columns (since each column is perfectly correlated with itself, you see 1s on the diagonal):

pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]}).corr()

          a         b
a  1.000000  0.960769
b  0.960769  1.000000
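If you only need a single coefficient from that matrix, you can index it by row and column label. A small sketch using the same example DataFrame; it also checks that the matrix is symmetric and agrees with the pairwise Series.corr result:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]})
corr_matrix = df.corr()

# Pick the off-diagonal entry by row and column label
r_ab = corr_matrix.loc['a', 'b']

# Same value from the transposed position and from Series.corr
print(r_ab, corr_matrix.loc['b', 'a'], df['a'].corr(df['b']))
```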


Cleb Feb 02 '18 at 18:53


Mike müller · Accepted Answer · 2018-02-02T16:49:25+0000

You need to actually call mean(), with parentheses:

return (std_x * std_y).mean()

rather than only accessing it:

return (std_x * std_y).mean

which returns the method itself. Full code:

import numpy as np
import pandas as pd

def correlation(x, y):
    std_x = (x - x.mean())/x.std(ddof = 0)
    std_y = (y - y.mean())/y.std(ddof = 0)
    return (std_x * std_y).mean()

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
ca = correlation(a, b)
print(ca)

Output:

-0.984661667628
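The difference between accessing and calling a method can be seen in isolation. A small sketch with an illustrative Series s: without parentheses you get a bound method object, and its repr embeds the repr of the Series it is bound to, which is why printing it in the question looked like a column of numbers.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

without_call = s.mean    # attribute access only: a bound method object
with_call = s.mean()     # the method is actually executed

print(callable(without_call))  # True
print(with_call)               # 2.0
print(repr(without_call))      # "<bound method ... of" followed by the Series repr
```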
