I wrote code to calculate the correlation between two pandas Series. Can you tell me what is wrong with my code?

Below is the code:

import numpy as np
import pandas as pd

def correlation(x, y):
    std_x = (x - x.mean())/x.std(ddof = 0)
    std_y = (y - y.mean())/y.std(ddof = 0)
    return (std_x * std_y).mean

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
ca = correlation(a, b)
print(ca)

It does not return a correlation value. Instead, it returns a series with keys 0, 1, 2, 3, 4 and values like -1.747504, -0.340844, -0.043282, -0.259691, -2.531987.

Please help me understand the problem behind this.

3 answers

You need to call mean() with parentheses:

return (std_x * std_y).mean()

and not just:

return (std_x * std_y).mean

which returns the method itself instead of calling it. Full code:

import numpy as np
import pandas as pd

def correlation(x, y):
    std_x = (x - x.mean())/x.std(ddof = 0)
    std_y = (y - y.mean())/y.std(ddof = 0)
    return (std_x * std_y).mean()

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])
ca = correlation(a, b)
print(ca)

Output:

-0.984661667628
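The difference between referencing a method and calling it can be seen in isolation. This is a minimal sketch; the names s, ref, and value are illustrative and do not come from the question:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

ref = s.mean      # no parentheses: a bound method object, nothing is computed yet
value = s.mean()  # parentheses: the method runs and returns a number

print(callable(ref))   # True
print(value)           # 2.0
print(ref() == value)  # True
```

Printing ref directly shows the bound method together with the series it belongs to, which is why the original code printed the whole series instead of a single number.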

You can also use scipy.stats to calculate the Pearson correlation. At a minimum, you can use this as a quick check of the correctness of your algorithm.

from scipy.stats import pearsonr
import pandas as pd

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

pearsonr(a, b)[0]  # -0.98466166762781315

In addition, pandas has a built-in method, corr, for this:

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

a.corr(b)

-0.98466166762781315
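Since a.corr(b) gives a reference value, it can be used to check a hand-written function. The sketch below assumes two hypothetical helpers, correlation_population and correlation_sample, to show that the ddof passed to std() has to match the divisor used when averaging the products:

```python
import numpy as np
import pandas as pd

a = pd.Series([2, 4, 5, 7, 9])
b = pd.Series([12, 10, 9, 7, 3])

def correlation_population(x, y):
    # Population form: std with ddof=0, then average the products over n.
    zx = (x - x.mean()) / x.std(ddof=0)
    zy = (y - y.mean()) / y.std(ddof=0)
    return (zx * zy).mean()

def correlation_sample(x, y):
    # Sample form: std with ddof=1 (the pandas default), then divide by n - 1.
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return (zx * zy).sum() / (len(x) - 1)

expected = a.corr(b)
print(np.isclose(correlation_population(a, b), expected))  # True
print(np.isclose(correlation_sample(a, b), expected))      # True
```

Mixing the two forms, for example std(ddof=1) followed by .mean(), would scale the result by (n - 1)/n, so the ddof = 0 in the original function is needed for .mean() to give the right answer.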

You can also apply corr to a DataFrame, which calculates all pairwise correlations between your columns (since each column is perfectly correlated with itself, you see 1s on the diagonal):

pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]}).corr()

          a         b
a  1.000000  0.960769
b  0.960769  1.000000
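If only one pair is needed, the value can be read out of the matrix by label. A minimal sketch, where df, corr_matrix, and r_ab are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 8]})
corr_matrix = df.corr()

# A single pairwise value can be read by row and column label.
r_ab = corr_matrix.loc['a', 'b']

print(round(r_ab, 6))                                      # 0.960769
print(np.isclose(r_ab, df['a'].corr(df['b'])))             # True
print(np.allclose(np.diag(corr_matrix.to_numpy()), 1.0))   # True
```

The matrix is symmetric, so corr_matrix.loc['b', 'a'] gives the same value.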

Source: https://habr.com/ru/post/1693086/

