Run a basic correlation between two columns of a data frame

I am trying to create a correlation matrix from a pandas DataFrame using data from specific columns.

Here is my CSV data:

col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065

Here are the two DataFrames I created:

import pandas as pd

df1 = pd.read_csv('c:/temp/test_1.csv', usecols=[0])
df2 = pd.read_csv('c:/temp/test_1.csv', usecols=[1])

I tried the corr and corrwith functions and got the following errors:

Corr Function:

print(df1.corr(df2))

Result: 

Error: Could not compare ['pearson'] with block values

Corrwith:

print(df1.corrwith(df2))

Result:    

col0   NaN
col1   NaN
dtype: float64

As you can see, there are no null values in the dataset, and float64 should be able to handle decimals.

Any help with a solution would be greatly appreciated.

Tiberius

1 answer

If you are trying to create a correlation matrix between two columns, I would suggest reading them into the same DataFrame, for example:

df = pd.read_csv('c:/temp/test_1.csv', usecols=[0,1])
df.corr()

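With your sample CSV this gives a 2x2 matrix of 1s, which is expected: the sample has only two rows, and two data points always correlate perfectly.

See the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/computation.html#correlation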
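For what it's worth, here is a minimal sketch of why the two attempts in the question behave as they do, and one possible alternative. It assumes a reasonably recent pandas and inlines the two sample rows with io.StringIO instead of reading c:/temp/test_1.csv; the variable names (csv_text, corr_matrix, coef, mismatched) are only illustrative. As far as I can tell, DataFrame.corr takes the correlation method as its first argument, so passing df2 there is what triggers the 'pearson' comparison error, while corrwith pairs columns by label, so a frame holding only col0 and a frame holding only col1 have nothing in common and come back as NaN.

```python
import io

import pandas as pd

# The two sample rows from the question, inlined so the sketch runs without a file.
csv_text = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

# Read both columns into ONE DataFrame, as suggested above.
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

# Full correlation matrix between the two columns.
corr_matrix = df.corr()

# A single coefficient: Series.corr accepts another Series.
coef = df["col0"].corr(df["col1"])

# corrwith matches columns by label; 'col0' and 'col1' do not match,
# so every entry of the result is NaN, as in the question.
mismatched = df[["col0"]].corrwith(df[["col1"]])

print(corr_matrix)
print(coef)
print(mismatched)
```

If you only need one number rather than a matrix, the Series.corr call is probably the most direct route.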


Source: https://habr.com/ru/post/1626599/

