Run a basic correlation between two columns of a data frame

Question

I am trying to build a correlation matrix from specific columns of a pandas DataFrame.

Here is my csv data:

col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065

Here are the two data frames I created:

import pandas as pd

df1 = pd.read_csv('c:/temp/test_1.csv', usecols=[0])
df2 = pd.read_csv('c:/temp/test_1.csv', usecols=[1])

I tried the corr and corrwith functions and got the following errors:

Corr Function:

print df1.corr(df2)

Result: 

Error: Could not compare ['pearson'] with block values

Corrwith:

print df1.corrwith(df2)

Result:    

col0   NaN
col1   NaN
dtype: float64

As you can see, there are no null values in the dataset, and float64 should be able to handle decimals.

Any help with a solution would be greatly appreciated.

Tiberius

python python-2.7 pandas

Tiberius Jan 29 '16 at 22:32

1 answer

Josh Baker · Accepted Answer · 2016-01-30T00:08:57+0000

If you are trying to create a correlation matrix between two columns, I would suggest reading them into the same data frame, for example:

df = pd.read_csv('c:/temp/test_1.csv', usecols=[0,1])
df.corr()

With your sample csv, this returns a 2x2 matrix of 1s, as expected for only two data points.
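For anyone who wants to reproduce this without the file on disk, here is a minimal sketch of the same approach. It assumes the two sample rows from the question are pasted into a string and read through io.StringIO instead of c:/temp/test_1.csv; the names csv_text, corr_matrix and pair_corr are only illustrative.

```python
import io

import pandas as pd

# The two sample rows from the question, inlined so the example does not
# depend on a file on disk (an assumption made for illustration only).
csv_text = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

# Read both columns into ONE data frame, as suggested above.
df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

# Pairwise Pearson correlation between every pair of columns in df.
corr_matrix = df.corr()
print(corr_matrix)

# A single coefficient for one pair of columns, if the full matrix is not needed.
pair_corr = df["col0"].corr(df["col1"])
print(pair_corr)
```

With more than two rows the off-diagonal values will generally differ from 1; the diagonal is always 1 because each column is perfectly correlated with itself.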

See the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/computation.html#correlation
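As for why the two attempts in the question did not work, the likely explanation is this: the first positional argument of DataFrame.corr is the correlation method, not a second frame, so df1.corr(df2) passes df2 where 'pearson' is expected. DataFrame.corrwith does accept a second frame, but it matches columns by label, and col0 and col1 share no label, so every entry comes back NaN. The sketch below reproduces the NaN result and shows a pairwise alternative using Series.corr. It again assumes the sample rows are inlined as a string; mismatched and pairwise are illustrative names.

```python
import io

import pandas as pd

# Sample rows from the question, inlined for illustration.
csv_text = """col0,col1,col2,col3,col4
122468.9071,1417464.203,3546600,151804924,10839476
14691.1139,170036.0407,103847,19208604,2365065
"""

# Two single-column frames, mirroring df1 and df2 in the question.
df1 = pd.read_csv(io.StringIO(csv_text), usecols=[0])
df2 = pd.read_csv(io.StringIO(csv_text), usecols=[1])

# corrwith pairs columns by label; 'col0' and 'col1' do not match,
# so there is nothing to correlate and every entry is NaN.
mismatched = df1.corrwith(df2)
print(mismatched)

# Series.corr compares the two columns directly, regardless of their names.
pairwise = df1["col0"].corr(df2["col1"])
print(pairwise)
```

If the columns have to stay in separate frames, selecting each one as a Series like this avoids the label alignment problem.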