Pandas correlation group

Question

Assuming I have a dataframe similar to the one below, how do I get the correlation between two specific columns, grouped by the "ID" column? I believe the pandas 'corr' method computes the correlation between all columns. If possible, I would also like to know how to compute the grouped correlation using the .agg function (e.g. with np.correlate).

What I have:

    ID  Val1  Val2  OtherData  OtherData
    A   5     4     xx
    A   4     5     xx
    A   6     6     xx
    B   4     1     xx
    B   8     2     xx
    B   7     9     xx
    C   4     8     xx
    C   5     5     xx
    C   2     1     xx

What I need:

    ID  Correlation_Val1_Val2
    A   0.12
    B   0.22
    C   0.05

Thanks!

python pandas group-by correlation

bsheehy Mar 11 '15 at 2:00 p.m.

1 answer

John · Accepted Answer · 2015-03-11T15:33:03+0000

You've pretty much figured out all the pieces; you just need to combine them:

    In [441]: df.groupby('ID')[['Val1','Val2']].corr()
    Out[441]:
                 Val1      Val2
    ID
    A  Val1  1.000000  0.500000
       Val2  0.500000  1.000000
    B  Val1  1.000000  0.385727
       Val2  0.385727  1.000000

In your case, printing a 2x2 matrix for each ID is excessively verbose. I don't see an option to get a scalar correlation instead of the whole matrix, but you can take every other row and keep only the Val2 column, something like:

    In [442]: df.groupby('ID')[['Val1','Val2']].corr().iloc[0::2,-1]
    Out[442]:
    ID
    A   Val1    0.500000
    B   Val1    0.385727
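As an alternative sketch (not part of the original answer), one scalar per group can also be computed directly by applying Series.corr inside each group. The dataframe below is rebuilt from the sample data in the question, and the name per_group is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (the OtherData columns are omitted)
df = pd.DataFrame({
    "ID":   list("AAABBBCCC"),
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# Select the two columns of interest, then compute one Pearson
# correlation per group instead of a full 2x2 matrix
per_group = df.groupby("ID")[["Val1", "Val2"]].apply(
    lambda g: g["Val1"].corr(g["Val2"])
)
print(per_group)
```

The result is a Series indexed by ID, so no row slicing is needed afterwards.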

And then rename and save things as you like.
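The renaming step could look roughly like the sketch below, which assumes the sample data from the question and the column name Correlation_Val1_Val2 requested there. It uses xs to pick the Val1 row of each group's matrix by label rather than by position; the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question (the OtherData columns are omitted)
df = pd.DataFrame({
    "ID":   list("AAABBBCCC"),
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# Per-group 2x2 correlation matrices, indexed by (ID, column name)
corr = df.groupby("ID")[["Val1", "Val2"]].corr()

# Keep the Val1 row of each matrix, then its Val2 entry
pairwise = corr.xs("Val1", level=1)["Val2"]

# Rename the Series and turn the ID index back into a regular column
result = pairwise.rename("Correlation_Val1_Val2").reset_index()
print(result)
```

From there, result is an ordinary two-column dataframe that can be written out with, for example, result.to_csv("correlations.csv", index=False), where the file name is a placeholder.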
