Pandas correlation with groupby

Assuming I have a dataframe similar to the one below, how do I get the correlation between two specific columns, computed separately for each value of the "ID" column? I believe the pandas 'corr' method finds the correlation between all columns. If possible, I would also like to know how to compute the grouped correlation using the .agg function (i.e. np.correlate).

What I have:

    ID  Val1  Val2  OtherData  OtherData
    A   5     4     xx
    A   4     5     xx
    A   6     6     xx
    B   4     1     xx
    B   8     2     xx
    B   7     9     xx
    C   4     8     xx
    C   5     5     xx
    C   2     1     xx

What I need:

    ID  Correlation_Val1_Val2
    A   0.12
    B   0.22
    C   0.05

Thanks!

1 answer

You have pretty much figured out all the pieces; you just need to combine them:

    In [441]: df.groupby('ID')[['Val1','Val2']].corr()
    Out[441]:
                 Val1      Val2
    ID
    A  Val1  1.000000  0.500000
       Val2  0.500000  1.000000
    B  Val1  1.000000  0.385727
       Val2  0.385727  1.000000
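For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's sample data and runs the same call. The variable names `df` and `corr_by_id` are my own, and the OtherData columns are left out because they play no part in the correlation.

```python
import pandas as pd

# Sample data copied from the question (OtherData columns omitted).
df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# One 2x2 Pearson correlation matrix per ID, stacked under a
# two-level index of (ID, column name).
corr_by_id = df.groupby("ID")[["Val1", "Val2"]].corr()
print(corr_by_id)
```

The result has two rows per ID, which is why the output is more verbose than a single number per group.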

In your case, printing a 2x2 matrix for each ID is more detail than you need. I don't see a built-in option to return a scalar correlation instead of the whole matrix, but you can do something like:

    In [442]: df.groupby('ID')[['Val1','Val2']].corr().ix[0::2,'Val2']
    Out[442]:
    ID
    A   Val1    0.500000
    B   Val1    0.385727
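Note that `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so the line above fails on current versions. Below is a sketch of two equivalent selections, assuming the question's sample data; the names `corr_by_id`, `by_position`, and `by_label` are illustrative.

```python
import pandas as pd

# Sample data copied from the question (OtherData columns omitted).
df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

corr_by_id = df.groupby("ID")[["Val1", "Val2"]].corr()

# Positional selection: every second row (the Val1 rows), last column (Val2).
by_position = corr_by_id.iloc[0::2, -1]

# Label-based selection: the Val1 row of each group's matrix, then the
# Val2 column. The result is indexed by ID only.
by_label = corr_by_id.xs("Val1", level=1)["Val2"]

print(by_position)
print(by_label)
```

The label-based form does not depend on row order, so it is probably the safer choice if more columns are added later.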

And then rename and save things as you like.
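To get the exact shape asked for in the question, one option is to skip the matrices and compute a scalar per group with `Series.corr` inside `groupby(...).apply`. This is a sketch rather than the only way to do it; the column name comes from the question and the CSV file name is a placeholder. As for the `.agg` idea: `np.correlate` computes a cross-correlation (a sliding dot product), not the Pearson coefficient, so it would not give the numbers you want; `np.corrcoef` is the NumPy counterpart of `corr`.

```python
import pandas as pd

# Sample data copied from the question (OtherData columns omitted).
df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# One Pearson coefficient per ID, computed from the two columns of each group.
result = (
    df.groupby("ID")[["Val1", "Val2"]]
      .apply(lambda g: g["Val1"].corr(g["Val2"]))
      .rename("Correlation_Val1_Val2")
      .reset_index()
)
print(result)

# Saving is optional; the file name is a placeholder.
# result.to_csv("correlations.csv", index=False)
```

`apply` is used here because the function needs both columns of a group at once, whereas `.agg` generally works on one column at a time.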


Source: https://habr.com/ru/post/983656/

