I computed the correlation matrix of the following pandas DataFrame:
    import pandas as pd

    df = pd.DataFrame({
        'one':   [0.1, .32, .2, 0.4, 0.8],
        'two':   [.23, .18, .56, .61, .12],
        'three': [.9, .3, .6, .5, .3],
        'four':  [.34, .75, .91, .19, .21],
        'zive':  [0.1, .32, .2, 0.4, 0.8],
        'six':   [.9, .3, .6, .5, .3],
        'drive': [.9, .3, .6, .5, .3],
    })

    corrMatrix = df.corr()
    corrMatrix

           drive  four   one   six  three   two  zive
    drive   1.00 -0.04 -0.75  1.00   1.00  0.24 -0.75
    four   -0.04  1.00 -0.49 -0.04  -0.04  0.16 -0.49
    one    -0.75 -0.49  1.00 -0.75  -0.75 -0.35  1.00
    six     1.00 -0.04 -0.75  1.00   1.00  0.24 -0.75
    three   1.00 -0.04 -0.75  1.00   1.00  0.24 -0.75
    two     0.24  0.16 -0.35  0.24   0.24  1.00 -0.35
    zive   -0.75 -0.49  1.00 -0.75  -0.75 -0.35  1.00
Now I want to write some code to return columns that are perfectly correlated (i.e. correlation == 1) in groups.
Ideally, I would like: [['zive', 'one'], ['three', 'six', 'drive']]
The code below gives me ['drive', 'one', 'six', 'three', 'zive'], but as you can see, this is just a flat list of every column that is perfectly correlated with at least one other column. It does not put each column into a distinct group with the columns it is perfectly correlated with.
    correlatedCols = []
    for col in corrMatrix:
        data = corrMatrix[col][corrMatrix[col] == 1]
        if len(data) > 1:
            correlatedCols.append(data.name)

    correlatedCols
    ['drive', 'one', 'six', 'three', 'zive']
EDIT: Using @Karl D.'s recommendations, I get the following:
    import numpy as np

    cor = df.corr()
    # Keep only the strictly lower triangle so each pair appears once
    cor.loc[:, :] = np.tril(cor.values, k=-1)
    cor = cor.stack()
    cor[cor == 1]

    six    drive    1.00
    three  drive    1.00
           six      1.00
    zive   one      1.00
This is not exactly what I want: it lists pairs such as [six, drive] on their own, without 'three', rather than the single group ['three', 'six', 'drive'].
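To make the desired output concrete, here is a minimal sketch of one possible approach, not a definitive solution: treat every pair of columns with correlation 1 as an edge and collect the connected components. The helper name `perfectly_correlated_groups` and its `tol` parameter are hypothetical; the tolerance is an assumption added because floating-point correlations may come out as 0.9999999999999999 rather than exactly 1.

```python
import numpy as np
import pandas as pd


def perfectly_correlated_groups(df, tol=1e-9):
    """Group columns whose pairwise correlation is 1 (within tol).

    Hypothetical helper: name and tolerance are illustrative assumptions.
    """
    corr = df.corr()
    cols = list(corr.columns)
    # adj[i, j] is True when columns i and j are (numerically) perfectly correlated
    adj = np.isclose(corr.to_numpy(), 1.0, rtol=0.0, atol=tol)

    seen = set()
    groups = []
    for start in range(len(cols)):
        if start in seen:
            continue
        # Walk every column reachable from `start` through correlation-1 links
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            component.append(cols[i])
            stack.extend(j for j in range(len(cols)) if adj[i, j] and j not in seen)
        # A column on its own is not a group
        if len(component) > 1:
            groups.append(sorted(component))
    return groups


df = pd.DataFrame({
    'one':   [0.1, .32, .2, 0.4, 0.8],
    'two':   [.23, .18, .56, .61, .12],
    'three': [.9, .3, .6, .5, .3],
    'four':  [.34, .75, .91, .19, .21],
    'zive':  [0.1, .32, .2, 0.4, 0.8],
    'six':   [.9, .3, .6, .5, .3],
    'drive': [.9, .3, .6, .5, .3],
})

groups = perfectly_correlated_groups(df)
print(groups)
```

For the DataFrame above this should produce one group containing 'one' and 'zive' and another containing 'three', 'six' and 'drive'. Names within each group are sorted alphabetically, so the order differs from the example in the question.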