How can I plot a correlation matrix as a set of ellipses, similar to the R openair package?

The figure below was produced with the R openair package:

correlation matrix showing relationships between variables

I know that matplotlib has the function plt.matshow,
but it does not show the relationships between the variables as clearly.

Here is what I have tried so far:

df is a pandas DataFrame with 7 variables, as shown below:

enter image description here

I do not know how to attach a .csv file to Stack Overflow.

Using plt.matshow(df.corr(), cmap=plt.cm.Greens), I get the following picture:

enter image description here

The second figure does not show the correlations between the variables as clearly as the first.

Edit:

I uploaded the csv file to Google Docs here.

+4
3

In Python, you can do this with matplotlib.collections.EllipseCollection:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection

def plot_corr_ellipses(data, ax=None, **kwargs):

    M = np.array(data)
    if not M.ndim == 2:
        raise ValueError('data must be a 2D array')
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)

    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T

    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()

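    # note: recent matplotlib releases name this keyword offset_transform;
    # older releases called it transOffset
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(), **kwargs)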
    ax.add_collection(ec)

    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)

    return ec
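
To make the mapping from coefficient to ellipse shape concrete, here is a small sketch that repeats the same three formulas on a handful of made-up sample coefficients (the values in `r` are illustrative only, not taken from the question's data):

```python
import numpy as np

# Illustrative sample coefficients, from perfectly negative to perfectly positive.
r = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

widths = np.ones_like(r)   # the major axis always spans one cell
heights = 1 - np.abs(r)    # the minor axis shrinks as |r| grows
angles = 45 * np.sign(r)   # +45 degrees for r > 0, -45 degrees for r < 0

for ri, w, h, a in zip(r, widths, heights, angles):
    print(f"r={ri:+.1f}  width={w:.1f}  height={h:.1f}  angle={a:+.0f}")
```

With these formulas a coefficient of 0 gives a circle, and a coefficient of +1 or -1 collapses the ellipse into a diagonal line.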

Using it with your data:

data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)


Negative correlations are drawn as ellipses tilted in the opposite direction:

fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
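
If you do not have the original CSV at hand, plot_corr_ellipses can still be tried on a stand-in matrix, since it accepts any 2D array. The sketch below is only an assumption-laden example: the data are synthetic, and the names `common`, `samples` and `corr` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for df: 200 samples of 7 variables that share a common
# signal, so every pair of columns is positively correlated.
common = rng.normal(size=(200, 1))
samples = common + rng.normal(size=(200, 7))

# 7x7 correlation matrix; rowvar=False treats each column as one variable.
corr = np.corrcoef(samples, rowvar=False)
print(corr.round(2))
```

The resulting `corr` can be passed directly as the `data` argument. Wrapping it in a pandas DataFrame with named rows and columns would additionally give labelled ticks.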


+9

Alternatively, you could use seaborn, for example its clustermap. (For readability, I convert the correlations to int in the range [-100, 100]):

corr = df.corr().mul(100).astype(int)

     GX   HG   RM   SJ   XB   XN   ZG
GX  100   77   62   71   48   66   57
HG   77  100   69   74   61   61   58
RM   62   69  100   75   48   64   68
SJ   71   74   75  100   50   70   65
XB   48   61   48   50  100   46   51
XN   66   61   64   70   46  100   75
ZG   57   58   68   65   51   75  100
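
One detail worth knowing about the conversion: astype(int) truncates toward zero instead of rounding. A minimal numpy sketch of the difference, using made-up coefficients that are not taken from the table above:

```python
import numpy as np

# Illustrative coefficients only.
r = np.array([0.779, -0.468, 0.612])

truncated = (r * 100).astype(int)        # drops the fractional part (toward zero)
rounded = np.rint(r * 100).astype(int)   # rounds to the nearest integer first

print(truncated)
print(rounded)
```

If rounded values are preferred, the pandas expression could call .round() before .astype(int), for example df.corr().mul(100).round().astype(int).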

Then plot it with seaborn.clustermap():

import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')


+1

I just discovered the Python biokit package today. It provides a very convenient function for creating various kinds of correlation plots. For example:

In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt
   ...: from biokit.viz import corrplot

In [6]: corr
Out[6]: 
      GX    HG    RM    SJ    XB    XN    ZG
GX  1.00 -0.77  0.62  0.71  0.48  0.66  0.57
HG -0.77  1.00  0.69  0.74  0.61  0.61  0.58
RM  0.62  0.69  1.00  0.75  0.48  0.64  0.68
SJ  0.71  0.74  0.75  1.00  0.50  0.70  0.65
XB  0.48  0.61  0.48  0.50  1.00 -0.46  0.51
XN  0.66  0.61  0.64  0.70 -0.46  1.00  0.75
ZG  0.57  0.58  0.68  0.65  0.51  0.75  1.00

I took Stefan's data and changed it a bit. Suppose this is your correlation matrix. To create a correlation plot, you can simply do this:

In [7]: c = corrplot.Corrplot(corr)
   ...: c.plot()

Ellipse correlation diagram

You can find more examples here.

+1

Source: https://habr.com/ru/post/1622429/

