Dividing a seaborn matrix with lines according to clustering

This article shows a nice way to visualize the clusters of a dataset with binary features: build a 2D matrix and sort the values according to the cluster.

[Image: clusters]

In this case there are three clusters, as indicated by the black dividing lines; the rows are sorted to show which examples fall in each cluster, and the columns are the features of each example.

Given a cluster assignment vector and a pandas DataFrame, how can I replicate this using a Python library (like seaborn)? Plotting a DataFrame with seaborn is not difficult, nor is sorting the DataFrame rows to match the cluster assignments. What interests me most is how to display the black dividing lines that delineate each cluster.

Dummy data:

"""
       col1  col2
x1_c0     0     1
x2_c0     0     1
================= I want a line drawn here
x3_c1     1     0
================= and here
x4_c2     1     0
"""
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
    index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2']
)
clus = [0, 0, 1, 2]  # This is the cluster assignment

sns.heatmap(df)


1 answer

The link mwaskom posted in a comment is a good starting place. The trick is figuring out the coordinates for the vertical and horizontal lines.

To illustrate what the code is doing, it helps to draw each line individually:

%matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame(data={'col1': [0, 0, 1, 1], 'col2': [1, 1, 0, 0]},
                  index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2'])

f, ax = plt.subplots(figsize=(8, 6))

sns.heatmap(df, ax=ax)

# One vertical line between the two columns,
# and one horizontal line between each pair of adjacent rows
ax.axvline(1, 0, 2, linewidth=3, c='w')
ax.axhline(1, 0, 1, linewidth=3, c='w')
ax.axhline(2, 0, 1, linewidth=3, c='w')
ax.axhline(3, 0, 1, linewidth=3, c='w')

f.tight_layout()


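For axvline, the first argument is the x location of the line, followed by the lower and upper limits of the line (here 1, 0, 2). axhline takes the y location, then the x start and x stop. The limits are given as fractions of the axes, and by default the line spans the whole plot, so they can usually be left out.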
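To make the limit arguments concrete, here is a minimal sketch using matplotlib alone. The axis limits are chosen by hand to mimic a 4x2 heatmap, and the Agg backend is selected only so the snippet runs without a display; both are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only so this runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 2)   # two columns
ax.set_ylim(4, 0)   # four rows, y-axis inverted as in current seaborn heatmaps

# With the defaults (xmin=0, xmax=1) the line spans the full width of the axes.
full = ax.axhline(2, color="k", linewidth=3)

# xmin/xmax are fractions of the axes width, not data coordinates:
# this line covers only the left half, i.e. the first column.
half = ax.axhline(3, xmin=0, xmax=0.5, color="k", linewidth=3)

fig.savefig("axhline_limits.png")
```

The y position is in data coordinates while the start and stop are axes fractions, which is why a full-width divider needs no limits at all.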

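The code above draws a line for every row in the DataFrame. To draw lines only between groups, create an index on the DataFrame, or some other list of values to loop over. For instance, with a slightly more complicated example: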

df = pd.DataFrame(data={'col1': [0, 0, 1, 1, 1.5], 'col2': [1, 1, 0, 0, 2]},
                  index=['x1_c0', 'x2_c0', 'x3_c1', 'x4_c2', 'x5_c2'])

df['id_'] = df.index
df['group'] = [1, 2, 2, 3, 3]
df.set_index(['group', 'id_'], inplace=True)
df

             col1  col2
group id_
1     x1_c0   0.0     1
2     x2_c0   0.0     1
      x3_c1   1.0     0
3     x4_c2   1.0     0
      x5_c2   1.5     2

Then plot the heatmap with the group dividers:

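f, ax = plt.subplots(figsize=(8, 6))

sns.heatmap(df, ax=ax)

groups = df.index.get_level_values(0)

# Draw a divider wherever the group label changes from one row to the next.
# Current seaborn releases invert the heatmap's y-axis, so the boundary above
# row i sits at y = i; older releases that flipped the data instead need
# len(groups) - i here.
for i, group in enumerate(groups):
    if i and group != groups[i - 1]:
        ax.axhline(i, c="w", linewidth=3)

ax.axvline(1, c="w", linewidth=3)

f.tight_layout()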


Because the heatmap is not symmetric, the column dividers need their own separate loop.
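Tying this back to the cluster assignment vector from the question, a small helper can compute the divider positions directly from the labels. This is a sketch rather than the answer author's code: the names cluster_boundaries, draw_cluster_lines and demo are hypothetical, and it assumes a current seaborn release in which the heatmap's y-axis is inverted, so the boundary above row i lies at y = i.

```python
def cluster_boundaries(labels):
    """Return the row positions at which the cluster label changes.

    `labels` must already be ordered the same way as the heatmap rows.
    """
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def draw_cluster_lines(ax, labels, color="k", linewidth=3):
    """Draw one horizontal divider on `ax` at every cluster boundary."""
    for y in cluster_boundaries(labels):
        ax.axhline(y, color=color, linewidth=linewidth)


def demo():
    """Plot the question's dummy data with black cluster dividers."""
    # Imported here so the helpers above stay dependency-free.
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(
        data={"col1": [0, 0, 1, 1], "col2": [1, 1, 0, 0]},
        index=["x1_c0", "x2_c0", "x3_c1", "x4_c2"],
    )
    clus = [0, 0, 1, 2]  # cluster assignment from the question

    # Sort rows by cluster (sorted() is stable, so order within a cluster is kept).
    order = sorted(range(len(clus)), key=lambda i: clus[i])
    df_sorted = df.iloc[order]
    labels = [clus[i] for i in order]

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(df_sorted, ax=ax)
    draw_cluster_lines(ax, labels)  # black lines, as in the question's figure
    fig.tight_layout()
    plt.show()
```

Calling demo() should draw dividers below the second and third rows, matching the positions marked in the question's dummy data. Keeping the boundary computation separate from the drawing makes it easy to reuse the same function for column labels with ax.axvline.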


Source: https://habr.com/ru/post/1619788/

