How to visualize (dendrogram) a dictionary of hierarchical elements?

This is my first experience visualizing from hierarchical data in dictionary format using Python. The last piece of data is as follows:

d = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']} 

So, I have indexes in lists referring to keys (index) of the dictionary. I suppose this could be used as a basic structure for visualization, please correct me if I am wrong. Data symbols are β€œleaf nodes / leaves” that do not refer to any index.

I found NetworkX, which, perhaps, could be used for visualization, but I have no idea where to start with it and my data. I was hoping it would be so simple:

 import networkx as nx import matplotlib.pyplot as plt d = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']} G = nx.Graph(d) nx.draw(G) plt.show() 

I am looking for a hierarchical dendrogram or some kind of clustering as output. Sorry, at the moment I'm not quite sure what would be the best visualization, maybe it looks like this:

Dendrogram

UPDATE

Using NetworkX was actually very simple. I provide other simple data examples and look for an answer if it can be visualized with a dendrogram instead of a wired network diagram as well?

 # original sequence: a,b,c,d,b,c,a,b,c,d,b,c d = {^1: ['b', 'c'], ^2: ['a', ^1, 'd', ^1], 'S': [^2, ^2]} G = nx.Graph(d) nx.draw_spring(G, node_size=300, with_labels=True) 

NetworkX draw_spring nx.Graph

As we can see, the graph shows a simple relationship, but not the hierarchy and order of the data that I am ready to do. DiGraph gives more detailed information, but it is still impossible to build the original sequence from it:

NetworkX draw_spring nx.DiGraph

For the dendrogram, apparently, the weight and end nodes should be calculated as indicated in the first answer. For this approach, the data structure may be something like this:

 d = {'a': [], 'b': [], 'c': [], 'd': [], ^1: ['b', 'c'], ^2: ['a', ^1, 'd', ^1], 'S': [^2, ^2]} 
+5
source share
1 answer

One idea is to use the SciPy dendrogram function to draw your dendrogram. To do this, you just need to create the Z link matrix, which is described in the SciPy linkage function documentation. Each row [x, y, w, z] the linkage matrix Z describes the weight w at which x and y combine to form a root subtree with Z leaves.

To demonstrate, I created a simple example using the small three-leaf dendrogram shown below: example dendrogram

I created this visualization with the following code:

 # Load required modules import networkx as nx import matplotlib.pyplot as plt from scipy.cluster.hierarchy import dendrogram # Construct the graph/hierarchy d = { 0: [1, 'd'], 1: ['a', 'b', 'c'], 'a': [], 'b': [], 'c': [], 'd': []} G = nx.DiGraph(d) nodes = G.nodes() leaves = set( n for n in nodes if G.out_degree(n) == 0 ) inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ] # Compute the size of each subtree subtree = dict( (n, [n]) for n in leaves ) for u in inner_nodes: children = set() node_list = list(d[u]) while len(node_list) > 0: v = node_list.pop(0) children.add( v ) node_list += d[v] subtree[u] = sorted(children & leaves) inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last # Construct the linkage matrix leaves = sorted(leaves) index = dict( (tuple([n]), i) for i, n in enumerate(leaves) ) Z = [] k = len(leaves) for i, n in enumerate(inner_nodes): children = d[n] x = children[0] for y in children[1:]: z = tuple(subtree[x] + subtree[y]) i, j = index[tuple(subtree[x])], index[tuple(subtree[y])] Z.append([i, j, float(len(subtree[n])), len(z)]) # <-- float is required by the dendrogram function index[z] = k subtree[z] = list(z) x = z k += 1 # Visualize dendrogram(Z, labels=leaves) plt.show() 

Several key elements can be noted:

  • Give the data structure d , I am using the NetworkX oriented graph ( DiGraph ). Orientation is important, so we can determine which nodes are leaves (without children β†’ from degree zero) and inner_nodes (two or more children β†’ other than zero).
  • Usually in your dendrogram there is some weight associated with each edge, but in your example there were no weights. Instead, I used the number of leaves in the subtree embedded in each internal node n , as the weight for n .
  • If the inner node has more than two children, you need to add dummy internal nodes, since each row of the link matrix joins the two nodes together. This is why I write for y in children[1:]: etc.

I suggest that you can simplify this code based on what your data looks like before creating d in your example, so this might be more a proof of concept.

+1
source

Source: https://habr.com/ru/post/1243338/


All Articles