How to visualize a large network in R?

Question


Network visualizations are becoming common in science and in practice. But as networks grow in size, the usual visualizations become less useful: there are simply too many nodes/vertices and links/edges. Often, visualization efforts end up producing "hairballs".

Several new approaches to this problem have been proposed, for example:

Edge bundling:
- http://vis.stanford.edu/papers/divided-edge-bundling or
- https://gephi.org/tag/edge-bundling/
Hierarchical edge bundling:
- http://graphics.cs.illinois.edu/sites/graphics.dev.engr.illinois.edu/files/edgebundles.pdf
Group attributes layout:
- http://wiki.cytoscape.org/Cytoscape_3/UserManual
- How to create a grouped layout in igraph?

I am sure there are many more approaches. So my question is: how can the hairball problem be overcome, i.e. how can large networks be visualized using R?

Here is some code that mimics an example network:

    # Load packages
    lapply(c("devtools", "sna", "intergraph", "igraph", "network"), install.packages)
    library(devtools)
    # Current devtools expects "user/repo" rather than a separate username argument
    devtools::install_github("ggobi/ggally")
    lapply(c("sna", "intergraph", "GGally", "igraph", "network"), require, character.only = TRUE)

    # Set up data
    set.seed(123)
    g <- barabasi.game(1000)

    # Plot data
    g.plot <- ggnet(g, mode = "fruchtermanreingold")
    g.plot


This question is related to "Visualizing an undirected graph that's too large for GraphViz?". However, here I am not looking for general software recommendations but for concrete examples (using the data above) of techniques that help to make a good visualization of a large network using R (comparable to the examples in this thread: "R: Scatterplot with too many points").

+42

r graph visualization social-networking graph-visualization

majom Mar 17 '14 at 11:35


4 answers

wjrl · Answer 1 · 2014-03-19 05:44

Another way to visualize very large networks is BioFabric (www.BioFabric.org), which uses horizontal lines instead of dots to represent nodes. The edges are then displayed using vertical line segments. A quick D3 demo of this method is shown at: http://www.biofabric.org/gallery/pages/SuperQuickBioFabric.html .

BioFabric is a Java application, but a simple R version is available at: https://github.com/wjrl/RBioFabric .

Here is the R code snippet:

    # You need 'devtools':
    install.packages("devtools")
    library(devtools)

    # You need igraph:
    install.packages("igraph")
    library(igraph)

    # Install and load 'RBioFabric' from GitHub
    # (current devtools expects "user/repo" rather than a username argument)
    install_github("wjrl/RBioFabric")
    library(RBioFabric)

    #
    # This is the example provided in the question:
    #
    set.seed(123)
    bfGraph = barabasi.game(1000)

    # This example has 1000 nodes, just like the provided example, but it
    # adds 6 edges in each step, making for an interesting shape; play
    # around with different values.
    # bfGraph = barabasi.game(1000, m=6, directed=FALSE)

    # Plot it up! For best results, make the PDF in the same
    # aspect ratio as the network, though a little extra height
    # covers the top labels. Given the size of the network,
    # a PDF width of 100 gives us good resolution.
    height <- vcount(bfGraph)
    width <- ecount(bfGraph)
    aspect <- height / width
    plotWidth <- 100.0
    plotHeight <- plotWidth * (aspect * 1.2)
    pdf("myBioFabricOutput.pdf", width=plotWidth, height=plotHeight)
    bioFabric(bfGraph)
    dev.off()

Here is a shot of the BioFabric version of the data provided in the question, although networks created with values of m > 1 are more interesting. The inset shows a close-up of the upper left corner of the network. Node BF4 is the highest-degree node in the network, and the default layout is a breadth-first search of the network (ignoring edge directions) starting from that node, with neighboring nodes traversed in order of decreasing node degree. Note that we can immediately see that, for example, about 60% of node BF4's neighbors have degree 1. We can also see from the strict 45-degree lower edge that this 1000-node network has 999 edges and is therefore a tree.

BioFabric presentation of example data
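The two observations read off the plot above can also be checked numerically. This is a minimal sketch using only igraph; it does not use RBioFabric, and the exact numbers depend on the igraph version and random seed, so none are quoted here. Which igraph vertex corresponds to the label BF4 is not assumed; the code simply looks for the highest-degree vertex.

```r
library(igraph)

set.seed(123)
g <- barabasi.game(1000)

# A connected graph with n vertices and n - 1 edges is a tree.
# Edge directions are ignored, as in the BioFabric default layout.
is_tree_like <- is_connected(g, mode = "weak") && ecount(g) == vcount(g) - 1

# Find the highest-degree vertex and the share of its neighbours with degree 1.
deg <- degree(g, mode = "all")
hub <- which.max(deg)
nbrs <- as.integer(neighbors(g, hub, mode = "all"))
share_leaf <- mean(deg[nbrs] == 1)

cat("tree:", is_tree_like, "\n")
cat("hub vertex:", hub, "with degree", deg[hub], "\n")
cat("share of hub neighbours with degree 1:", round(share_leaf, 2), "\n")
```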

Full disclosure: BioFabric is the tool I wrote.

Vincent Labatut · Answer 2 · 2014-03-18 05:29

What an interesting question. I did not know most of the tools you listed, so thanks. You can add HivePlot to the list. It is a deterministic method that projects nodes onto a fixed number of axes (usually 2 or 3). Have a look at the linked page; it has many visual examples.


It works better if your dataset has a categorical node attribute, so that you can use it to choose which axis a node goes on. For example, when studying the social network of a university: students on one axis, teachers on another, and administrative staff on the third. But of course it can also work with a discretized numerical attribute (for example, young, middle-aged and old people on their respective axes).

Then you need another attribute, and this time it should be numerical (or at least ordinal). It is used to determine the position of a node on its axis. You can also use some topological measure, such as degree or transitivity (clustering coefficient).

How to build hiveplot http://www.hiveplot.net/img/hiveplot-undirected-01.png

The fact that the method is deterministic is interesting because it allows you to compare different networks representing different (but comparable) systems. For example, you can compare two universities (provided that you use the same attributes/measures to determine the axes and positions). It also allows you to describe the same network in different ways, by choosing different combinations of attributes/measures to generate the visualization. This is in fact the recommended way to visualize a network, using a so-called hive panel.

Several software tools capable of generating these graphs are listed on the page mentioned at the beginning of this post, including implementations in Java and R.
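To make the two-attribute recipe above concrete, here is a hand-rolled sketch in base R graphics and igraph. It is not one of the tools listed on that page, and it draws straight segments where real hive plots use curves. The choice of attributes is an assumption made for illustration: discretized degree picks the axis, and the rank of betweenness gives the position along it.

```r
library(igraph)

set.seed(123)
g <- as.undirected(barabasi.game(1000))

deg <- degree(g)

# Axis assignment: discretize degree into three bins (1, 2-3, 4+).
axis_id <- cut(deg, breaks = c(0, 1, 3, Inf), labels = FALSE)

# Position on the axis: rank of a second measure, scaled to (0, 1].
pos <- rank(betweenness(g), ties.method = "first") / vcount(g)

# Three axes, 120 degrees apart.
angle <- c(90, 210, 330)[axis_id] * pi / 180
x <- pos * cos(angle)
y <- pos * sin(angle)

# Draw the nodes, then the edges as semi-transparent segments.
el <- as_edgelist(g, names = FALSE)
plot(x, y, pch = 16, cex = 0.3, asp = 1, axes = FALSE, xlab = "", ylab = "")
segments(x[el[, 1]], y[el[, 1]], x[el[, 2]], y[el[, 2]],
         col = rgb(0, 0, 0, 0.1))
```

Because nothing here depends on a random layout, running the same code on a second network with the same bins and measure gives a directly comparable picture.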

Jacob H · Answer 3 · 2016-01-25 20:33

I ran into this problem recently and came up with a different solution: collapse the graph by communities/clusters. This approach is similar to the third option outlined in the question above. As a warning, it works best with undirected graphs. For example:

    library(igraph)

    set.seed(123)
    g <- barabasi.game(1000) %>%
      as.undirected()

    # Choose your favorite algorithm to find communities. The algorithm below
    # is great for large networks but only works with undirected graphs
    c_g <- fastgreedy.community(g)

    # Collapse the graph by communities. This insight is due to this post
    # http://stackoverflow.com/questions/35000554/collapsing-graph-by-clusters-in-igraph/35000823#35000823
    res_g <- simplify(contract(g, membership(c_g)))

The result of this process is the following figure, where the vertex names represent community membership.

    plot(res_g, margin = -.5)

This is noticeably better than the hideous mess you get from plotting the full graph:

    plot(g, margin = -.5)

To link the communities back to the original vertices, you will need something like the following:

    mem <- data.frame(vertices = 1:vcount(g), member = as.numeric(membership(c_g)))

IMO this is a good approach for two reasons. First, it can in theory handle a graph of any size, because the community detection step can be repeated on the collapsed graph. Second, combining it with an interactive interface would produce very readable results. For example, you can imagine a user clicking on a vertex in the collapsed graph to expand that community and reveal all of its original vertices.
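One possible extension of the collapse step, sketched under my own assumptions rather than taken from the answer: carry the community sizes and the number of inter-community edges through the contraction, so the collapsed plot shows how big each community is and how strongly communities are connected. The attribute names `n_members` and `weight` are introduced here for illustration.

```r
library(igraph)

set.seed(123)
g <- as.undirected(barabasi.game(1000))
c_g <- fastgreedy.community(g)
mem <- as.numeric(membership(c_g))

# Give every vertex and edge a unit count so that collapsing can sum them.
V(g)$n_members <- 1
E(g)$weight <- 1

# Merge vertices by community, summing the per-vertex counts.
collapsed <- contract(g, mem,
                      vertex.attr.comb = list(n_members = "sum", "ignore"))
# Drop loops (intra-community edges) and merge parallel edges, summing weights.
collapsed <- simplify(collapsed,
                      edge.attr.comb = list(weight = "sum", "ignore"))

plot(collapsed,
     vertex.size = 3 + 2 * sqrt(V(collapsed)$n_members),
     edge.width = E(collapsed)$weight,
     vertex.label = seq_len(vcount(collapsed)))

# "Expanding" one community: the induced subgraph on its original vertices.
community_1 <- induced_subgraph(g, which(mem == 1))
```

The last line is the non-interactive counterpart of clicking on a collapsed vertex: it recovers the original vertices and edges of a single community, which can then be plotted on its own.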

Jacob H · Answer 4 · 2016-01-22 22:42

Another interesting package is networkD3. It offers many ways to represent graphs. In particular, I find forceNetwork an interesting option. It is interactive and therefore lets you really explore your network. This is great for EDA, but it may be too "wiggly" for final work.
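A minimal sketch of forceNetwork on the example graph from the question, assuming the networkD3 and igraph packages are installed. Colouring nodes by fast-greedy communities is my own choice for illustration, not part of the answer.

```r
library(igraph)
library(networkD3)

set.seed(123)
g <- as.undirected(barabasi.game(1000))

# Use detected communities as the node grouping (an illustrative choice).
members <- membership(fastgreedy.community(g))

# Convert the igraph object into the links/nodes data frames networkD3 expects.
g_d3 <- igraph_to_networkD3(g, group = members)

# Interactive force-directed plot; zoom = TRUE enables zooming and panning.
forceNetwork(Links = g_d3$links, Nodes = g_d3$nodes,
             Source = "source", Target = "target",
             NodeID = "name", Group = "group",
             zoom = TRUE, opacity = 0.8)
```

With 1000 nodes the layout takes a moment to settle in the browser or viewer pane, which is part of the "wiggly" behaviour mentioned above.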
