How to visualize the use of datasets inside a script in R

This is more of a question about how to organize long R scripts. I have many very long scripts in R. I often import one raw data set, derive other data sets from it, and then derive further ones from those, each used for a different aspect of the analysis. The original data set is thus forked into several branches. In a long script it becomes quite difficult to trace where each branch comes from. Does anyone have a method for handling this, that is, for getting an overview of how the data sets are derived from one another? Maybe some kind of visualization tool?

1 answer

With DiagrammeR you can build the diagram in stages alongside your code, rendering it with render_graph whenever you need to. This can get cumbersome if you are not diligent, though, as even the trivial example below shows.

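# Written against an early DiagrammeR release; later releases renamed
# parts of this API (for example add_edges() and create_edges()).
library(DiagrammeR)
# Create an empty graph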
graph <- create_graph()

# Create a simple data frame of individuals with random ages
df <- data.frame(id = 1:100, age = rnorm(100, 40, 5))
head(df)
# Add nodes for df, df$age, and df$id
graph <- add_node(graph, node = "df")
graph <- add_node(graph, node = "df$age")
graph <- add_node(graph, node = "df$id")

# Vector of breaks for cut
breaks <- c(0,seq(20,60,by=5),Inf)
# Add a node for breaks
graph <- add_node(graph,node = "breaks")

# Create df.cut data frame of age intervals
df.cut <- data.frame(id = df$id,
                     interval = cut(df$age,breaks = breaks))

# Add nodes for df.cut, data.frame, and cut
# Use a different node shape for operations
graph <- add_node(graph, node = "df.cut")
graph <- add_node(graph, node = "data.frame", shape = "square")
graph <- add_node(graph, node = "cut", shape = "square")

# Add edges for df$id, df$age
# Use different arrowhead to indicate operation
graph <- add_edges(graph,
                   create_edges(
                     from = c("df","df"),
                     to = c("df$id","df$age"),
                     rel = "to_get",
                     arrowhead = "box")
)

# Add edges for cut 
graph <- add_edges(graph, 
                   from = c("df$age", "breaks", "cut"),
                   to = c("cut", "cut", "df.cut"),
                   rel = c("to_get","to_get", "to_get"))

# Add edges for data.frame
graph <- add_edges(graph, 
                   from = c("df$id", "cut", "data.frame"),
                   to = c("data.frame", "data.frame", "df.cut"),
                   rel = c("to_get","to_get", "to_get"))

render_graph(graph)

Diagram of simple data processing in R
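
To keep the bookkeeping lighter, one option is to record only which data set was derived from which, and generate the diagram text from that record. The following is a minimal base-R sketch of that idea, not part of DiagrammeR: record_lineage() and lineage_to_dot() are hypothetical helper names made up for this example, and df.old is an extra derived data set added only to show a second branch. The result is plain Graphviz DOT text, which should be renderable with DiagrammeR's grViz() or with the Graphviz dot command-line tool.

```r
# Hypothetical helpers (not from any package): keep an edge list of
# "derived from" relations and turn it into Graphviz DOT text.
lineage <- data.frame(from = character(), to = character(),
                      via = character(), stringsAsFactors = FALSE)

# Append one row per parent; `to` and `via` are recycled across parents
record_lineage <- function(lineage, from, to, via) {
  rbind(lineage,
        data.frame(from = from, to = to, via = via,
                   stringsAsFactors = FALSE))
}

# Build a DOT digraph with one labelled edge per recorded derivation
lineage_to_dot <- function(lineage) {
  edges <- sprintf('  "%s" -> "%s" [label = "%s"];',
                   lineage$from, lineage$to, lineage$via)
  paste(c("digraph lineage {", edges, "}"), collapse = "\n")
}

# Same toy pipeline as in the example above
df <- data.frame(id = 1:100, age = rnorm(100, 40, 5))
breaks <- c(0, seq(20, 60, by = 5), Inf)
df.cut <- data.frame(id = df$id,
                     interval = cut(df$age, breaks = breaks))
lineage <- record_lineage(lineage, c("df", "breaks"), "df.cut", "cut")

# A second branch forked from the same raw data (illustrative only)
df.old <- df[df$age > 45, ]
lineage <- record_lineage(lineage, "df", "df.old", "subset")

dot <- lineage_to_dot(lineage)
cat(dot, "\n")
```

Calling record_lineage() right after each assignment keeps the record next to the code it describes, so the diagram is less likely to drift from the script. Names containing a double quote would need escaping before being written to DOT, which this sketch does not handle.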


Source: https://habr.com/ru/post/1619711/

