How do I subset a data frame by a factor and repeat a plot for each subset?

I am new to R. Forgive me if this question has an obvious answer, but I could not find a solution. I have experience with SAS and may just be thinking about this problem the wrong way.

I have a dataset with repeated measures on hundreds of subjects, each of which has several measurements taken at different ages. Each subject is identified by an ID variable. I would like to plot each measurement (say, body weight) against AGE for each individual subject (ID).

I used ggplot2 to do something like this:

ggplot(data = dataset, aes(x = AGE, y = WEIGHT )) + geom_line() + facet_wrap(~ID) 

This works well for a small number of subjects, but will not work for the entire dataset.

I also tried something like this:

 ggplot(data=data, aes(x = AGE,y = BW, group = ID, colour = ID)) + geom_line() 

It also works for a small number of subjects, but is unreadable with hundreds of them.

I tried subsetting with the following code:

 temp <- split(dataset,dataset$ID) 

but I'm not sure how to work with the resulting list of data frames. Or maybe there is a way to just tweak facet_wrap to create individual plots?

Thanks!

+6
3 answers

Since you want to split the dataset and make a plot for each level of a factor, I would approach this with one of the split-apply-combine tools from the plyr package.

Here is a toy example using the mtcars dataset. First create the plot and name it p, then use dlply to split the dataset by a factor and return a plot for each level. I use %+% from ggplot2 to replace the data.frame in the plot.

 library(ggplot2)
 library(plyr)
 p = ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_line()
 dlply(mtcars, .(cyl), function(x) p %+% x)

This returns all the plots, one after another. If you name the resulting list object, you can also call up one plot at a time.

 plots = dlply(mtcars, .(cyl), function(x) p %+% x)
 plots[1]
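
As a small illustration of pulling out one plot at a time, the sketch below builds the same kind of list and indexes it by name. It assumes that dlply names the list elements after the levels of cyl, and it builds each plot directly inside the function instead of using %+%; treat it as one possible way to do this, not the only one.

```r
library(ggplot2)
library(plyr)

# One plot per level of cyl, returned as a list
plots <- dlply(mtcars, .(cyl), function(x) {
  ggplot(x, aes(x = wt, y = mpg)) + geom_line()
})

names(plots)            # the factor levels, as character strings

one_as_list <- plots[1] # single brackets: still a list, of length one
one_plot <- plots[["6"]] # double brackets: the ggplot object for cyl == 6
one_plot
```

Single brackets keep the list wrapper, which is fine for printing at the console; double brackets are the safer choice when the plot is passed on to something like ggsave.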

Edit

I got to thinking that putting a title on each plot based on the factor would be useful.

 dlply(mtcars, .(cyl), function(x) p %+% x + facet_wrap(~cyl)) 

Edit 2

Here is one way to save them in one document, one plot per page. This works with the list of plots named plots. I have not changed any of the defaults in pdf, but you can certainly explore the changes you can make.

 pdf()
 plots
 dev.off()

Updated to use dplyr instead of plyr. The work is done in do, and the output has a named column that contains all the plots as a list.

 library(dplyr)
 plots = mtcars %>%
   group_by(cyl) %>%
   do(plots = p %+% . + facet_wrap(~cyl))

 Source: local data frame [3 x 2]
 Groups: <by row>

   cyl           plots
 1   4 <S3:gg, ggplot>
 2   6 <S3:gg, ggplot>
 3   8 <S3:gg, ggplot>

To see the plots in R, just ask for the column that contains the plots.

 plots$plots 

And to save them as a PDF:

 pdf()
 plots$plots
 dev.off()
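
The pdf() calls above rely on the list being auto-printed at the top level of an interactive session. Inside a function, a loop, or a sourced script that does not happen, so a more defensive sketch prints each plot explicitly. The file name and page size here are arbitrary choices for illustration, and the list is built with base split and lapply to keep the example self-contained.

```r
library(ggplot2)

# One plot per level of cyl; the facet strip acts as a title
plots <- lapply(split(mtcars, mtcars$cyl), function(x) {
  ggplot(x, aes(x = wt, y = mpg)) + geom_line() + facet_wrap(~cyl)
})

# Hypothetical output location; replace with any writable path
out_file <- file.path(tempdir(), "plots_by_cyl.pdf")

pdf(out_file, width = 7, height = 5)
for (g in plots) print(g)  # each print() draws on a new page
dev.off()
```

Passing a file name to pdf() also avoids the default Rplots.pdf in the working directory.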
+18

A few years ago I wanted to do something similar: plot individual trajectories for ~2500 participants with 1-7 measurements each. I did it like this, using plyr and ggplot2:

 library(plyr)
 library(ggplot2)

 d_ply(dat, .var = "participant_id", .fun = function(x) {

   # Generate the desired plot
   ggplot(x, aes(x = phase, y = result)) +
     geom_point() +
     geom_line()

   # Save it to a file named after the participant
   # (x$participant_id repeats once per row, so take the first value).
   # Putting it in a subdirectory is prudent
   ggsave(file.path("plots", paste0(x$participant_id[1], ".png")))

 })

A bit slow, but it worked. If you want to get a sense of all participants' trajectories in one plot (like your second example, i.e. a spaghetti plot), you can tweak the transparency of the lines (and forget about coloring them):

 ggplot(data = dat, aes(x = phase, y = result, group = participant_id)) + geom_line(alpha = 0.3) 
+3
 lapply(temp, function(X) ggplot(X, ...)) 

where X is each subset of your data (one element of the list returned by split).

Keep in mind that you may have to explicitly print the ggplot object ( print(ggplot(X, ...)) ).
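
To make the lapply idea concrete, here is a minimal sketch that uses mtcars as a stand-in for the asker's data, with cyl playing the role of ID and wt/mpg in place of AGE/WEIGHT. Those substitutions, and the use of ggtitle to label each plot, are assumptions made only for illustration.

```r
library(ggplot2)

# split() returns a named list of data frames, one per level
temp <- split(mtcars, mtcars$cyl)

# Build one plot per subset; iterate over names so each plot can be titled
plots <- lapply(names(temp), function(id) {
  ggplot(temp[[id]], aes(x = wt, y = mpg)) +
    geom_line() +
    ggtitle(paste("cyl =", id))
})
names(plots) <- names(temp)

# Nothing is auto-printed inside a loop, so print() is required
for (g in plots) print(g)
```

With hundreds of IDs the same loop can sit between pdf() and dev.off() so that each subject lands on its own page.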

+2

Source: https://habr.com/ru/post/955146/

