This is a well-trodden topic for R, see posts here and here. The answers to this question show that alternatives to for() such as *apply() improve clarity, facilitate parallelization, and in some circumstances speed things up. However, presumably your real question is "How do I do this faster?", because it is taking long enough for you to be unhappy. Inside the loop, you perform 3 distinct tasks.
- Break out a chunk of the data frame with filter().
- Make a plot.
- Save the plot in jpeg format.
There are several ways to do all three of these steps, so let's try and evaluate them. I will use the diamonds data from ggplot2, because it is bigger than the cars data. I hope that the differences in performance between the methods will be noticeable this way. I learned a lot from this chapter of Hadley Wickham's book on measuring performance.
So that I can use profiling, I put the following block of code into a function and save it in a separate R file named for_solution.r.
    f <- function(){
      param <- unique(diamonds$cut)
      for (i in param){
        mcplt <- diamonds %>%
          filter(cut==i) %>%
          ggplot(aes(x=carat, y=price)) +
          geom_point() +
          facet_wrap(~color) +
          ggtitle(paste("Cut: ",i,sep=""))
        ggsave(mcplt, file=paste("Cut",i,".jpeg",sep=""))
      }
    }
and then I run:
    library(dplyr)
    library(ggplot2)
    source("for_solution.r",keep.source=TRUE)
    Rprof(line=TRUE)
    f()
    Rprof(NULL)
    summaryRprof(lines="show")
Examining that output, I see that the block of code spends 97.25% of its time just saving the files. Examining the source for ggsave(), I see that the function does a lot of defensive programming to identify the type of output, then opens the graphics device, prints, and closes the device. So I wonder if doing that step manually will help. I will also take advantage of the fact that the jpeg device automatically creates a new file for each page, so that I only open and close the device once.
    f1 <- function(){
      param <- unique(diamonds$cut)
      jpeg("cut%03d.jpg",width=par("din")[1],height=par("din")[2],units="in",res=300)
      # (the rest of the function is cut off in the source)
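The listing for f1() breaks off after the jpeg() call. The following is a plausible completion, not the original code. It assumes the loop body mirrors f(), with ggsave() replaced by print(mcplt) (the call the profiling discussion refers to) and a single dev.off() at the end.

```r
library(dplyr)
library(ggplot2)

# Reconstruction (assumption): same loop as f(), but one jpeg device for all plots.
f1 <- function(){
  param <- unique(diamonds$cut)
  # "%03d" makes the device write cut001.jpg, cut002.jpg, ... one file per page.
  # Note: par("din") opens a default device if none is open yet.
  jpeg("cut%03d.jpg", width = par("din")[1], height = par("din")[2],
       units = "in", res = 300)
  for (i in param){
    mcplt <- diamonds %>%
      filter(cut == i) %>%
      ggplot(aes(x = carat, y = price)) +
      geom_point() +
      facet_wrap(~color) +
      ggtitle(paste("Cut: ", i, sep = ""))
    print(mcplt)  # each print() starts a new page, hence a new file
  }
  dev.off()       # close the device once, after the last plot
}
```

The file names come from the page counter rather than the cut labels, which is the price of opening the device only once.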
and now profile again
    Rprof(line=TRUE)
    f1()
    Rprof(NULL)
    summaryRprof(lines="show")
f1() still spends most of its time on print(mcplt), and it is only a little faster than before (1.96 seconds versus 2.18 seconds). One possible way to speed things up is to use a smaller device (lower resolution or smaller image); when I used the default values for jpeg(), the difference was bigger, closer to 25% faster. I also tried changing the device to png(), but that made no difference.
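To see how much the device size matters on a given machine, the two settings can be timed side by side. This is a rough sketch: render_all() is a hypothetical helper, the 7 inch size and the 72 versus 300 dpi values are assumptions chosen for illustration, and output goes to tempdir() so nothing is overwritten.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: draw one plot per cut on a single jpeg device
# opened at the requested resolution (dots per inch).
render_all <- function(res, size_in = 7) {
  pattern <- file.path(tempdir(), paste0("res", res, "_cut%03d.jpg"))
  jpeg(pattern, width = size_in, height = size_in, units = "in", res = res)
  on.exit(dev.off())  # make sure the device is closed even on error
  for (i in unique(diamonds$cut)) {
    p <- diamonds %>%
      filter(cut == i) %>%
      ggplot(aes(x = carat, y = price)) +
      geom_point() +
      facet_wrap(~color)
    print(p)
  }
}

# Elapsed seconds for a low- and a high-resolution device
timings <- sapply(c(72, 300), function(r) system.time(render_all(r))[["elapsed"]])
names(timings) <- c("res72", "res300")
print(timings)
```

A single system.time() run is noisy, so repeating the measurement a few times gives a fairer comparison.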
Based on the profiling, I do not expect this to help, but for completeness I will try doing away with the for loop and running everything inside dplyr using do(). I found this question and this one useful here.
    # open the jpeg device, change defaults to match ggsave()
    jpeg("cut%03d.jpg",width=par("din")[1],height=par("din")[2],units="in",res=300)
    plots = diamonds %>%
      group_by(cut) %>%
      do({
        plot=ggplot(aes(x=carat, y=price),data=.) +
          geom_point() +
          facet_wrap(~color) +
          ggtitle(paste("Cut: ",.$cut,sep=""))
        print(plot)
      })
    dev.off()
Running this code gives
Error: results are not data frames at positions: 1, 2, 3
but it does appear to work. I believe the error occurs when do() returns, because the print() method does not return a data.frame. Profiling seems to indicate that it runs a little faster, 1.78 seconds in total. But I don't like solutions that generate errors, even if they do not cause problems.
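One way to sidestep the error is the named form of do(), which stores each group's result in a list column instead of expecting a data frame back. This variant was not timed in the profiling above. The tempdir() output location, the 7 inch device size, and the use of .$cut[1] for the title are assumptions made for this sketch.

```r
library(dplyr)
library(ggplot2)

# Named do(): the plot for each cut is kept in a list column called `plot`,
# so do() no longer complains about non-data-frame results.
plots <- diamonds %>%
  group_by(cut) %>%
  do(plot = ggplot(., aes(x = carat, y = price)) +
       geom_point() +
       facet_wrap(~color) +
       ggtitle(paste0("Cut: ", .$cut[1])))

# Print the stored plots onto a single multi-page jpeg device.
out_dir <- tempdir()
jpeg(file.path(out_dir, "cut%03d.jpg"), width = 7, height = 7,
     units = "in", res = 300)
invisible(lapply(plots$plot, print))
dev.off()
```

Since the time goes into print() rather than into building the plots, this should mainly tidy up the error, not change the speed much.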
I need to stop here, but I have already learned a lot about where to focus. Other things to try would include:
- Using parallel or something similar to run each chunk of data in a separate process. I am not sure this will help if the problem is saving the file, but it might if rendering the image is CPU bound, I think.
- Try using data.table instead of dplyr, but again, it is the printing part that is slow.
- Try base graphics or lattice (trellis) graphics instead of ggplot2. I do not know about the relative speed, but it could vary.
- Buy a faster hard drive! I just compared the speed of f() on my home computer, which has a regular hard drive, with my work machine, which has an SSD: the home machine is about 3 times slower than the timings above.
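The first suggestion could be sketched roughly as follows, using a socket cluster from the parallel package. Nothing here was benchmarked in this answer: save_cut() is a hypothetical helper, and the two workers, the tempdir() output location and the 7 inch image size are arbitrary assumptions.

```r
library(parallel)
library(ggplot2)

# Hypothetical helper: build and save the plot for one chunk of the data.
# ggplot2 functions are namespaced because workers start without it attached.
save_cut <- function(cut_data, out_dir) {
  cut_name <- as.character(cut_data$cut[1])
  p <- ggplot2::ggplot(cut_data, ggplot2::aes(x = carat, y = price)) +
    ggplot2::geom_point() +
    ggplot2::facet_wrap(~color) +
    ggplot2::ggtitle(paste0("Cut: ", cut_name))
  path <- file.path(out_dir, paste0("Cut", cut_name, ".jpeg"))
  ggplot2::ggsave(filename = path, plot = p, width = 7, height = 7)
  path
}

pieces <- split(diamonds, diamonds$cut)  # one data frame per cut
cl <- makeCluster(2)                     # assumption: 2 worker processes
paths <- parLapply(cl, pieces, save_cut, out_dir = tempdir())
stopCluster(cl)
```

Each chunk has to be serialized and sent to a worker, so with only five plots the start-up and transfer overhead may eat most of the gain.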