Plotting a large number of time series with ggplot. Is it possible to speed it up?

I work with thousands of meteorological time series (sample data can be downloaded here: https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt)

Plotting this data with ggplot2 on my Linux Mint computer (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a very long time. I am wondering if there is a way to speed it up, or a better way to plot this data? Thanks in advance for any suggestions!

This is the code I'm using right now.

    ##############################################################################
    #### Load required libraries
    library(RCurl)
    library(dplyr)
    library(reshape2)
    library(ggplot2)

    ##############################################################################
    #### Read data from URL
    dataURL <- "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
    tmp <- getURL(dataURL)
    df  <- tbl_df(read.table(text = tmp, header = TRUE))
    df

    ##############################################################################
    #### Plot time series using ggplot2
    # Melt the data by date first
    df_melt <- melt(df, id = "date")
    str(df_melt)

    df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
      geom_point() +
      scale_colour_discrete("Station #") +
      xlab("Date") +
      ylab("Daily Precipitation [mm]") +
      ggtitle("Daily precipitation from 1915 to 2011") +
      # Change size & distance of the title
      theme(plot.title = element_text(size = 16, face = "bold", vjust = 2)) +
      # Change size of tick text
      theme(axis.text.x = element_text(angle = 0, size = 12, vjust = 0.5)) +
      theme(axis.text.y = element_text(angle = 0, size = 12, vjust = 0.5)) +
      # Move x- & y-axis labels away from the axes
      theme(
        axis.title.x = element_text(size = 14, color = "black", vjust = -0.35),
        axis.title.y = element_text(size = 14, color = "black", vjust = 0.35)
      ) +
      # Change legend text size
      theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) +
      # Change legend symbol size
      guides(colour = guide_legend(override.aes = list(size = 4))) +
      # Note: the argument is `ncol`, not `ncols`
      guides(fill = guide_legend(ncol = 2))
    df_plot
1 answer

Part of your question asks about "a better way to plot this data."

In that spirit, you seem to have two problems. First, you are trying to plot more than 35,000 points along the x-axis, which, as some of the comments note, will produce overlapping pixels on anything but an extremely large, high-resolution monitor. Second, and more importantly IMO, you are trying to plot 69 time series (stations) on the same chart. In this kind of situation a heat map may be a better fit.

    library(data.table)
    library(ggplot2)
    library(reshape2)     # for melt(...)
    library(RColorBrewer) # for brewer.pal(...)

    url <- "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
    dt  <- fread(url)
    dt[, Year := year(as.Date(date))]
    dt.melt <- melt(dt[, -1, with = F], id = "Year", variable.name = "Station")
    dt.agg  <- dt.melt[, list(y = sum(value)), by = list(Year, Station)]
    dt.agg[, Station := factor(Station, levels = rev(levels(Station)))]

    ggplot(dt.agg, aes(x = Year, y = Station)) +
      geom_tile(aes(fill = y)) +
      scale_fill_gradientn("Annual\nPrecip. [mm]",
                           colours = rev(brewer.pal(9, "Spectral"))) +
      scale_x_continuous(expand = c(0, 0)) +
      coord_fixed()
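If drawing the tile layer itself ever becomes a bottleneck, `geom_raster()` is a faster drop-in for `geom_tile()` on a regular grid, because it renders the whole layer as a single bitmap instead of thousands of rectangle grobs. A minimal sketch with synthetic stand-in data (the `agg` data frame below is hypothetical, shaped like the aggregated annual totals above):

```r
library(ggplot2)

# Hypothetical stand-in for the aggregated data: one annual total per
# station and year (97 years x 69 stations)
agg   <- expand.grid(Year = 1915:2011, Station = paste0("station", 1:69))
agg$y <- runif(nrow(agg), min = 0, max = 2000)

# geom_raster() requires all tiles to be the same size and draws them as
# one raster image, which renders faster than geom_tile()'s per-rectangle grobs
p <- ggplot(agg, aes(x = Year, y = Station, fill = y)) +
  geom_raster() +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours = rev(RColorBrewer::brewer.pal(9, "Spectral")))
```

On a regular Year-by-Station grid like this one, the output is visually identical to the `geom_tile()` version.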

Note the use of data.table. Your data set is quite large (because of all the columns; 35,000 rows is not that big in itself). In this situation data.table speeds up processing considerably, especially fread(...), which is much faster than the text-import functions in base R.
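As a rough illustration of the fread() claim, here is a micro-benchmark on a synthetic file of roughly the same shape as the precipitation data (~35,000 rows by 70 columns); absolute timings will of course vary by machine:

```r
library(data.table)

# Build a synthetic file shaped like the precipitation data
n   <- 35000
tmp <- tempfile(fileext = ".txt")
syn <- data.frame(date = as.character(seq(as.Date("1915-01-01"),
                                          by = "day", length.out = n)),
                  matrix(round(runif(n * 69), 1), nrow = n,
                         dimnames = list(NULL, paste0("s", 1:69))))
write.table(syn, tmp, row.names = FALSE, quote = FALSE)

# Time both readers on the same file
t_base <- system.time(df1 <- read.table(tmp, header = TRUE))["elapsed"]
t_dt   <- system.time(df2 <- fread(tmp))["elapsed"]
cat(sprintf("read.table: %.2fs, fread: %.2fs\n", t_base, t_dt))
unlink(tmp)
```

Both calls return the same 35,000 x 70 table; only the elapsed time differs.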


Source: https://habr.com/ru/post/1200159/
