Aggregation, restructuring of hourly time series data in R

Question


I have one year of hourly data in a data frame in R:

    > str(df.MHwind_load) # compactly displays structure of data frame
    'data.frame':	8760 obs. of  6 variables:
     $ Date         : Factor w/ 365 levels "2010-04-01","2010-04-02",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ Time..HRs.   : int  1 2 3 4 5 6 7 8 9 10 ...
     $ Hour.of.Year : int  1 2 3 4 5 6 7 8 9 10 ...
     $ Wind.MW      : int  375 492 483 476 486 512 421 396 456 453 ...
     $ MSEDCL.Demand: int  13293 13140 12806 12891 13113 13802 14186 14104 14117 14462 ...
     $ Net.Load     : int  12918 12648 12323 12415 12627 13290 13765 13708 13661 14009 ...

Keeping the hourly structure, I would like to know how to extract:

1. a specific month or group of months
2. the first day, first week, etc. of every month
3. all Mondays, all Tuesdays, etc. of the year

I tried to use cut without success. After searching the Internet, I think lubridate could do this, but I did not find any suitable examples. I would be very grateful for help in solving this problem.

Edit: Here is a sample of the data frame:

               Date Hour.of.Year Wind.MW            datetime
    1    2010-04-01            1     375 2010-04-01 00:00:00
    2    2010-04-01            2     492 2010-04-01 01:00:00
    3    2010-04-01            3     483 2010-04-01 02:00:00
    4    2010-04-01            4     476 2010-04-01 03:00:00
    5    2010-04-01            5     486 2010-04-01 04:00:00
    6    2010-04-01            6     512 2010-04-01 05:00:00
    7    2010-04-01            7     421 2010-04-01 06:00:00
    8    2010-04-01            8     396 2010-04-01 07:00:00
    9    2010-04-01            9     456 2010-04-01 08:00:00
    10   2010-04-01           10     453 2010-04-01 09:00:00
    ..           ..           ..      ..                  ..
    8758 2011-03-31         8758     302 2011-03-31 21:00:00
    8759 2011-03-31         8759     378 2011-03-31 22:00:00
    8760 2011-03-31         8760     356 2011-03-31 23:00:00
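The datetime column in the sample can be derived from the original Date and Time..HRs. columns. Below is a minimal sketch, assuming Time..HRs. runs 1 to 24 within each day and that hour 1 means 00:00 (which is what the sample's datetime values suggest). It uses UTC to sidestep daylight-saving shifts. The df here is a hypothetical two-day stand-in for df.MHwind_load with made-up Wind.MW values:

```r
# Hypothetical stand-in for df.MHwind_load: two days of hourly rows (Wind.MW values are made up)
dates <- seq(as.Date("2010-04-01"), by = "day", length.out = 2)
df <- data.frame(
  Date       = factor(rep(as.character(dates), each = 24)),
  Time..HRs. = rep(1:24, times = 2),
  Wind.MW    = seq_len(48)
)

# Hour index 1 is assumed to mean 00:00, as in the sample above, hence the "- 1".
# UTC is used so that daylight-saving transitions cannot shift or duplicate hours.
df$datetime <- as.POSIXct(as.character(df$Date), tz = "UTC") +
  (df$Time..HRs. - 1) * 3600

print(head(df[, c("Date", "Time..HRs.", "datetime")], 3))
```

Converting the factor with as.character first avoids relying on how a factor is coerced to a date-time.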

EDIT: I would like to perform additional time-based operations on the same dataset:

1. Hourly averaging over all data points, i.e. the average of all values in the first hour of every day of the year, and likewise for each of the other hours. The output is the "hourly profile" of the whole year (24 points in time).
2. The same for each week and each month, giving 52 and 12 hourly profiles, respectively.
3. Seasonal averages, for example from June to September.
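Here is a minimal base-R sketch of these three aggregations. It assumes a POSIXct datetime column like the one in the sample. The data are synthetic, with hypothetical Wind.MW values that depend only on the hour so the expected profile is easy to check. The week grouping is also an assumption: it counts 7-day blocks from the first day of the series, and other week definitions are possible. aggregate() and tapply() are just one way to do this:

```r
# Hypothetical hourly year (Apr 2010 - Mar 2011) in UTC. Wind.MW is made up so that it
# depends only on the hour of day, which makes the expected profiles easy to check.
datetime <- seq(as.POSIXct("2010-04-01 00:00", tz = "UTC"), by = "hour", length.out = 8760)
df <- data.frame(datetime = datetime)
df$hour  <- as.integer(format(df$datetime, "%H"))  # hour of day, 0..23
df$month <- as.integer(format(df$datetime, "%m"))  # calendar month, 1..12
# Assumed week definition: 7-day blocks counted from the first day of the series.
# 365 days give 52 full weeks plus a 1-day remainder, i.e. 53 groups.
df$week <- as.integer(as.Date(df$datetime) - as.Date(df$datetime[1])) %/% 7 + 1
df$Wind.MW <- 100 + 10 * df$hour

# 1. Whole-year hourly profile: mean of all values observed at each hour of day (24 points)
profile.year <- aggregate(Wind.MW ~ hour, data = df, FUN = mean)

# 2. One 24-point profile per month and per week (rows = month or week, columns = hour)
profile.month <- tapply(df$Wind.MW, list(df$month, df$hour), mean)
profile.week  <- tapply(df$Wind.MW, list(df$week, df$hour), mean)

# 3. Seasonal profile, e.g. June to September
profile.summer <- aggregate(Wind.MW ~ hour, data = subset(df, month %in% 6:9), FUN = mean)

print(profile.year[1:3, ])
print(dim(profile.month))
print(dim(profile.week))
```

With the real data, the hour of day could presumably be taken from Time..HRs. minus 1 instead of being parsed from datetime. Wind.MW can be swapped for MSEDCL.Demand or Net.Load.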


r time-series

avg Mar 26 '12 at 4:49


3 answers

mpiktas · Answer 1 · 2012-03-26T07:18:22+0000

Convert the dates to a format that lubridate understands, and then use the functions month, mday and wday, respectively.

Suppose you have a data.frame with the time stored in the Date column. Then the answers to your questions are as follows:

    library(lubridate)

    ## dummy data.frame
    df <- data.frame(Date = c("2012-01-01", "2012-02-15", "2012-03-01", "2012-04-01"), a = 1:4)
    df$Date <- ymd(df$Date)  # convert to Date so the lubridate accessors work

    ## 1. Select rows for a particular month
    subset(df, month(Date) == 1)

    ## 2a. Select the first day of each month
    subset(df, mday(Date) == 1)

    ## 2b. Select the first week of each month:
    ## get the week numbers that contain the first day of a month
    wkd <- subset(week(df$Date), mday(df$Date) == 1)
    ## select the weeks with those numbers
    subset(df, week(Date) %in% wkd)

    ## 3. Select all Mondays (by default wday() counts Sunday as 1, so Monday is 2)
    subset(df, wday(Date) == 2)
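The same three selections can be applied to hourly data while keeping every hourly row. A minimal sketch using base-R format() codes as stand-ins for lubridate's month(), mday() and wday() (a swapped-in technique, not the lubridate API). The data are a hypothetical hourly year in UTC. Note that "first week" is taken here as days 1 to 7 of each month, which differs from the calendar-week approach above:

```r
# Hypothetical hourly data for 2010 in UTC; only the timestamps matter here
datetime <- seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"), by = "hour", length.out = 8760)
df <- data.frame(datetime = datetime, a = seq_along(datetime))

# Base-R counterparts of lubridate's month(), mday() and wday()
mon <- as.integer(format(df$datetime, "%m"))  # month, 1..12
dom <- as.integer(format(df$datetime, "%d"))  # day of month, 1..31
wd  <- as.integer(format(df$datetime, "%u"))  # ISO weekday: Monday = 1 ... Sunday = 7

jan         <- df[mon == 1, ]   # 1. every hour of January
first.days  <- df[dom == 1, ]   # 2a. all 24 hours of the 1st of each month
first.weeks <- df[dom <= 7, ]   # 2b. days 1-7 of each month (not calendar weeks)
mondays     <- df[wd == 1, ]    # 3. every hour of every Monday
```

Because "%u" returns a number, the Monday test does not depend on the locale's weekday names.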

conjugateprior · Answer 2 · 2012-03-26T07:19:52+0000

1. First, convert to the Date class: as.Date(df.MHwind_load$Date)
2. Then call weekdays on the date vector to get a new variable holding the name of the day of the week.
3. Then call months on the date vector to get a new variable holding the name of the month.
4. If necessary, create a years variable (see below).

Now subset the data frame using the appropriate combination of these variables. Step 2 answers your task 3, and steps 3 and 4 answer task 1. Task 2 may require a line or two more of R: for example, select the rows that correspond to all Mondays in a month and call unique, or its alter ego duplicated, on the result.

Here you go:

    ## build an augmented data set
    newdf <- df.MHwind_load
    newdf$d     <- as.Date(as.character(newdf$Date))
    newdf$month <- months(newdf$d)    # month names (locale-dependent)
    newdf$day   <- weekdays(newdf$d)  # weekday names (locale-dependent)

    ## for some reason R has no years function. Here is one
    years <- function(x) format(as.Date(x), format = "%Y")
    newdf$year <- years(newdf$d)

    # get observations from January to March of every year
    subset(newdf, month %in% c("January", "February", "March"))

    # get all Monday observations
    subset(newdf, day == "Monday")

    # get all Mondays in 1999
    subset(newdf, day == "Monday" & year == "1999")

    # slightly fancier: _first_ Monday of each month
    # the first row of each (year, month, weekday) combination lies in the month's first week
    first.week.of.month <- !duplicated(cbind(newdf$year, newdf$month, newdf$day))
    # with hourly data that flags only the first hour, so collect the dates of those Mondays...
    first.mondays <- unique(newdf$d[first.week.of.month & newdf$day == "Monday"])
    # ...and keep all hours that fall on them
    subset(newdf, d %in% first.mondays)
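A shorter route to the first Monday of each month relies on the fact that it always falls on day 1 to 7 of the month, so duplicated() is not needed. A minimal sketch on a hypothetical hourly year in UTC matching the asker's date range (values made up). It uses format() code "%u" (1 = Monday) rather than weekday names, so it does not depend on the locale:

```r
# Hypothetical hourly data covering Apr 2010 - Mar 2011, in UTC (Wind.MW values are made up)
datetime <- seq(as.POSIXct("2010-04-01 00:00", tz = "UTC"), by = "hour", length.out = 8760)
newdf <- data.frame(datetime = datetime, Wind.MW = seq_along(datetime))

# The first Monday of a month always falls on day 1-7 of that month.
# "%u" gives the weekday as 1 (Monday) to 7 (Sunday) regardless of locale.
is.monday <- format(newdf$datetime, "%u") == "1"
in.week.1 <- as.integer(format(newdf$datetime, "%d")) <= 7
first.mondays <- subset(newdf, is.monday & in.week.1)

# One date per month, each with all 24 hours kept
table(format(first.mondays$datetime, "%Y-%m"))
```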

Bryan goodrich · Answer 3 · 2012-03-26T17:26:03+0000

Since you are not asking about the time (hourly) part of your data, it is better to store your dates as a Date object. Otherwise, you might be interested in chron, which also has some handy functions like the ones you will see below.

To echo Conjugate Prior's answer, you should store the dates as a Date object. Since your data already conform to the default format ('yyyy-mm-dd'), you can just call as.Date on them. Otherwise, you would need to specify the format string. I would also convert your factor with as.character first, to make sure you don't get errors during the conversion. I have run into problems with date factors for exactly this reason (possibly fixed in the current version).

 df.MHwind_load <- transform(df.MHwind_load, Date = as.Date(as.character(Date)))

Now it is useful to create wrapper functions that extract the information you need. You can use transform, as I did above, to add columns that represent months, days, years, etc., and then subset on them logically. Alternatively, you can do something like this:

    library(lubridate)  # provides month()

    getMonth <- function(x, mo) {   # This function assumes a vector within a single year
      isMonth <- month(x) %in% mo    # Boolean of matching months
      return(x[which(isMonth)])      # Return vector of matching months
    }                                # end function

Or, in short form:

 getMonth <- function(x, mo) x[month(x) %in% mo]

This is simply a trade-off between storing this information (transforming the frame) and computing it on demand (using accessor functions).
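On the accessor side of that trade-off, the asker needs whole rows (every hour) rather than just matching dates. A minimal sketch of such an accessor: getMonthRows is a hypothetical helper, not part of any package. It uses base format() instead of lubridate's month(), and the small data frame is made up:

```r
# Hypothetical accessor: return whole rows (all hours) for the given month numbers,
# rather than just the matching dates as getMonth() does
getMonthRows <- function(df, mo, col = "Date") {
  m <- as.integer(format(df[[col]], "%m"))  # month number, 1..12
  df[m %in% mo, , drop = FALSE]
}

# Small hypothetical example: 3 days x 24 hours
d  <- rep(as.Date(c("2010-05-31", "2010-06-01", "2010-09-30")), each = 24)
df <- data.frame(Date = d, Hour = rep(1:24, 3), Wind.MW = seq_along(d))

summer <- getMonthRows(df, 6:9)  # June to September
nrow(summer)
```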

Your need for, say, the first day of the month takes a bit more processing, but it is not too difficult. Below is a function that returns all of those values. Alternatively, it is quite simple to sort the vector of values for a given month and take the first one.

    getFirstDay <- function(x, mo) {
      isMonth <- months(x) %in% mo
      x <- sort(x[isMonth])                       # Look at only those in the desired month.
                                                  # Sort them by date. We only want the first day.
      nFirsts <- rle(as.numeric(x))$lengths[1]    # Returns length of 1st days
      return(x[seq(nFirsts)])
    }                                             # end function

A simpler alternative, which returns only a single date, would be:

 getFirstDayOnly <- function(x, mo) {sort(x[months(x) %in% mo])[1]}

I have not tested these, since you did not provide any sample data, but this approach should help you get the information you need. It is up to you to work out how to include them in your workflow. For example, say you want the first day of each month of a given year. This assumes we look at only one year; otherwise you can create wrappers or pre-process your vector down to a single year beforehand.

    # Return a vector of first days for each month
    df <- transform(df, date = as.Date(as.character(date)))
    firstDays <- sapply(unique(months(df$date)),   # Iterate through months in Dates
                        function(mo) getFirstDayOnly(df$date, mo))
    as.Date(firstDays, origin = "1970-01-01")      # sapply() drops the Date class; restore it
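For hourly data, a variant that returns all 24 rows of each month's first day, and keeps the Date class throughout, can use sort() plus !duplicated() on a year-month key. This is a minimal sketch on hypothetical data (one row per hour, values made up), offered as an alternative to the sapply() loop rather than the answerer's method:

```r
# Hypothetical hourly data: Apr 2010 - Mar 2011, 24 rows per day (Wind.MW values are made up)
d  <- rep(seq(as.Date("2010-04-01"), as.Date("2011-03-31"), by = "day"), each = 24)
df <- data.frame(date = d, Wind.MW = seq_along(d))

# Earliest date present in each year-month; sorting first makes !duplicated() pick it
ud <- sort(unique(df$date))
firstDates <- ud[!duplicated(format(ud, "%Y-%m"))]

# Keep every hourly row that falls on one of those dates
firstDayRows <- df[df$date %in% firstDates, ]
```

Plain indexing keeps firstDates as a Date vector, whereas sapply() simplifies Dates to bare numbers.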

The above could itself be wrapped in a separate convenience function that calls a different accessor function. In this way you build a series of direct, concise methods for extracting pieces of the information you need. You then combine them into simple, easily interpreted functions that you can use in your scripts to get exactly what you want.

You can use the examples above to work out how to write other wrappers that extract the date information you need. If you need help with any of this, feel free to ask in the comments.
