I have a long data frame containing meteorological data from a mast. It contains observations (data$value) of different parameters (wind speed, wind direction, air temperature, etc., in data$param) taken simultaneously at different heights (data$z).
I am trying to efficiently slice this data by $time, and then apply functions to all the data collected at each time step. Usually functions are applied to a single $param at a time (i.e. I apply different functions to wind speed than I do to air temperature).
Current approach
My current method is to use data.frame and ddply.
If I want to get all the wind speed data, I run this:
    # find good data ----
    df <- data[((data$param == "wind speed") & !is.na(data$value)), ]

Then I run my function on df using ddply():

    library(plyr)

    # one row of results per time step
    df.tav <- ddply(df, .(time), function(x) {
      y <- data.frame(V1 = sum(x$value) + sum(x$z),
                      V2 = sum(x$value) / sum(x$z))
      return(y)
    })
Typically, V1 and V2 are calls to other functions; the expressions above are just examples. The important point is that I need to run several functions on the same data.
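To make the "calls to other functions" pattern concrete, here is a minimal sketch of what a real call looks like in shape. The helpers `calc_v1` and `calc_v2` and the `toy` data frame are made up for illustration; they are not my actual functions or data.

```r
library(plyr)

# Hypothetical stand-ins for the real per-time-step functions.
calc_v1 <- function(value, z) sum(value) + sum(z)
calc_v2 <- function(value, z) sum(value) / sum(z)

# Made-up data: two time steps, two heights each.
toy <- data.frame(
  time  = as.POSIXct(c(0, 0, 600, 600), origin = "1970-01-01", tz = "UTC"),
  z     = c(40, 80, 40, 80),
  value = c(4, 6, 3, 5)
)

# Several functions are evaluated on the same slice of data,
# and each one becomes a column of the per-time result.
toy.tav <- ddply(toy, .(time), function(x) {
  data.frame(V1 = calc_v1(x$value, x$z),
             V2 = calc_v2(x$value, x$z))
})
print(toy.tav)
```

Each helper receives the full set of values and heights for one time step, so adding another function just means adding another column to the returned data frame.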
Question
My current approach is very slow. I have not benchmarked it, but it is slow enough that I can go for a cup of coffee and come back before a year's worth of data has been processed.
I have on the order of hundreds of towers to process, each with a year of data and 10-12 heights, so I am looking for something faster.
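For anyone who wants to put a number on "slow", a rough way to time the ddply step is system.time(). This is only a sketch on synthetic data: `n_times`, `heights`, and `fake` are assumptions chosen for illustration, not my real data set, and the timing will vary by machine.

```r
library(plyr)

set.seed(1)
n_times <- 2000                     # assumed number of 10-minute records
heights <- c(40, 60, 80, 100, 120)  # assumed measurement heights

# Synthetic wind-speed-like data: one value per time step and height.
times <- as.POSIXct("2010-01-01", tz = "UTC") + 600 * seq_len(n_times)
fake <- data.frame(
  time  = rep(times, each = length(heights)),
  z     = rep(heights, times = n_times),
  value = runif(n_times * length(heights), min = 0, max = 15)
)

# Time the same kind of per-time-step call as in the question.
timing <- system.time(
  res <- ddply(fake, .(time), function(x) {
    data.frame(V1 = sum(x$value) + sum(x$z),
               V2 = sum(x$value) / sum(x$z))
  })
)
print(timing["elapsed"])
```

Scaling `n_times` up towards a full year of 10-minute records (roughly 52,000 time steps) gives a sense of how the run time grows.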
Sample data
data <- structure(list(time = structure(c(1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262305200, 1262305200, 1262305200, 1262305200, 1262305200, 1262305200, 1262305200), class = c("POSIXct", "POSIXt"), tzone = ""), z = c(0, 0, 0, 100, 100, 100, 120, 120, 120, 140, 140, 140, 160, 160, 160, 180, 180, 180, 200, 200, 200, 40, 40, 40, 50, 50, 50, 60, 60, 60, 80, 80, 80, 0, 0, 0, 100, 100, 100, 120), param = c("temperature", "humidity", "barometric pressure", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "wind direction", "turbulence", "wind speed", "temperature", "barometric pressure", "humidity", "wind direction", "wind speed", "turbulence", "wind direction"), value = c(-2.5, 41, 816.9, 248.4, 0.11, 4.63, 249.8, 0.28, 4.37, 255.5, 0.32, 4.35, 252.4, 0.77, 5.08, 248.4, 0.65, 3.88, 313, 0.94, 6.35, 250.9, 0.1, 4.75, 253.3, 0.11, 4.68, 255.8, 0.1, 4.78, 254.9, 0.11, 4.7, -3.3, 816.9, 42, 253.2, 2.18, 0.27, 229.5)), .Names = c("time", "z", "param", "value"), row.names = c(NA, 40L), class = "data.frame")