Calculating the sum of the previous 3 rows in R data.table (by grid square)

I would like to calculate the amount of rain that has fallen over the last three days for each grid square, and add it as a new column in my data.table. To be clear, I want to sum the current and PREVIOUS two (2) days of rainfall for each square of the meteorological grid.

    library(zoo)
    library(data.table)

    # making the data.table
    # rainfall values to work with
    rain <- c(NA, NA, NA, 0, 0, 5, 1, 0, 3, 10)
    # the geographic grid square for the rainfall measurement
    square <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
    # this is the result I'm looking for (the last NA as we are now on to
    # the first day of the second grid square)
    desired_result <- c(NA, NA, NA, NA, NA, 5, 6, 6, 4, NA)
    # making the data.table
    weather <- data.table(rain, square, desired_result)

My attempt: this line used to work, but no longer does:

 weather[, rain_3 := filter(rain, rep(1, 2), sides = 1), by = list(square)] 

So I'm trying another method:

    # this next line gets the numbers right, but sums the following values, not the preceding ones
    weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2, 0)), sum)

    # here I add in the by weather$square, but still no success
    weather$rain_3 <- rollapply(zoo(weather$rain), list(seq(-2, 0)), sum, by = list(weather$square))

I would really appreciate any ideas or suggestions you might have.

Thank you very much!

+4
5 answers
    weather[, rain_3 := filter(rain, rep(1, 3), sides = 1), by = list(square)]
    # Error in filter(rain, rep(1, 3), sides = 1) :
    #   'filter' is longer than time series

    weather[, rain_3 := if (.N > 2) filter(rain, rep(1, 3), sides = 1) else NA_real_, by = square]
    #     rain square desired_result rain_3
    #  1:   NA      1             NA     NA
    #  2:   NA      1             NA     NA
    #  3:   NA      1             NA     NA
    #  4:    0      1             NA     NA
    #  5:    0      1             NA     NA
    #  6:    5      1              5      5
    #  7:    1      1              6      6
    #  8:    0      1              6      6
    #  9:    3      1              4      4
    # 10:   10      2             NA     NA

Make sure dplyr is not loaded, as it masks filter. If you need dplyr, you can call stats::filter explicitly.
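As a minimal sketch of what this filter call computes, the snippet below applies it to a short made-up vector x (an assumption for illustration, not the question's data). It uses the fully qualified stats::filter so that it should behave the same whether or not dplyr is attached.

```r
# Hypothetical example vector (illustration only, no missing values)
x <- c(0, 0, 5, 1, 0, 3)

# sides = 1 uses only the current and past values, so with three weights of 1
# each result is x[i] + x[i-1] + x[i-2]. The first two positions have no
# complete window and come back as NA.
trailing_sum <- as.numeric(stats::filter(x, rep(1, 3), sides = 1))
print(trailing_sum)
```

The as.numeric() call only drops the time-series class that stats::filter returns; the values are unchanged.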

+2

Here's a quick and efficient solution using data.table v1.9.6 or later:

    weather[, rain_3 := Reduce(`+`, shift(rain, 0:2)), by = square]
    weather
    #     rain square desired_result rain_3
    #  1:   NA      1             NA     NA
    #  2:   NA      1             NA     NA
    #  3:   NA      1             NA     NA
    #  4:    0      1             NA     NA
    #  5:    0      1             NA     NA
    #  6:    5      1              5      5
    #  7:    1      1              6      6
    #  8:    0      1              6      6
    #  9:    3      1              4      4
    # 10:   10      2             NA     NA

The main idea here is to take the rain column at lags 0, 1 and 2 with shift, and then sum those three vectors row by row.
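To make the two steps visible, here is a small sketch on a made-up vector x (an assumption for illustration, outside any data.table), showing the list that shift returns and how Reduce collapses it:

```r
library(data.table)

# Hypothetical example vector (illustration only)
x <- c(0, 0, 5, 1, 0, 3)

# shift() with a vector of lags returns a list: x itself (lag 0),
# x lagged by one position, and x lagged by two, each padded with NA at the start.
lags <- shift(x, 0:2)

# Reduce(`+`, ...) adds the list elements position by position, so any
# position that lacks two earlier values ends up NA.
three_day_sum <- Reduce(`+`, lags)
print(three_day_sum)
```

Inside weather[, ..., by = square] the same expression is evaluated once per grid square, so the lags never reach across squares.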

+18

The rollapply solution can be written as follows:

 weather[, rain_3 := rollapplyr(c(NA, NA, rain), 3, sum), by = square] 

giving:

        rain square desired_result rain_3
     1:   NA      1             NA     NA
     2:   NA      1             NA     NA
     3:   NA      1             NA     NA
     4:    0      1             NA     NA
     5:    0      1             NA     NA
     6:    5      1              5      5
     7:    1      1              6      6
     8:    0      1              6      6
     9:    3      1              4      4
    10:   10      2             NA     NA

+3

You almost got the answer yourself. rollsum (or rollapply in your case) gives a vector of length N-2, so you just need to fill in the necessary NA cells. This can be done as follows: roll<-c(NA,NA,rollsum(yourvector,k=3))
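A minimal sketch of that padding idea, using zoo's rollsum on a made-up vector x (an assumption for illustration). The vector deliberately contains no missing values, because I am not certain how rollsum treats NA inside its input; rollapply with sum is probably the safer choice when the data has gaps.

```r
library(zoo)

# Hypothetical example vector (illustration only, no missing values)
x <- c(0, 0, 5, 1, 0, 3)

# A width-3 rolling sum yields one value per complete window,
# i.e. length(x) - 2 values
window_sums <- rollsum(x, k = 3)

# Two leading NAs line each sum up with the last day of its window
padded <- c(NA, NA, window_sums)
print(padded)
```

The padded vector has the same length as x, so it can be assigned directly as a new column.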

This is how I do it. I use roll_sum from the {RcppRoll} package because it is much faster and simplifies working with NA. A simple by argument from data.table allows you to group the result by square.

    library(RcppRoll)
    weather[, rain_3 := if (.N > 2) { c(NA, NA, roll_sum(rain, n = 3)) } else { NA }, by = square]
    weather
        rain square desired_result rain_3
     1:   NA      1             NA     NA
     2:   NA      1             NA     NA
     3:   NA      1             NA     NA
     4:    0      1             NA     NA
     5:    0      1             NA     NA
     6:    5      1              5      5
     7:    1      1              6      6
     8:    0      1              6      6
     9:    3      1              4      4
    10:   10      2             NA     NA

+2

A dplyr solution:

    library(dplyr)
    weather %>%
      group_by(square) %>%
      mutate(rain_3 = rain + lag(rain) + lag(rain, n = 2L))

Result:

    Source: local data table [10 x 4]

        rain square desired_result rain_3
       (dbl)  (dbl)          (dbl)  (dbl)
    1     NA      1             NA     NA
    2     NA      1             NA     NA
    3     NA      1             NA     NA
    4      0      1             NA     NA
    5      0      1             NA     NA
    6      5      1              5      5
    7      1      1              6      6
    8      0      1              6      6
    9      3      1              4      4
    10    10      2             NA     NA

If you want to assign rain_3 back to your data set, you can start the pipe with the %<>% operator from magrittr:

    library(magrittr)
    weather %<>% group_by......

0

Source: https://habr.com/ru/post/1245030/
