Getting the previous n rows in a data frame?

I have the following data frame.

 date        id  value
 2012-01-01   1    0.3
 2012-01-01   2    0.5
 2012-01-01   3    0.2
 2012-01-01   4    0.8
 2012-01-01   5    0.2
 2012-01-01   6    0.8
 2012-01-01   7    0.1
 2012-01-01   8    0.4
 2012-01-01   9    0.3
 2012-01-01  10    0.2

There are several dates, and for each date I have 10 id values, as shown above, plus a value field. For each id, I would like to find the previous n values in the "value" field. For example, if n = 3, then I want the result to be as follows.

 date        id  value  value1  value2  value3
 2012-01-01   1    0.3      NA      NA      NA
 2012-01-01   2    0.5      NA      NA      NA
 2012-01-01   3    0.2      NA      NA      NA
 2012-01-01   4    0.8     0.2     0.5     0.3
 2012-01-01   5    0.2     0.8     0.2     0.5
 ...

Is there an easy way to get to this either through plyr or using mapply? Thank you very much in advance.

2 answers

You can do this quite easily using base R functions:

 id <- 1:10
 value <- c(0.3,0.5,0.2,0.8,0.2,0.8,0.1,0.4,0.3,0.2)
 test <- data.frame(id,value)
 test$valprev1 <- c(rep(NA,1),head(test$value,-1))
 test$valprev2 <- c(rep(NA,2),head(test$value,-2))
 test$valprev3 <- c(rep(NA,3),head(test$value,-3))

Result:

    id value valprev1 valprev2 valprev3
 1   1   0.3       NA       NA       NA
 2   2   0.5      0.3       NA       NA
 3   3   0.2      0.5      0.3       NA
 4   4   0.8      0.2      0.5      0.3
 5   5   0.2      0.8      0.2      0.5
 6   6   0.8      0.2      0.8      0.2
 7   7   0.1      0.8      0.2      0.8
 8   8   0.4      0.1      0.8      0.2
 9   9   0.3      0.4      0.1      0.8
 10 10   0.2      0.3      0.4      0.1

I made a mistake here earlier. Here is a version using sapply, wrapped in a function:

 prevrows <- function(data,n) {
   sapply(1:n,function(x) c(rep(NA,x),head(data,-x)))
 }
 prevrows(test$value,3)

Which gives exactly this:

       [,1] [,2] [,3]
  [1,]   NA   NA   NA
  [2,]  0.3   NA   NA
  [3,]  0.5  0.3   NA
  [4,]  0.2  0.5  0.3
  [5,]  0.8  0.2  0.5
  [6,]  0.2  0.8  0.2
  [7,]  0.8  0.2  0.8
  [8,]  0.1  0.8  0.2
  [9,]  0.4  0.1  0.8
 [10,]  0.3  0.4  0.1
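
The matrix returned by prevrows has no column names. A minimal sketch of naming the columns and attaching them to the data frame, assuming the same test data as above (the valprev names simply mirror the first example and are otherwise arbitrary):

```r
# Same helper as in the answer: column k holds the values lagged by k rows.
prevrows <- function(data, n) {
  sapply(1:n, function(x) c(rep(NA, x), head(data, -x)))
}

test <- data.frame(
  id    = 1:10,
  value = c(0.3, 0.5, 0.2, 0.8, 0.2, 0.8, 0.1, 0.4, 0.3, 0.2)
)

n <- 3
lags <- prevrows(test$value, n)
colnames(lags) <- paste0("valprev", seq_len(n))  # valprev1, valprev2, valprev3
test <- cbind(test, lags)                        # append the lag columns
```

Note that the helper assumes n is smaller than the number of rows; with a larger n the padded vectors no longer have a common length.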

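You can then apply this to each date group in your data as follows (this assumes test also has a date column, as in the question; the example test built above does not):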

 result <- tapply(test$value,test$date,prevrows,3) 

This gives a list with one matrix per date. Provided the rows of test are already sorted by date, you can add them back to your dataset with:

 data.frame(test,do.call(rbind,result)) 
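
Since the rbind step lines the lag rows up with the original rows purely by position, a variant that keeps each row together with its lags may be safer. The following is only a sketch using split and lapply instead of tapply. The two-date data set is made up for illustration, and the valprev column names are an assumption carried over from the first example:

```r
prevrows <- function(data, n) {
  sapply(1:n, function(x) c(rep(NA, x), head(data, -x)))
}

# Hypothetical example data: two dates with four ids each.
test <- data.frame(
  date  = rep(c("2012-01-01", "2012-01-02"), each = 4),
  id    = rep(1:4, times = 2),
  value = c(0.3, 0.5, 0.2, 0.8, 0.9, 0.7, 0.6, 0.4)
)

n <- 3
# split() breaks the frame up by date; each piece gets its own lag columns,
# so no value leaks across a date boundary.
pieces <- lapply(split(test, test$date), function(d) {
  lags <- prevrows(d$value, n)
  colnames(lags) <- paste0("valprev", seq_len(n))
  cbind(d, lags)
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the "date.rownumber" labels rbind creates
```

Here the first row of each date has NA in every lag column, and the last row of the second date picks up 0.6, 0.7 and 0.9 from its own date only.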

Using data.table v1.9.5+, it is as simple as:

 library(data.table)
 setDT(dt)
 lags <- dt[, shift(value, n = c(1,2,3))]

Or add them as additional columns in the same data.table:

 dt[, c("lag1", "lag2", "lag3") := shift(value, n = c(1,2,3))] 
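
The calls above lag over the whole table. Since the question has several dates, the lags probably should restart at each date; a minimal sketch of that, assuming a date column as in the question and using made-up example data, adds by = date:

```r
library(data.table)

# Hypothetical example data: two dates with four ids each.
dt <- data.table(
  date  = rep(c("2012-01-01", "2012-01-02"), each = 4),
  id    = rep(1:4, times = 2),
  value = c(0.3, 0.5, 0.2, 0.8, 0.9, 0.7, 0.6, 0.4)
)

# shift() returns one lagged vector per element of n; by = date computes
# them separately within each date, so lags never cross a date boundary.
dt[, c("lag1", "lag2", "lag3") := shift(value, n = 1:3), by = date]
```

The rows keep their original order, and the first row of every date has NA in all three lag columns.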

Source: https://habr.com/ru/post/916878/

