Lagging panel data with data.table

I currently lag panel data using data.table as follows:

 require(data.table)
 x <- data.table(id=1:10, t=rep(1:10, each=10), v=1:100)
 setkey(x, id, t) # so that things are in increasing order
 x[, lag_v := c(NA, v[1:(length(v)-1)]), by=id]

I am wondering if there is a better way to do this? I found something online about cross joins, which makes sense. However, a cross join would create a fairly large data.table for a large dataset, so I am hesitant to use it.

1 answer

I'm not sure this is very different from your approach, but you can use the fact that x is keyed by id:

 # recent data.table versions need by = .EACHI to evaluate j once per joined id
 x[J(1:10), lag_v := c(NA, head(v, -1)), by = .EACHI]

I have not tested whether this is faster than grouping with by, especially if the table is already keyed.
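Apart from speed, the two ways of writing the lag differ in one edge case, which the following base R sketch illustrates. The helper names lag_index and lag_head are hypothetical and only wrap the two expressions used above. The example panel has 10 observations per id, so the difference does not show up there; it would matter for an id with a single observation.

```r
# Two ways of writing a one-step lag of a vector.
lag_index <- function(v) c(NA, v[1:(length(v) - 1)])  # form used in the question
lag_head  <- function(v) c(NA, head(v, -1))           # form used in this answer

v <- c(5L, 7L, 9L)
lag_index(v)  # NA 5 7
lag_head(v)   # NA 5 7

# A group with one observation: 1:0 counts down to c(1, 0), so v[1:0] is v[1].
single <- 42L
length(lag_index(single))  # 2, one element more than the group has
length(lag_head(single))   # 1, just NA
```

So head(v, -1) looks like the safer choice whenever group sizes can drop to one.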

Or, using the fact that t (don't use function names as variable names!) is the time id:

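 x <- data.table(id=1:10, t=rep(1:10, each=10), v=1:100)
 setkey(x, t)
 # every period except the first, which has nothing to lag from
 replacing <- setdiff(x[, unique(t)], 1L)
 # rows at t take v from the rows at t - 1 (assumes a balanced panel, ids in the same order within each t)
 x[J(replacing), lag_v := x[J(replacing - 1L), v]]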

but then again, using a double join here seems inefficient.
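As a possible alternative to both approaches, and not something the question or the code above uses, newer data.table releases (1.9.6 and later, as far as I know) ship a shift() function that builds the lag directly. A minimal sketch, assuming such a version is installed; lag_manual is a hypothetical column name used only for the cross-check:

```r
library(data.table)

# Same toy panel: 10 ids observed over 10 periods.
x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, id, t)  # sort by id, then time, so a lag means "previous period"

# shift() defaults to a one-step lag and pads the start of each group with NA.
x[, lag_v := shift(v), by = id]

# Cross-check against the manual construction grouped by id.
x[, lag_manual := c(NA, head(v, -1)), by = id]
stopifnot(identical(x$lag_v, x$lag_manual))
```

This keeps the grouping by id and avoids any join, so it should not build the large intermediate table that a cross join would.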

Source: https://habr.com/ru/post/1441409/