Subset an unbalanced panel dataset to keep only runs of at least 2 consecutive observations in R

I have an unbalanced panel dataset in R. An example is the following:

dt <- data.frame(name= rep(c("A", "B", "C"), c(3,2,3)), 
                 year=c(2001:2003,2000,2002,2000:2001,2003))

> dt
  name year
1    A 2001
2    A 2002
3    A 2003
4    B 2000
5    B 2002
6    C 2000
7    C 2001
8    C 2003

Now, for each `name`, I need at least 2 consecutive `year` observations. Therefore, I would like to delete rows 4, 5, and 8. What is the best way to do this in R?

EDIT: Thanks to the comment below, I can make this a little clearer. If I had an extra observation (row 9) with `name = C` and `year = 2004`, I would like to keep both rows 8 and 9 along with the rest.


An approach using `duplicated()`:

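# Stack dt on top of copies shifted one year forward and one year back.
# A row of dt is duplicated later in the stack only if the same name has an adjacent year.
is.consecutive <- duplicated(rbind(dt, transform(dt, year = year + 1),
                                       transform(dt, year = year - 1)),
                             fromLast = TRUE)[1:nrow(dt)]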

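`is.consecutive` is a logical vector that is `TRUE` for every row whose `name` also has the previous or the following year: TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE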

Then use it to subset the data.frame:

dt[is.consecutive,]
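As a quick check of the EDIT case, the sketch below appends the extra row 9 (`C`, 2004) and reruns the same `duplicated()` idea; rows 8 and 9 should now both be kept. The name `dt2` is introduced here only for illustration.

```r
# Example data plus the extra observation from the EDIT (name C, year 2004)
dt2 <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 4)),
                  year = c(2001:2003, 2000, 2002, 2000:2001, 2003:2004))

# A row is kept if the same name also appears one year earlier or one year later
is.consecutive <- duplicated(rbind(dt2, transform(dt2, year = year + 1),
                                        transform(dt2, year = year - 1)),
                             fromLast = TRUE)[seq_len(nrow(dt2))]

dt2[is.consecutive, ]  # keeps rows 1, 2, 3, 6, 7, 8 and 9
```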

A more general approach, where you can set the minimum run length `rl`:

dt <- dt[order(dt$name, dt$year), ]

rl <- 2

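do.call(rbind,
        by(dt, dt$name, function(x){
          # run id increases by one at every gap of more than one year
          run <- c(0, cumsum(diff(x$year) > 1))
          # keep rows whose run contains at least rl observations
          x[ave(run, run, FUN = length) >= rl, ]
        })
)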
#     name year
# A.1    A 2001
# A.2    A 2002
# A.3    A 2003
# C.6    C 2000
# C.7    C 2001

rl <- 3

do.call(rbind,
        by(dt, dt$name, function(x){
          run <- c(0, cumsum(diff(x$year) > 1))
          x[ave(run, run, FUN = length) >= rl, ]
        })
)
#     name year
# A.1    A 2001
# A.2    A 2002
# A.3    A 2003
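The same run-length idea can also be written without `by()` and `do.call(rbind, ...)`, by starting a new run wherever `name` changes or the gap to the previous year is larger than one. This is a sketch: the helper name `keep_runs` is hypothetical and, like the code above, it assumes there are no duplicated name-year pairs.

```r
dt <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 3)),
                 year = c(2001:2003, 2000, 2002, 2000:2001, 2003))

# Hypothetical helper: keep rows that belong to runs of at least rl consecutive years
keep_runs <- function(d, rl = 2) {
  d <- d[order(d$name, d$year), ]
  # a run starts at the first row, when the name changes, or after a gap > 1 year
  new_run <- c(TRUE, d$name[-1] != d$name[-nrow(d)] | diff(d$year) > 1)
  run_id <- cumsum(new_run)
  # length of the run each row belongs to
  run_len <- ave(run_id, run_id, FUN = length)
  d[run_len >= rl, ]
}

keep_runs(dt, rl = 2)  # rows 1, 2, 3, 6, 7
keep_runs(dt, rl = 3)  # rows 1, 2, 3
```

Because all names are handled in one vectorized pass, the row names stay the original ones (no `A.`/`C.` prefixes).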

Or using `ddply` from the plyr package:

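library(plyr)
ddply(dt, "name", function(x) {
    # positions i where year[i + 1] - year[i] == 1 (assumes years are sorted within name)
    cons_idx <- which(diff(x$year) == 1)
    # keep both members of each consecutive pair
    cons_idx <- sort(unique(c(cons_idx, cons_idx + 1)))
    x[cons_idx, ]
})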
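For the two-observation case there is also a compact base R variant, offered here as a sketch rather than part of the answers above: within each `name`, flag a year if the previous or the following year is also present. It applies the same rule as the `duplicated()` and `ddply` answers and does not require the data to be sorted. The variable `has_neighbour` is a hypothetical name.

```r
dt <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 3)),
                 year = c(2001:2003, 2000, 2002, 2000:2001, 2003))

# For each name, flag years whose predecessor or successor year is also present.
# ave() returns a vector of the same type as `year`, so the flags come back
# as 0/1 and are converted with as.logical().
has_neighbour <- as.logical(ave(dt$year, dt$name,
                                FUN = function(y) (y - 1) %in% y | (y + 1) %in% y))

dt[has_neighbour, ]  # rows 1, 2, 3, 6, 7
```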

Source: https://habr.com/ru/post/1529684/

