Subset panel data that has a full time size

I have these panels, and, as you may have noticed, some of them at certain points in time do not notice the observations. For example, in “C” there is no data point for 2001 and “D” for 2002 and 2003.

> mydata
    id year sales profit
 1:  A 2000  2000    200
 2:  A 2001  2050    245
 3:  A 2002  2100    290
 4:  A 2003  2150    335
 5:  B 2000  2200    380
 6:  B 2001  2250    425
 7:  B 2002  2300    470
 8:  B 2003  2350    515
 9:  C 2000  2400    560
10:  C 2002  2500    650
11:  C 2003  2550    695
12:  D 2000  2600    740
13:  D 2001  2650    785

I tried something like below

subset(mydata, year==c(2000:2003)

The result is shown below.

   id year sales profit
1:  A 2000  2000    200
2:  A 2001  2050    245
3:  A 2002  2100    290
4:  A 2003  2150    335
5:  B 2000  2200    380
6:  B 2001  2250    425
7:  B 2002  2300    470
8:  B 2003  2350    515
9:  C 2000  2400    560
Warning message:
In year == c(2000:2003) :
  longer object length is not a multiple of shorter object length

I need data that includes objects with a full period from the beginning, 2000, to the end of 2003. In this case, it will be so.

   id year sales profit
1:  A 2000  2000    200
2:  A 2001  2050    245
3:  A 2002  2100    290
4:  A 2003  2150    335
5:  B 2000  2200    380
6:  B 2001  2250    425
7:  B 2002  2300    470
8:  B 2003  2350    515

Thank you for your time and answer in advance, but I would really appreciate it if the answer was a little easier, as I am very inexperienced and just started to learn about R.

+4
source share
3 answers

You can try something like:

library(data.table)
mydata[, ind := all(2000:2003 %in% year), id][(ind)]
#    id year sales profit  ind
# 1:  A 2000  2000    200 TRUE
# 2:  A 2001  2050    245 TRUE
# 3:  A 2002  2100    290 TRUE
# 4:  A 2003  2150    335 TRUE
# 5:  B 2000  2200    380 TRUE
# 6:  B 2001  2250    425 TRUE
# 7:  B 2002  2300    470 TRUE
# 8:  B 2003  2350    515 TRUE

With "tidyverse":

library(tidyverse)
mydata %>% 
  group_by(id) %>% 
  filter(all(2000:2003 %in% year))

( ):

mydata <- structure(list(id = c("A", "A", "A", "A", "B", "B", "B", "B", 
    "C", "C", "C", "D", "D"), year = c(2000L, 2001L, 2002L, 2003L, 
    2000L, 2001L, 2002L, 2003L, 2000L, 2002L, 2003L, 2000L, 2001L
    ), sales = c(2000L, 2050L, 2100L, 2150L, 2200L, 2250L, 2300L, 
    2350L, 2400L, 2500L, 2550L, 2600L, 2650L), profit = c(200L, 245L, 
    290L, 335L, 380L, 425L, 470L, 515L, 560L, 650L, 695L, 740L, 785L
    )), .Names = c("id", "year", "sales", "profit"), row.names = c(NA, 
    13L), class = c("data.table", "data.frame"))
+2

R ave id , 4:

< >

txt = '    id year sales profit
A 2000  2000    200
A 2001  2050    245
A 2002  2100    290
A 2003  2150    335
B 2000  2200    380
B 2001  2250    425
B 2002  2300    470
B 2003  2350    515
C 2000  2400    560
C 2002  2500    650
C 2003  2550    695
D 2000  2600    740
D 2001  2650    785'

df <- read.table(text=txt, header=TRUE)

df$grp_cnt <- ave(df$year, df$id, FUN=length)

df <- transform(subset(df, df$grp_cnt == 4), grp_cnt = NULL)

df    
#   id year sales profit
# 1  A 2000  2000    200
# 2  A 2001  2050    245
# 3  A 2002  2100    290
# 4  A 2003  2150    335
# 5  B 2000  2200    380
# 6  B 2001  2250    425
# 7  B 2002  2300    470
# 8  B 2003  2350    515
0

For completeness, there is also a solution data.tableusing a connection here:

mydata[mydata[, uniqueN(year), by = id][V1 == 4L, .(id)], on = "id"]
   id year sales profit
1:  A 2000  2000    200
2:  A 2001  2050    245
3:  A 2002  2100    290
4:  A 2003  2150    335
5:  B 2000  2200    380
6:  B 2001  2250    425
7:  B 2002  2300    470
8:  B 2003  2350    515
0
source

Source: https://habr.com/ru/post/1691200/


All Articles