R - keep the first observation per group identified by several variables (Stata equivalent of "bys var1 var2: keep if _n == 1")

I am facing a problem in R that I know exactly how to handle in Stata, but I have wasted more than two hours trying to solve it in R.

Using the data.frame below, I want to keep exactly the first observation of each group, where the groups are defined by several variables and the rows within each group are sorted by another variable. That is, the data.frame mydata created by:

id <- c(1,1,1,1,2,2,3,3,4,4,4)
day <- c(1,1,2,3,1,2,2,3,1,2,3)
value <- c(12,10,15,20,40,30,22,24,11,11,12)
mydata <- data.frame(id, day, value)

should be converted to:

   id day value   
   1   1    10 
   1   2    15 
   1   3    20 
   2   1    40 
   2   2    30 
   3   2    22 
   3   3    24 
   4   1    11 
   4   2    11 
   4   3    12 

(Only row[1] is dropped: within the group (id,day)=(1,1), the row with the smaller value is kept.)

In Stata, this is simply:

bys id day (value): keep if _n == 1

In R, my current attempt is:

mydata$id1 <- paste(mydata$id,"000",mydata$day, sep="")  ### the single group identifier

myid.uni <- unique(mydata$id1)
a<-length(myid.uni)

last <- c()

for (i in 1:a) {
  temp<-subset(mydata, id1==myid.uni[i])
  if (dim(temp)[1] > 1) {
    last.temp<-temp[dim(temp)[1],]
  }
  else {
    last.temp<-temp
  }
  last<-rbind(last, last.temp)
}

last

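I see several problems with this approach:
1. (...)
2. It is far more cumbersome than the Stata version.
3. It is slow (for about 100,000 rows and 6 variables, it takes 1.5 ...).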

Is there an R equivalent of Stata's bys var1 var2: keep if _n == 1?


I would sort the data.frame first, at which point you can use by:

mydata <- mydata[with(mydata, do.call(order, list(id, day, value))), ]

do.call(rbind, by(mydata, list(mydata$id, mydata$day), 
                  FUN=function(x) head(x, 1)))
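A shorter base R route is also possible. The following is only a sketch and not part of the original answer: once the rows are sorted by id, day and value, duplicated() on the grouping columns is FALSE for exactly the first row of each group. The names df, sorted and first_rows are illustrative, and df is a fresh copy of the question's data so that mydata is left untouched.

```r
# Illustrative copy of the question's data (df is a hypothetical name).
df <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  day   = c(1, 1, 2, 3, 1, 2, 2, 3, 1, 2, 3),
  value = c(12, 10, 15, 20, 40, 30, 22, 24, 11, 11, 12)
)

# Sort by the grouping variables, then by the tie-breaking variable.
sorted <- df[order(df$id, df$day, df$value), ]

# duplicated() is FALSE for the first row of each (id, day) pair,
# so negating it keeps one row per group.
first_rows <- sorted[!duplicated(sorted[c("id", "day")]), ]
rownames(first_rows) <- NULL
first_rows
```

This mirrors the Stata logic fairly closely: order() plays the role of the sort implied by bys id day (value), and !duplicated() plays the role of keep if _n == 1.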

Alternatively, look into the "data.table" package. Continuing with the sorted data.frame from above:

library(data.table)

DT <- data.table(mydata, key = "id,day")
DT[, head(.SD, 1), by = key(DT)]
#     id day value
#  1:  1   1    10
#  2:  1   2    15
#  3:  1   3    20
#  4:  2   1    40
#  5:  2   2    30
#  6:  3   2    22
#  7:  3   3    24
#  8:  4   1    11
#  9:  4   2    11
# 10:  4   3    12

Or, starting from the original vectors, you can use data.table as follows:

DT <- data.table(id, day, value, key = "id,day")
DT[, n := rank(value, ties.method="first"), by = key(DT)][n == 1]

And, by extension, in base R:

Ranks <- with(mydata, ave(value, id, day, FUN = function(x) 
  rank(x, ties.method="first")))
mydata[Ranks == 1, ]

The dplyr package makes this kind of task easier.

library(dplyr)
mydata %>% group_by(id, day) %>% filter(row_number(value) == 1)

R, Stata: , .


Using data.table, and assuming mydata has already been sorted the way you need, another approach is:

library(data.table)
mydata <- data.table(mydata)
mydata <- mydata[, .SD[1], by = .(id, day)]
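If the data is not yet sorted, the sort and the de-duplication can be combined in data.table. This is a sketch under the assumption that "first" means the smallest value within each (id, day) pair; dt_demo and first_rows are illustrative names, not part of the original answer.

```r
library(data.table)

# Illustrative copy of the question's data (dt_demo is a hypothetical name).
dt_demo <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  day   = c(1, 1, 2, 3, 1, 2, 2, 3, 1, 2, 3),
  value = c(12, 10, 15, 20, 40, 30, 22, 24, 11, 11, 12)
)

# Sort in place by the grouping columns, then by value.
setorder(dt_demo, id, day, value)

# unique() with by= keeps the first row of each (id, day) pair.
first_rows <- unique(dt_demo, by = c("id", "day"))
first_rows
```

Because setorder() reorders by reference, no copy of the table is made for the sort.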

Or using dplyr with magrittr pipes:

library(dplyr)
mydata <- mydata %>%
  group_by(id, day) %>%
  slice(1) %>%
  ungroup()

If you don't add ungroup() at the end, the dplyr grouping structure will still be present and may cause unexpected results in subsequent operations.
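To see what the leftover grouping does in practice, here is a small sketch (not from the original answer) that runs the same summarise() call on a grouped and an ungrouped result. The names df, still_grouped, ungrouped, per_group and overall are illustrative.

```r
library(dplyr)

# Illustrative copy of the question's data (df is a hypothetical name).
df <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  day   = c(1, 1, 2, 3, 1, 2, 2, 3, 1, 2, 3),
  value = c(12, 10, 15, 20, 40, 30, 22, 24, 11, 11, 12)
)

# Sort first so that slice(1) picks the smallest value in each (id, day) group.
still_grouped <- df %>%
  arrange(id, day, value) %>%
  group_by(id, day) %>%
  slice(1)

ungrouped <- ungroup(still_grouped)

is_grouped_df(still_grouped)  # grouping metadata is still attached
is_grouped_df(ungrouped)      # grouping metadata has been removed

# The same call behaves differently depending on the grouping:
per_group <- summarise(still_grouped, total = sum(value))  # one row per (id, day)
overall   <- summarise(ungrouped, total = sum(value))      # a single row
```

With the grouping still attached, summarise() silently computes one total per group instead of one overall total, which is the kind of surprise ungroup() prevents.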


Source: https://habr.com/ru/post/1525223/