A more efficient way to get every nth element by factor in data.table

This thread discusses how to do this for a data frame. I want to do something a little harder:

library(data.table)

dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)), B = rnorm(12, 5, 2))
dt2 <- dt[order(dt$A, dt$B)] # Sort by A, then by B
# 4th value of B in each group, always keeping the group label from A
do.call(rbind, by(
  dt2, dt2$A,
  function(x) data.table(A = x[, A][1], B = x[, B][4])
))
# In reply to Vlo's comment below: if I do this, the row for group "a" (which has only 3 rows) comes back with NA in both columns
do.call(rbind,
  by(dt2, dt2$A, function(x) x[4])
)
# Take the max value of B in each group of A (after sorting, this is the last row)
do.call(rbind, by(dt2, dt2$A,
                  function(x) tail(x, 1)))

What is an efficient way to do this using data.table's native functions?

2 answers

In data.table you can refer to columns as if they were variables within the scope of dt, so you do not need $. That is:

dt2 = dt[order(A, B)] # no need for dt$
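Since the question is about efficiency, it may be worth knowing that data.table also provides setorder(), which reorders the rows of a data.table by reference instead of building a sorted copy. This is a sketch rather than part of the original answer; the seed is an illustrative assumption added only to make the example reproducible.

```r
library(data.table)

set.seed(1)  # illustrative seed so the example is reproducible
dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)), B = rnorm(12, 5, 2))

# Copy-based: returns a new, sorted data.table; dt itself is unchanged here
dt2 <- dt[order(A, B)]

# By reference: sorts dt in place, without allocating a sorted copy
setorder(dt, A, B)

# Both approaches should now hold the same rows in the same order
identical(dt$A, dt2$A)
identical(dt$B, dt2$B)
```

On large tables the in-place version avoids holding two copies of the data in memory at once.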

And if you need the 4th element of B for each group of A:

dt2[, list(B=B[4L]), by=A]
#    A        B
# 1: a       NA
# 2: b 6.579446
# 3: c 6.378689
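Group a has only three rows, so B[4L] is NA there. A minimal sketch, assuming you would rather drop such groups than keep an NA row, is to return NULL for short groups; the same .N idea extends to every nth row per group, which is what the title asks for. The variable n is a hypothetical parameter; .SD, .N and .I are data.table's built-in special symbols (subset of data, group size, and row numbers in the original table).

```r
library(data.table)

set.seed(1)  # illustrative data mirroring the question
dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)), B = rnorm(12, 5, 2))
dt2 <- dt[order(A, B)]

# 4th row of each group; groups with fewer than 4 rows return NULL and are dropped
fourth <- dt2[, if (.N >= 4L) .SD[4L], by = A]

# Every nth row of each group (n is a hypothetical parameter)
n <- 2L
every_nth <- dt2[, .SD[seq_len(.N) %% n == 0L], by = A]

# Same selection via row numbers: .I gives each group's row indices in dt2,
# so the inner call collects the wanted indices and the outer call subsets once
idx <- dt2[, .I[seq_len(.N) %% n == 0L], by = A]$V1
every_nth_fast <- dt2[idx]
```

The .I form avoids building a .SD subset for every group, which tends to matter when there are many small groups.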

See @Vlo's answer for your second question.



# Take the max value of B for each group of A
dt2[, list(B=max(B)), by=A]
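Because dt2 is already sorted by A and then B, the maximum of B in each group is also that group's last row, so the data.table-native forms below should match the tail(x, 1) approach from the question. This is a sketch on made-up data: .SD[.N] picks the last row of each group, and unique(..., by = "A", fromLast = TRUE) keeps the last row for each value of A.

```r
library(data.table)

set.seed(1)  # illustrative data mirroring the question
dt <- data.table(A = c(rep("a", 3), rep("b", 4), rep("c", 5)), B = rnorm(12, 5, 2))
dt2 <- dt[order(A, B)]

# Aggregate directly: correct whether or not dt2 is sorted
by_max <- dt2[, list(B = max(B)), by = A]

# Last row of each group: relies on dt2 being sorted by A, then B
by_last <- dt2[, .SD[.N], by = A]

# Keep the last occurrence of each A value (also relies on the sort)
by_unique <- unique(dt2, by = "A", fromLast = TRUE)
```

max(B) is the safer choice if the sort order might change; the last-row forms are handy when you need the whole row, not just B.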

Source: https://habr.com/ru/post/1687616/

