Either of the following will be faster than the .SD approach:
system.time({setkey(DT, X); DT[DT[, Y == max(Y), by = X]$V1, ]})
or
system.time(DT[DT[, .I[Y==max(Y)], by=X]$V1])
If there are only two columns,
system.time(DT[, list(Y = max(Y)), by = X])
#   user  system elapsed
#  0.006   0.000   0.007
Compared with
system.time(DT[, .SD[Y == max(Y)], by = X] )
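As a quick sanity check that the .I form and the .SD form pick out the same rows, here is a minimal sketch on a small table. The table `small` and the names `by_index` and `by_sd` are made up for illustration and are not part of the original timings.

```r
library(data.table)

# Hypothetical small table: 3 groups of 4 rows each
set.seed(1)
small <- data.table(X = rep(1:3, each = 4), Y = rnorm(12))

# Row indices of the per-group maxima, then subset once
by_index <- small[small[, .I[Y == max(Y)], by = X]$V1]

# Subset .SD within every group (the slower form)
by_sd <- small[, .SD[Y == max(Y)], by = X]

# Both should hold the same groups and the same maxima
identical(by_index$X, by_sd$X)
identical(by_index$Y, by_sd$Y)
```

The .I form computes only integer row numbers per group and subsets the table once at the end, which is likely why it avoids the per-group overhead of building and subsetting .SD.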
Based on comments by @Khashaa and @AnandaMahto, the CRAN version (1.9.4) gives a different result for the .SD method than the devel version (1.9.5), which is the one I used. Following @Arun's comments, you can get the same result on the CRAN version by setting this option:
options(datatable.auto.index=FALSE)
NOTE: In the case of "ties", the solutions described here will return multiple rows for each group (as pointed out by @docendo discimus). My solutions are based on the "code" posted by the OP.
If there are "ties", you can use unique with the by option (in case the number of columns > 2)
setkey(DT, X)
unique(DT[DT[, Y == max(Y), by = X]$V1, ], by = c("X", "Y"))
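To make the effect of ties concrete, here is a minimal sketch on a toy table. The table `toy`, its extra column `Z`, and the names `with_ties` and `no_ties` are hypothetical and only serve the illustration.

```r
library(data.table)

# Hypothetical toy data: group 1 has two rows tied for the maximum of Y
toy <- data.table(X = c(1, 1, 1, 2, 2),
                  Y = c(5, 5, 3, 4, 1),
                  Z = c("a", "b", "c", "d", "e"))

# The .I approach keeps every row that attains the group maximum
with_ties <- toy[toy[, .I[Y == max(Y)], by = X]$V1]

# unique() with by= keeps only the first row per (X, Y) combination
no_ties <- unique(with_ties, by = c("X", "Y"))

nrow(with_ties)  # 3
nrow(no_ties)    # 2
```

Note that unique keeps the first of the tied rows (here the row with Z equal to "a"), so which row survives depends on the row order of the table.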
microbenchmarks
library(microbenchmark)
f1 <- function(){setkey(DT,X)[DT[, Y==max(Y), by=X]$V1,]}
f2 <- function(){DT[DT[, .I[Y==max(Y)], by=X]$V1]}
f3 <- function(){DT[, list(Y=max(Y)), by=X]}
f4 <- function(){DT[, .SD[Y==max(Y)], by=X]}
microbenchmark(f1(), f2(), f3(), f4(), unit='relative', times=20L)
data
library(data.table)
N = 10000
k = 25
set.seed(25)
DT = data.table(X = rep(1:N, each = k), Y = rnorm(k*N))