What about
    library(data.table)
    DT <- data.table(test)
    setkey(DT, id)
    DT[J(unique(id)), mult = "first"]
Edit
There is also a unique method for data.tables that uses the key and returns the first row for each key value
jdtu <- function() unique(DT)
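As a small illustration of what the keyed unique call returns, here is a sketch on a toy table. The data and the names `toy` and `first_rows` are made up for illustration and are not part of the benchmark. It assumes a recent data.table: from version 1.9.8 onward, `unique` compares all columns by default, so the key has to be passed explicitly through `by`.

```r
library(data.table)

# Hypothetical toy data, not the benchmark data from this answer
toy <- data.table(id = c(2, 1, 2, 1, 3),
                  string = c("a", "b", "c", "d", "e"),
                  key = "id")   # keying sorts the rows by id

# Keep the first row for each id; by = key(toy) is needed on
# data.table >= 1.9.8, where unique() otherwise looks at all columns
first_rows <- unique(toy, by = key(toy))
print(first_rows)
```

On older data.table versions, where `unique` used the key by default, plain `unique(toy)` should give the same rows.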
I think that if you sort test outside the benchmark, you can also take the setkey and data.table calls out of the timed code (since setkey basically sorts by id, just like order).
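The claim that setkey sorts by id just like order can be checked on a small example. This is only a sketch: the names `df`, `sorted_df` and `dt` and the sample sizes are assumptions chosen for illustration.

```r
library(data.table)

set.seed(1)
# Hypothetical small sample with repeated ids
df <- data.frame(id = sample(5, 20, TRUE),
                 string = sample(LETTERS, 20, TRUE),
                 stringsAsFactors = FALSE)

sorted_df <- df[order(df$id), ]       # base R: sort rows by id
dt <- data.table(df, key = "id")      # data.table: keying sorts rows by id

# Both sorts keep tied ids in their original order,
# so the two results should match column by column
same_ids <- identical(sorted_df$id, dt$id)
same_strings <- identical(sorted_df$string, dt$string)
print(c(same_ids, same_strings))
```

Since both give the same row order, doing the sort once up front keeps the comparison between the functions fair.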
    library(data.table)
    library(rbenchmark)

    set.seed(21)
    test <- data.frame(id = sample(1e3, 1e5, TRUE), string = sample(LETTERS, 1e5, TRUE))
    test <- test[order(test$id), ]        # sort once, outside the timed functions
    DT <- data.table(test, key = 'id')    # build from test, keyed by id

    ju <- function() test[!duplicated(test$id), ]
    jdt <- function() DT[J(unique(id)), mult = 'first']

    benchmark(ju(), jdt(), replications = 5)
and with a lot of data
**Edit: using the unique method**
    set.seed(21)
    test <- data.frame(id = sample(1e4, 1e6, TRUE), string = sample(LETTERS, 1e6, TRUE))
    test <- test[order(test$id), ]
    DT <- data.table(test, key = 'id')

    benchmark(ju(), jdt(), jdtu(), replications = 5)

        test replications elapsed relative user.self sys.self
    2  jdt()            5    0.09     2.25      0.09     0.00
    3 jdtu()            5    0.04     1.00      0.05     0.00
    1   ju()            5    0.22     5.50      0.19     0.03
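The unique method is the fastest here: the keyed join takes about 2.25 times as long, and the duplicated approach about 5.5 times as long.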
mnel Nov 08 2018-12-12T00: 00Z