I'm playing with the new non-equi (conditional) join feature in data.table, and it is very cool. I have a situation with two tables, dtBig and dtSmall, where several rows match in both datasets when this non-equi join runs. Is there a way to aggregate these multiple matches with a function such as max or min? Here is a reproducible example that tries to mimic what I'm trying to accomplish.
Environment setup
## docker run --rm -ti rocker/r-base
## install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table")
Create two fake datasets
Create a “large” table with 50 rows (10 values for each ID).
library(data.table)
set.seed(1L)
dtBig <- data.table(ID=c(sapply(LETTERS[1:5], rep, 10, simplify = TRUE)), ValueBig=ceiling(runif(50, min=0, max=1000)))
dtBig[, Rank := frank(ValueBig, ties.method = "first"), keyby=.(ID)]
ID ValueBig Rank
1: A 266 3
2: A 373 4
3: A 573 5
4: A 909 9
5: A 202 2
---
46: E 790 9
47: E 24 1
48: E 478 2
49: E 733 7
50: E 693 6
Create a “small” table similar to the first, but with 10 rows (2 values for each ID).
dtSmall <- data.table(ID=c(sapply(LETTERS[1:5], rep, 2, simplify = TRUE)), ValueSmall=ceiling(runif(10, min=0, max=1000)))
ID ValueSmall
1: A 478
2: A 862
3: B 439
4: B 245
5: C 71
6: C 100
7: D 317
8: D 519
9: E 663
10: E 407
Combine
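I want to join the two tables by ID where ValueSmall >= ValueBig, and for each row of dtSmall take the max of Rank over all matching dtBig rows. My first attempt (RankSmall) gives the wrong values. If I first sort dtBig by ValueBig and run the same join again (RankSmall2), I get the desired result (DesiredRank), so the outcome seems to depend on the row order of dtBig.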
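A brute-force sketch in base R of what the DesiredRank column appears to mean: for each dtSmall row, the largest Rank among dtBig rows with the same ID and ValueBig <= ValueSmall (NA when nothing matches). The helper name desired_rank is hypothetical, and the sketch rebuilds both tables so it runs on its own.

```r
library(data.table)

# Rebuild the two example tables exactly as in the question
set.seed(1L)
dtBig <- data.table(ID = c(sapply(LETTERS[1:5], rep, 10, simplify = TRUE)),
                    ValueBig = ceiling(runif(50, min = 0, max = 1000)))
dtBig[, Rank := frank(ValueBig, ties.method = "first"), keyby = .(ID)]
dtSmall <- data.table(ID = c(sapply(LETTERS[1:5], rep, 2, simplify = TRUE)),
                      ValueSmall = ceiling(runif(10, min = 0, max = 1000)))

# Hypothetical helper: scan every dtBig row with the same ID and
# ValueBig <= value, and keep the largest Rank (NA if there is no match)
desired_rank <- function(id, value) {
  hits <- dtBig$Rank[dtBig$ID == id & dtBig$ValueBig <= value]
  if (length(hits) == 0L) NA_integer_ else max(hits)
}

# Apply the helper row by row over dtSmall
dtSmall[, DesiredRank := mapply(desired_rank, ID, ValueSmall, USE.NAMES = FALSE)]
print(dtSmall)
```

This is quadratic in the table sizes, so it only serves as a reference to check a join-based answer against.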
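# Attempt 1: dtBig in its current order (keyed by ID)
dtSmall[dtBig, RankSmall := max(i.Rank), by=.EACHI, on=.(ID, ValueSmall >= ValueBig)]
# Attempt 2: the same join after sorting dtBig by ValueBig
setorder(dtBig, ValueBig)
dtSmall[dtBig, RankSmall2 := max(i.Rank), by=.EACHI, on=.(ID, ValueSmall >= ValueBig)]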
ID ValueSmall RankSmall RankSmall2 DesiredRank
1: A 478 1 4 4
2: A 862 1 7 7
3: B 439 3 4 4
4: B 245 1 2 2
5: C 71 1 1 1
6: C 100 1 1 1
7: D 317 1 2 2
8: D 519 3 5 5
9: E 663 2 5 5
10: E 407 1 1 1
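A plausible explanation for the order dependence, sketched on hypothetical toy tables (target, updates): in dtSmall[dtBig, ..., by = .EACHI] each row of dtBig is its own group, so max(i.Rank) only ever sees a single value. When several dtBig rows match the same dtSmall row, each group overwrites the previous assignment and the last one wins. Sorting dtBig by ValueBig happens to make the last match the highest-ranked one, which would explain why RankSmall2 agrees with DesiredRank.

```r
library(data.table)

# Hypothetical toy tables: one target row, three update rows that all match it
target  <- data.table(k = "a", v = NA_integer_)
updates <- data.table(k = c("a", "a", "a"), r = c(2L, 5L, 3L))

# Each updates row forms its own group; every group writes its r into the
# same target row, so the value left behind comes from the last updates row
target[updates, v := i.r, on = "k", by = .EACHI]
v_unsorted <- target$v

# Sorting updates by r first makes the last write also the largest value
setorder(updates, r)
target[updates, v := i.r, on = "k", by = .EACHI]
v_sorted <- target$v
```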
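Why does the result depend on the order of dtBig, and is there a data.table way to aggregate over the multiple matches directly (for example with max or min) without sorting dtBig first?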
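One possible approach, sketched here as an assumption rather than a confirmed answer: put dtSmall in the i position instead. With by = .EACHI the groups are then the rows of dtSmall, and j sees every matching dtBig row at once, so max(Rank) genuinely aggregates. The condition has to be written from dtBig's side (x column on the left), i.e. ValueBig <= ValueSmall. The names agg, agg2 and dtShuffled are illustrative only.

```r
library(data.table)

# Rebuild the example tables from the question
set.seed(1L)
dtBig <- data.table(ID = c(sapply(LETTERS[1:5], rep, 10, simplify = TRUE)),
                    ValueBig = ceiling(runif(50, min = 0, max = 1000)))
dtBig[, Rank := frank(ValueBig, ties.method = "first"), keyby = .(ID)]
dtSmall <- data.table(ID = c(sapply(LETTERS[1:5], rep, 2, simplify = TRUE)),
                      ValueSmall = ceiling(runif(10, min = 0, max = 1000)))

# dtSmall is i: one group per dtSmall row, aggregated over all matching dtBig rows
agg <- dtBig[dtSmall, on = .(ID, ValueBig <= ValueSmall),
             .(RankMax = max(Rank)), by = .EACHI]

# by = .EACHI returns one row per i row, in dtSmall's order, so copy it back
dtSmall[, RankSmall := agg$RankMax]

# Shuffling dtBig should not change the aggregated answer
dtShuffled <- dtBig[sample(.N)]
agg2 <- dtShuffled[dtSmall, on = .(ID, ValueBig <= ValueSmall),
                   .(RankMax = max(Rank)), by = .EACHI]
print(dtSmall)
```

Replacing max(Rank) with min(Rank) gives the minimum instead. Because nomatch defaults to NA, a dtSmall row with no qualifying dtBig row gets a single NA Rank, so max() returns NA rather than -Inf with a warning.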