Conditional data.table merges with .EACHI

I'm playing with the new conditional (non-equi) join feature in data.table, and it is very cool. I have a situation where I have two tables, dtBig and dtSmall, and there are multiple matching rows in both datasets when this conditional merge takes place. Is there a way to aggregate these matches with a function like max or min? Here is a reproducible example that tries to mimic what I'm trying to accomplish.

Environment setup

## docker run --rm -ti rocker/r-base
## install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table")

Create two fake datasets

Create a “large” table with 50 rows (10 values for each ID).

library(data.table)
set.seed(1L)

# Simulate some data
dtBig <- data.table(ID=c(sapply(LETTERS[1:5], rep, 10, simplify = TRUE)), ValueBig=ceiling(runif(50, min=0, max=1000)))
dtBig[, Rank := frank(ValueBig, ties.method = "first"), keyby=.(ID)]

    ID ValueBig Rank
 1:  A      266    3
 2:  A      373    4
 3:  A      573    5
 4:  A      909    9
 5:  A      202    2
---                 
46:  E      790    9
47:  E       24    1
48:  E      478    2
49:  E      733    7
50:  E      693    6
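The ID column is built with c(sapply(LETTERS[1:5], rep, 10, simplify = TRUE)): sapply() returns a 10-by-5 character matrix with one column per letter, and c() flattens it column by column, so each letter appears in one consecutive run. A quick sketch suggesting that this matches the more direct rep(..., each = ...) form, for both the 10-per-ID big table and the 2-per-ID small table (the ids_* variable names are only for illustration):

```r
# sapply() returns a 10 x 5 (or 2 x 5) matrix, one column per letter;
# c() drops the matrix attributes and flattens it column-wise
ids_big_sapply   <- c(sapply(LETTERS[1:5], rep, 10, simplify = TRUE))
ids_small_sapply <- c(sapply(LETTERS[1:5], rep, 2, simplify = TRUE))

# The same vectors built directly: repeat each letter 10 (or 2) times in a row
ids_big_rep   <- rep(LETTERS[1:5], each = 10)
ids_small_rep <- rep(LETTERS[1:5], each = 2)

print(identical(ids_big_sapply, ids_big_rep))
print(identical(ids_small_sapply, ids_small_rep))
```

Either form can be passed to data.table(); rep(..., each = ...) simply avoids the intermediate matrix.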

Create a “small” dataset similar to the first, but with 10 rows (2 values for each ID).

dtSmall <- data.table(ID=c(sapply(LETTERS[1:5], rep, 2, simplify = TRUE)), ValueSmall=ceiling(runif(10, min=0, max=1000)))

    ID ValueSmall
 1:  A        478
 2:  A        862
 3:  B        439
 4:  B        245
 5:  C         71
 6:  C        100
 7:  D        317
 8:  D        519
 9:  E        663
10:  E        407

Combine

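Now I want to merge on ID where ValueSmall >= ValueBig and, for each row of dtSmall, take the maximum Rank among the matching rows of dtBig. I tried two ways of doing this. Method 2 gives the desired output (the DesiredRank column in the output below), but Method 1 does not, and I don't understand why the results differ at all.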
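One way to pin down the intended result before reaching for a non-equi join is a brute-force reference: pair every dtSmall row with every dtBig row of the same ID, keep the pairs where ValueBig <= ValueSmall, and take the maximum Rank per dtSmall row. A minimal sketch on small hypothetical tables (big and small stand in for dtBig and dtSmall). It builds every pair within an ID, drops dtSmall rows with no match, and merges duplicate (ID, ValueSmall) pairs, so it is only a check, not a replacement for the join:

```r
library(data.table)

# Hypothetical toy versions of dtBig and dtSmall
big <- data.table(ID = c("A", "A", "A", "B", "B"),
                  ValueBig = c(10, 20, 30, 5, 50))
big[, Rank := frank(ValueBig, ties.method = "first"), by = ID]
small <- data.table(ID = c("A", "A", "B"), ValueSmall = c(25, 35, 40))

# Pair every small row with every big row of the same ID ...
pairs <- merge(small, big, by = "ID", allow.cartesian = TRUE)

# ... then apply the non-equi condition and aggregate per small row
ref <- pairs[ValueBig <= ValueSmall,
             .(DesiredRank = max(Rank)),
             by = .(ID, ValueSmall)]
setorder(ref, ID, ValueSmall)
print(ref)
```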

## Method 1
dtSmall[dtBig, RankSmall := max(i.Rank), by=.EACHI, on=.(ID, ValueSmall >= ValueBig)]

## Method 2
setorder(dtBig, ValueBig)
dtSmall[dtBig, RankSmall2 := max(i.Rank), by=.EACHI, on=.(ID, ValueSmall >= ValueBig)]

    ID ValueSmall RankSmall RankSmall2 DesiredRank
 1:  A        478         1          4           4
 2:  A        862         1          7           7
 3:  B        439         3          4           4
 4:  B        245         1          2           2
 5:  C         71         1          1           1
 6:  C        100         1          1           1
 7:  D        317         1          2           2
 8:  D        519         3          5           5
 9:  E        663         2          5           5
10:  E        407         1          1           1

Is there an idiomatic data.table way to do this?

Answer

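For each row of dtSmall, you want the Rank of the largest ValueBig that is less than or equal to ValueSmall (within the same ID). If dtBig is sorted by ID and ValueBig, that is just the last matching row of dtBig: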

setorder(dtBig, ID, ValueBig, Rank)
dtSmall[, r :=
  dtBig[.SD, on=.(ID, ValueBig <= ValueSmall), mult="last", x.Rank ]
]

    ID ValueSmall r
 1:  A        478 4
 2:  A        862 7
 3:  B        439 4
 4:  B        245 2
 5:  C         71 1
 6:  C        100 1
 7:  D        317 2
 8:  D        519 5
 9:  E        663 5
10:  E        407 1
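The mult = "last" lookup returns the maximum Rank only because, after setorder(), Rank never decreases as ValueBig grows within an ID (Rank was computed from ValueBig with ties.method = "first"). A sketch on small hypothetical tables (big and small stand in for dtBig and dtSmall) that runs the same lookup and also checks that ordering assumption:

```r
library(data.table)

# Hypothetical toy versions of dtBig and dtSmall, deliberately unsorted
big <- data.table(ID = c("A", "B", "A", "B", "A"),
                  ValueBig = c(30, 50, 10, 5, 20))
big[, Rank := frank(ValueBig, ties.method = "first"), by = ID]
small <- data.table(ID = c("A", "A", "B"), ValueSmall = c(25, 35, 40))

# Sort so that, within each ID, later rows have larger ValueBig
setorder(big, ID, ValueBig, Rank)

# The lookup is only valid if Rank is non-decreasing within each ID
stopifnot(!any(big[, is.unsorted(Rank), by = ID]$V1))

# For each row of small, take the Rank of the last (largest) ValueBig <= ValueSmall
small[, r := big[.SD, on = .(ID, ValueBig <= ValueSmall), mult = "last", x.Rank]]
print(small)
```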

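In your attempts, dtBig is the i table, so by = .EACHI groups by the rows of dtBig and max() is taken over a single Rank value each time. When several rows of dtBig match the same row of dtSmall, each assignment with := overwrites the previous one and the last one wins, which is why the result depends on how dtBig is sorted.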
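A sketch of both effects on small hypothetical tables (big and small stand in for dtBig and dtSmall): with big in i, by = .EACHI forms one group per row of big, so max(i.Rank) just returns that row's own Rank; and in the update join, the last matching row of big determines the value. The last-one-wins behaviour shown here is consistent with the Method 1 and Method 2 outputs in the question; the sketch only reproduces it on toy data:

```r
library(data.table)

# Hypothetical toy versions of dtBig and dtSmall
big <- data.table(ID = c("A", "A", "A", "B", "B"),
                  ValueBig = c(10, 20, 30, 5, 50))
big[, Rank := frank(ValueBig, ties.method = "first"), by = ID]
small <- data.table(ID = c("A", "A", "B"), ValueSmall = c(25, 35, 40))

# 1) big in i: one group per row of big, so max(i.Rank) is each row's own Rank
per_big_row <- small[big, .(m = max(i.Rank)), by = .EACHI,
                     on = .(ID, ValueSmall >= ValueBig)]
stopifnot(all(per_big_row$m == big$Rank))

# 2) Update join: when several rows of big match one row of small,
#    the value from the last matching row of big is kept
setorder(big, ValueBig)    # ascending: last match has the highest Rank
small[big, r_asc := max(i.Rank), by = .EACHI, on = .(ID, ValueSmall >= ValueBig)]

setorder(big, -ValueBig)   # descending: last match has the lowest Rank
small[big, r_desc := max(i.Rank), by = .EACHI, on = .(ID, ValueSmall >= ValueBig)]

print(small)
```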


What about a general aggregation function, such as max or min?

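Use by = .EACHI, but with dtSmall in the i position, so that j (here max(x.Rank)) is evaluated once for each row of dtSmall. The result has one row per row of dtSmall, in the same order, so its V1 column can be assigned directly: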

dtSmall[, r :=
  dtBig[.SD, on=.(ID, ValueBig <= ValueSmall), max(x.Rank), by=.EACHI ]$V1
]
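Because j is evaluated once per row of dtSmall in this form, any aggregate can replace max(), and several can be computed in one pass. A sketch on small hypothetical tables (big and small stand in for dtBig and dtSmall; the column names maxRank, minRank and nMatch are made up for illustration):

```r
library(data.table)

# Hypothetical toy versions of dtBig and dtSmall
big <- data.table(ID = c("A", "A", "A", "B", "B"),
                  ValueBig = c(10, 20, 30, 5, 50))
big[, Rank := frank(ValueBig, ties.method = "first"), by = ID]
small <- data.table(ID = c("A", "A", "B"), ValueSmall = c(25, 35, 40))

# small is in i, so by = .EACHI gives one group (and one result row) per row of small
agg <- big[small,
           .(maxRank = max(x.Rank),
             minRank = min(x.Rank),
             nMatch  = sum(!is.na(x.Rank))),
           by = .EACHI,
           on = .(ID, ValueBig <= ValueSmall)]

# Result rows follow the row order of small, so they can be assigned directly
small[, `:=`(maxRank = agg$maxRank,
             minRank = agg$minRank,
             nMatch  = agg$nMatch)]
print(small)
```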

Source: https://habr.com/ru/post/1016257/

