Accelerating Unequal Unification

Inputs

> specialty.dt
   specialty        p1        p2
1:      ZKWM 0.0000000 0.7377049
2:      MZAY 0.7377049 1.0000000

> provider.dt
   provSysId       prob
1:        23 0.94225972
2:        16 0.39277028
3:         8 0.07162044
4:        25 0.42598790
5:         7 0.90370561
6:        12 0.71343887

Output

> prov_spec.dt
   provSysId       prob specialty        p1        p2
1:        23 0.94225972      MZAY 0.7377049 1.0000000
2:        16 0.39277028      ZKWM 0.0000000 0.7377049
3:         8 0.07162044      ZKWM 0.0000000 0.7377049
4:        25 0.42598790      ZKWM 0.0000000 0.7377049
5:         7 0.90370561      MZAY 0.7377049 1.0000000
6:        12 0.71343887      ZKWM 0.0000000 0.7377049

The code for creating the tables is given below. It takes almost 30 seconds for num.provider=5and num.specialty=10000to create the output. I was wondering if there is a faster way to get the same result (without the first Cartesian product, because it will require a lot of memory).

require(data.table)

num.specialty <- 50
num.provider <- 10000

specialty.dt <- data.table(specialty=replicate(num.specialty, paste(sample(LETTERS, 4, replace=TRUE), collapse="")))[,
    cnt:=sample(1:50, .N, replace=T)][, prob:=cnt/sum(cnt)][, p2:=cumsum(prob)][, p1:=shift(p2,,0)][, 
    c("specialty","p1","p2"), with=FALSE]

provider.dt <- data.table(provSysId=sample(seq(num.provider+1,num.provider*5), num.provider, replace=FALSE))[, prob:=runif(.N)]

system.time({
prov_spec.dt <- rbindlist(lapply(1:num.provider, function(n) {r <- provider.dt[n]; cbind(r,specialty.dt[p1 <= r[,prob] & r[,prob] < p2]) }))
})
+4
source share
1 answer

Yours rbindlist(lapply(...))can be replaced by an unconfigured connection using version 1.9.7 of the data table

specialty.dt[ provider.dt, on = .(p1 <= prob, p2 > prob)]

It joins directly to specialty.dton provider.dt, using the condition that p1 <= proband prob < p2.


References

Here is a list of related questions

And here is Arun’s conversation

+4
source

Source: https://habr.com/ru/post/1661264/


All Articles