R: Joining on multiple conditions using data.table

I have a large dataset and a lookup table. For each row in the dataset, I need to return the smallest Val among the lookup rows whose conditions are met.

Given the size of my dataset, I am reluctant to hack together an iffy solution based on a cross join, as this would create many millions of records. I hope someone can suggest a solution that (ideally) uses base R or data.table, since both are already in use here in an efficient way.

Example

library(data.table)

A<-seq(1e4,9e4,1e4)
B<-seq(0,1e4,1e3)

dt1<-data.table(expand.grid(A,B),ID=1:nrow(expand.grid(A,B)))
setnames(dt1, c("Var1","Var2"),c("A","B"))

lookup<-data.table(minA=c(1e4,1e4,2e4,2e4,5e4),
                 maxA=c(2e4,3e4,7e4,6e4,9e4),
                 minB=rep(2e3,5),
                 Val=seq(.1,.5,.1))

# Sample of the desired output
     A     B    ID Val
99: 90000 10000 99 0.5
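To make the requirement concrete, the sketch below computes the desired Val row by row with a hypothetical helper, min_val_for (not part of the question). It never builds the cross join, but it calls an R function once per row of dt1, so treat it as a correctness reference rather than a fast solution.

```r
library(data.table)

# Example data from the question
A <- seq(1e4, 9e4, 1e4)
B <- seq(0, 1e4, 1e3)
dt1 <- as.data.table(expand.grid(A = A, B = B))
dt1[, ID := .I]
lookup <- data.table(minA = c(1e4, 1e4, 2e4, 2e4, 5e4),
                     maxA = c(2e4, 3e4, 7e4, 6e4, 9e4),
                     minB = rep(2e3, 5),
                     Val  = seq(.1, .5, .1))

# Hypothetical helper: smallest Val over the lookup rows whose conditions hold
# for one (a, b) pair; NA when nothing matches, like unmatched rows of a LEFT JOIN
min_val_for <- function(a, b, lookup) {
  hit <- lookup$minA <= a & a <= lookup$maxA & lookup$minB <= b
  if (any(hit)) min(lookup$Val[hit]) else NA_real_
}

# One helper call per row of dt1: memory stays small,
# but runtime grows with nrow(dt1) * nrow(lookup)
dt1[, Val := mapply(min_val_for, A, B, MoreArgs = list(lookup = lookup))]
dt1[ID == 99]  # should match the sample row above (Val = 0.5)
```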

In SQL, I would write something along the lines of

SELECT ID, A, B, min(Val) as Val
FROM dt1
LEFT JOIN lookup on dt1.A>=lookup.minA
                 and dt1.A<=lookup.maxA
                 and dt1.B>=lookup.minB
GROUP BY ID, A, B

This joins all matching entries from lookup to dt1 and returns the smallest Val.
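For reference, data.table 1.9.8 and later support non-equi joins, which can express this SQL join directly without materializing the cross join. This is a different technique from the ones discussed in this thread; a minimal sketch on the example data, assuming such a data.table version:

```r
library(data.table)  # non-equi joins assume data.table >= 1.9.8

A <- seq(1e4, 9e4, 1e4)
B <- seq(0, 1e4, 1e3)
dt1 <- as.data.table(expand.grid(A = A, B = B))
dt1[, ID := .I]
lookup <- data.table(minA = c(1e4, 1e4, 2e4, 2e4, 5e4),
                     maxA = c(2e4, 3e4, 7e4, 6e4, 9e4),
                     minB = rep(2e3, 5),
                     Val  = seq(.1, .5, .1))

# dt1 is the i table: for each dt1 row, find lookup rows with
# minA <= A, maxA >= A and minB <= B, then take min(Val) per dt1 row (by = .EACHI).
# With the default nomatch = NA, unmatched dt1 rows get NA, like the LEFT JOIN.
res <- lookup[dt1, on = .(minA <= A, maxA >= A, minB <= B),
              .(Val = min(Val)), by = .EACHI]

# by = .EACHI returns one row per dt1 row, in dt1's order
dt1[, Val := res$Val]
dt1[ID == 99]
```

The join conditions mirror the SQL ON clause one to one, and the aggregation happens per row of dt1 inside the join rather than after a full Cartesian product.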

Update

My solution so far is as follows:

library(plyr)  # for rbind.fill

CJ.table<-function(X,Y) setkey(X[,c(k=1,.SD)],k)[Y[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]

dt1.lookup<- CJ.table(dt1,lookup)[A>=minA & A<=maxA & B>=minB,
                                  list(Val=Val[which.min( Val)]),
                                  by=list(ID,A,B)]
dt1.lookup<-rbind.fill(dt1.lookup, dt1[!ID %in% dt1.lookup$ID])



First, drop the rows of dt1 whose A or B fall outside the overall range covered by lookup:

Prep = dt1[A >= min(lookup$minA) & A <= max(lookup$maxA) & B >= min(lookup$minB)]

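Then, for each remaining row, find the index of the first lookup row that satisfies each condition separately (this relies on lookup being sorted by increasing Val):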

Indices = Prep[,list(min(which(A >= lookup$minA)), 
                     min(which(A <= lookup$maxA)), 
                     min(which(B >= lookup$minB)), A, B),by=ID]

Finally, return the Val at the largest of the three indices:

Indices[,list(Val=lookup$Val[max(V1,V2,V3)], A, B),by=ID]

First rows of the result:

   ID Val     A     B
 1: 19 0.1 10000  2000
 2: 20 0.1 20000  2000
 3: 21 0.2 30000  2000
 4: 22 0.3 40000  2000
 5: 23 0.3 50000  2000
 6: 24 0.3 60000  2000
 7: 25 0.3 70000  2000
 8: 26 0.5 80000  2000
 9: 27 0.5 90000  2000
10: 28 0.1 10000  3000
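Because V1, V2 and V3 are each found independently, the row picked by max(V1, V2, V3) is not guaranteed to satisfy all three conditions at once. The check below uses a hypothetical two-row lookup, lk (sorted by Val), and a value a = 3e4 that passes the Prep filter but matches no lookup row:

```r
# Hypothetical two-row lookup, sorted by increasing Val
lk <- data.frame(minA = c(1e4, 5e4), maxA = c(2e4, 9e4),
                 minB = c(0, 0), Val = c(0.1, 0.2))
a <- 3e4
b <- 0

# Index trick: first row satisfying each condition on its own, then the max
picked <- max(min(which(a >= lk$minA)),
              min(which(a <= lk$maxA)),
              min(which(b >= lk$minB)))

# Direct check: rows satisfying all three conditions together
all_ok <- which(a >= lk$minA & a <= lk$maxA & b >= lk$minB)

picked          # 2, so the trick would return lk$Val[2] = 0.2
length(all_ok)  # 0, so the expected answer for this row is NA
```

On the question's lookup the trick happens to give the right values, so it is worth re-checking against a direct filter if the lookup ranges change.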

Building on Senor O.'s idea, here is a loop over lookup sorted by Val that fills in Val only where it is still missing:

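dt1[,Val:=as.numeric(NA)]
lookup.sorted<-lookup[order(Val)]  # sort by Val once, outside the loop
for (row in seq_len(nrow(lookup.sorted))) {
  # assign only to rows that are still NA, so the first (smallest) matching Val wins
  dt1[A>=lookup.sorted$minA[row] & A<=lookup.sorted$maxA[row] & B>=lookup.sorted$minB[row] & is.na(Val),
      Val:=lookup.sorted$Val[row]]
}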

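Rows of dt1 with no matching lookup entry keep Val = NA.

This works because every row of dt1 starts with Val = NA and lookup is processed in increasing order of Val, so the first value assigned to a row is its min(Val); later matches are skipped by the is.na(Val) condition.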

Instead of rbind.fill, you can stay within data.table and use rbindlist:

rbindlist(list(dt1.lookup,dt1[!ID %in% dt1.lookup[,ID]][,list(ID, A, B, Val=as.numeric(NA))]))
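Another way to keep the unmatched IDs, assuming a data.table version that supports the on= argument, is an update join: Val is added to dt1 by reference, so IDs missing from dt1.lookup simply stay NA and no rbind step is needed. The tiny tables below are hypothetical stand-ins for dt1 and dt1.lookup:

```r
library(data.table)

# Hypothetical stand-ins: dt1 holds every ID, dt1.lookup only the matched ones
dt1 <- data.table(A = c(10000, 20000, 90000), B = c(0, 2000, 10000), ID = 1:3)
dt1.lookup <- data.table(ID = 2:3, A = c(20000, 90000), B = c(2000, 10000),
                         Val = c(0.1, 0.5))

# Update join: for each ID found in dt1.lookup, copy its Val into dt1 in place;
# IDs with no match keep the NA assigned just above
dt1[, Val := NA_real_]
dt1[dt1.lookup, on = "ID", Val := i.Val]
dt1
```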



Source: https://habr.com/ru/post/1525449/

