I have a large dataset and lookup table. I need to return for each row in the dataset the smallest value for the rows in the search where the conditions are met.
Given the size of my dataset, I am reluctant to crack the iffy solution together, cross-connecting, as this would create many millions of records. I hope someone can suggest a solution that (ideally) uses the r or data.table base, as they are already being used in an efficient way.
Example
A<-seq(1e4,9e4,1e4)
B<-seq(0,1e4,1e3)
dt1<-data.table(expand.grid(A,B),ID=1:nrow(expand.grid(A,B)))
setnames(dt1, c("Var1","Var2"),c("A","B"))
lookup<-data.table(minA=c(1e4,1e4,2e4,2e4,5e4),
maxA=c(2e4,3e4,7e4,6e4,9e4),
minB=rep(2e3,5),
Val=seq(.1,.5,.1))
A B ID Val
99: 90000 10000 99 0.5
In SQL, I would then write something line by line
SELECT ID, A, B, min(Val) as Val
FROM dt1
LEFT JOIN lookup on dt1.A>=lookup.minA
and dt1.A<=lookup.maxA
and dt1.B>=lookup.minB
GROUP BY ID, A, B
To join all matching entries from lookupto dt1and return the smallest Val.
Update
My solution so far is as follows:
CJ.table<-function(X,Y) setkey(X[,c(k=1,.SD)],k)[Y[,c(k=1,.SD)],allow.cartesian=TRUE][,k:=NULL]
dt1.lookup<- CJ.table(dt1,lookup)[A>=minA & A<=maxA & B>=minB,
list(Val=Val[which.min( Val)]),
by=list(ID,A,B)]
dt1.lookup<-rbind.fill(dt1.lookup, dt1[!ID %in% dt1.lookup$ID])
, . Val.