Merging data.tables by the nearest neighbor in two variables

Question

I have two data.tables with x, y coordinates and other information that I would like to merge on the nearest neighbor, i.e. for each row i of the first table, the row j of the second table that minimizes the Euclidean distance over both x and y: d_i = min_j [(x_i - x_j)^2 + (y_i - y_j)^2]^0.5. Let's say I have the following two tables:

    DT1=data.table(x=1:5,y=3:7)
    DT2=data.table(x=c(2,4,2,3,6),y=c(2.5,3.1,2,3,5),Q=c('a','b','c','d','e'))

Then the desired result of the merge would be:

       x y Q
    1: 1 3 a
    2: 2 4 d
    3: 3 5 d
    4: 4 6 e
    5: 5 7 e
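The desired result can be checked by brute force on the sample data. The following is only a minimal sketch: it uses plain vectors instead of data.tables so that it runs without any packages, and the names `x1`, `y1`, `x2`, `y2`, `d2` and `nearest` are made up for illustration.

```r
# Sample coordinates copied from the question (plain vectors, no packages needed)
x1 <- 1:5
y1 <- 3:7
x2 <- c(2, 4, 2, 3, 6)
y2 <- c(2.5, 3.1, 2, 3, 5)
Q  <- c("a", "b", "c", "d", "e")

# d2[i, j] = squared distance between point i of the first set
# and point j of the second set
d2 <- outer(x1, x2, "-")^2 + outer(y1, y2, "-")^2

# For each point of the first set, the index of the closest point in the second
nearest <- apply(d2, 1, which.min)

data.frame(x = x1, y = y1, Q = Q[nearest])
```

The square root is skipped because it does not change which neighbor is closest. The full distance matrix has one cell per pair of points, so this only works for small inputs.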

I could, of course, write a loop over DT1 to compute the nearest neighbor for each row and then merge based on that, but this seems to defeat the purpose of data.table. Moreover, it would be very slow for data.tables with several million rows.

I know that for a single key column I can do a rolling join to the nearest neighbor, like this:

    DT2[DT1,roll="nearest"]

But this (logically) does not work when I define two keys (x and y) for the joined tables. Is there a similar syntax for merging on the nearest neighbor across two variables? If not, is there a smarter way to do this than just looping, as I mentioned?
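For reference, here is a small sketch of the single-column rolling join mentioned above. The tables `lookup` and `query` and their values are made up for illustration and are not part of the original data; the join column is given with `on=` instead of a key.

```r
library(data.table)

# Hypothetical lookup table and query points (not from the question)
lookup <- data.table(x = c(1, 4, 9), lab = c("p", "q", "r"))
query  <- data.table(x = c(2, 5, 8))

# For each x in query, take the lookup row whose x is nearest
res <- lookup[query, on = "x", roll = "nearest"]
res
```

The roll applies to one join column only, which is why it does not extend directly to a distance computed over both x and y.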

merge r data.table

Michiel Feb 10 '15 at 15:11

1 answer

Colonel beauvel · Answer 1 · 2015-02-10T15:33:13+0000

One possible solution:

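    # Return Q of the DT2 row closest to the point (u, v)
    func = function(u,v)
    {
        # squared Euclidean distance from (u, v) to every row of DT2
        vec = with(DT2, (u-x)^2 + (v-y)^2)
        DT2[which.min(vec),]$Q
    }

    # Apply func to each row of DT1 and attach the result as column Q
    transform(DT1, Q=apply(DT1, 1, function(u) func(u[1], u[2])))
    #   x y Q
    #1: 1 3 a
    #2: 2 4 d
    #3: 3 5 d
    #4: 4 6 e
    #5: 5 7 e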
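The same idea can probably also be written in data.table's own syntax, adding Q by reference instead of going through `apply`. This is only a sketch on the sample data from the question and a variation on the approach above, not a different algorithm; grouping by row number is an assumption made here so that `x` and `y` are single values inside each group.

```r
library(data.table)

# Sample data from the question
DT1 <- data.table(x = 1:5, y = 3:7)
DT2 <- data.table(x = c(2, 4, 2, 3, 6),
                  y = c(2.5, 3.1, 2, 3, 5),
                  Q = c("a", "b", "c", "d", "e"))

# One group per row of DT1: inside j, x and y are that row's coordinates.
# Compute the squared distance to every row of DT2 and keep the closest Q.
DT1[, Q := DT2$Q[which.min((DT2$x - x)^2 + (DT2$y - y)^2)],
    by = seq_len(nrow(DT1))]

DT1
```

This still compares every row of DT1 with every row of DT2, so the work grows with the product of the two table sizes. For tables with millions of rows, a spatial index such as a k-d tree is likely a better fit than any variant of this loop.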
