R - The fastest way to find the closest value in a vector

I have two vectors of positive integers:

a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements

Now my result vector c should contain, for each element of a, the nearest element of b:

c <- c(4,4,4,4,4,6,6,...)

I tried it with apply and which.min(abs(a - b)), but it is very slow.

Is there a smarter way to solve this problem? Is there a data.table solution?

3 answers
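library(data.table)

# Wrap each vector in a data.table and copy the values into a join column
a = data.table(Value = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
a[, merge := Value]

b = data.table(Value = c(4,6,10,16))
b[, merge := Value]

setkeyv(a, c('merge'))
setkeyv(b, c('merge'))

# x[i, roll='nearest'] returns one row per row of i, so b[a] looks up the
# nearest b for each a; the Value column of the result holds the matched b
Merge_a_b = b[a, roll = 'nearest']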
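
As a variation on the rolling join above, here is a minimal sketch that joins with on= instead of setting keys, so that a keeps its original order and the result can be read back as a plain vector. It assumes the data.table package is installed; the column names val and nearest_b are made up for illustration.

```r
library(data.table)

a <- c(12, 1, 7, 3, 15)   # deliberately unsorted
b <- c(4, 6, 10, 16)

a_dt <- data.table(val = a)
# nearest_b is a hypothetical helper column: a copy of b that survives the
# join, because the join column itself takes the values of a in the result
b_dt <- data.table(val = b, nearest_b = b)

# One result row per row of a_dt, in the order of a_dt
res <- b_dt[a_dt, on = "val", roll = "nearest"]
nearest <- res$nearest_b
print(nearest)
```

Because no key is set, neither table is reordered by reference, which matters if a has to stay aligned with other data.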

I'm not sure how it will behave with your data volume, but cut runs pretty quickly.

The idea is to cut the vector a at the midpoints between the elements of b.

Please note that I assume the elements of b are strictly increasing!

Something like this:

a <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) #has > 2 mil elements
b <- c(4,6,10,16) # 200000 elements

cuts <- c(-Inf, b[-1]-diff(b)/2, Inf)
# Will yield: c(-Inf, 5, 8, 13, Inf)

cut(a, breaks=cuts, labels=b)
# [1] 4  4  4  4  4  6  6  6  10 10 10 10 10 16 16
# Levels: 4 6 10 16

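The same can be done with the lower-level function findInterval, which requires the break points to be sorted in non-decreasing order.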

findInterval(a, cuts)
# [1] 1 1 1 1 2 2 2 3 3 3 3 3 4 4 4

So you can then do something like:

index = findInterval(a, cuts)
b[index]
# [1]  4  4  4  4  6  6  6 10 10 10 10 10 16 16 16

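Note that when an element of a is equally close to two elements of b, cut and findInterval resolve the tie differently: cut assigns it to the smaller element of b, findInterval to the larger one. Compare the results for 5, 8 and 13 in the two outputs.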
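
The midpoint idea above can be wrapped in a small function. This is only a sketch: nearest_in is a made-up name, not a base R function. It sorts and de-duplicates b first so that the break points are increasing, and it assumes a contains only finite values.

```r
# nearest_in() is a hypothetical helper, not part of base R
nearest_in <- function(a, b) {
  b <- sort(unique(b))                        # break points must be increasing
  cuts <- c(-Inf, b[-1] - diff(b) / 2, Inf)   # midpoints between neighbours in b
  b[findInterval(a, cuts)]                    # interval index = index into sorted b
}

a <- c(12, 1, 7, 3, 15)
b <- c(16, 4, 10, 6)    # unsorted on purpose
result <- nearest_in(a, b)
print(result)
```

findInterval does a binary search per element of a, so the work grows roughly with length(a) * log(length(b)) rather than with length(a) * length(b).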


As shown in this link, you can do:

which(abs(x - your.number) == min(abs(x - your.number)))

or

which.min(abs(x - your.number))

where x is your vector and your.number is the value you are looking for.
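
The expressions above handle a single number. A rough sketch of how they extend to every element of a, using the names a and b from the question, is shown below; idx is just an illustrative variable name.

```r
a <- c(1, 2, 3, 7, 11, 15)
b <- c(4, 6, 10, 16)

# For each element of a, scan all of b and keep the position of the
# smallest absolute difference (which.min returns the first one on ties)
idx <- vapply(a, function(x) which.min(abs(b - x)), integer(1))
nearest <- b[idx]
print(nearest)
```

This scans all of b once per element of a, so for the sizes in the question (over 2 million by 200000) it performs on the order of 4e11 comparisons, which is consistent with the slowness the question describes.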


Source: https://habr.com/ru/post/1675018/