Search for the number of elements in one vector that are less than an element in another vector

Question


Say we have a pair of vectors

a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)

For each element b[i] in b, I want to find the number of elements in a that are less than b[i] or, equivalently (up to an offset of one), the rank of b[i] in c(b[i], a).

I can think of a couple of naive ways, for example doing one of the following length(b) times:

min_rank(c(b[i], a))[1] - 1  # dplyr::min_rank(); rank of b[i] minus one
sum(a < b[i])                # direct count

What is the best way to do this if length(a) = length(b) = N, where N is large?

EDIT:

To clarify, I wonder if there is a more computationally efficient way to do this, that is, if I can do better than quadratic time in this case.

Vectorization is always cool though ;), thanks @Henrik!

Timings

a <- rpois(100000, 20)
b <- rpois(100000, 10)

system.time(
  result1 <- sapply(b, function(x) sum(a < x))
)
# user  system elapsed 
# 71.15    0.00   71.16

sw <- proc.time()
  bu <- sort(unique(b))
  ab <- sort(c(a, bu))
  ind <- match(bu, ab)
  nbelow <- ind - 1:length(bu)
  result2 <- sapply(b, function(x) nbelow[match(x, bu)])
proc.time() - sw

# user  system elapsed 
# 0.46    0.00    0.48 

sw <- proc.time()
  a1 <- sort(a)
  result3 <- findInterval(b - sqrt(.Machine$double.eps), a1)
proc.time() - sw

# user  system elapsed 
# 0.00    0.00    0.03 

identical(result1, result2) && identical(result2, result3)
# [1] TRUE

+4

sorting time-complexity vector r ranking

kevinykuo, Apr 8 '14 at 16:18

3 Answers

For large N, it pays to work with the sorted unique values of b:

bu <- sort(unique(b))
ab <- sort(c(a, bu))
ind <- match(bu, ab)
nbelow <- ind - 1:length(bu)

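Because a and the unique values of b are sorted together into ab, match gives the first position of each unique b value in ab. That position equals the number of elements of a below the value plus the number of unique b values up to and including it, so subtracting the latter leaves the count from a. A second match then maps nbelow back onto the original bs.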
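
As a rough sketch of how these steps fit together, the snippet below wraps them in a function and runs it on the small vectors from the question. The name count_below is made up for illustration, and the final match() back onto b is done in one vectorised call rather than inside sapply().

```r
# Hypothetical helper (the name is illustrative, not from the answer):
# for each b[i], count the elements of a strictly less than b[i].
count_below <- function(a, b) {
  bu <- sort(unique(b))            # distinct values of b, ascending
  ab <- sort(c(a, bu))             # a and the distinct b values, merged
  ind <- match(bu, ab)             # first position of each bu[j] in ab
  nbelow <- ind - seq_along(bu)    # drop the j values of bu counted in ind[j]
  nbelow[match(b, bu)]             # map the counts back onto every b[i]
}

a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)
count_below(a, b)
# [1] 0 1 3 4 4
```

The two sorts dominate the cost, so this should scale roughly as N log N rather than quadratically.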

+3

Gavin Kelly, Apr 8 '14 at 16:48

I'm not sure it is the "best" way, but you can use sapply() to apply a function to each element of b.

 sapply(b, function(x) sum(a < x))
 # [1] 0 1 3 4 4
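
A type-stable variant of the same idea, offered only as a sketch: vapply() declares that each call returns a single integer, so the result is an integer vector even when b is empty, where sapply() would return an empty list. The running time is still quadratic.

```r
a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)

# integer(1) is the template for each result: sum() of a logical
# vector returns a single integer
res <- vapply(b, function(x) sum(a < x), integer(1))
res
# [1] 0 1 3 4 4

# With empty input the result keeps its type
empty <- vapply(numeric(0), function(x) sum(a < x), integer(1))
```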

+2

Henrik, Apr 8 '14 at 16:24


Blue Magister · Accepted Answer · 2014-04-08T16:47:16+0000

Assuming a is sorted in increasing order, use findInterval:

a <- sort(a)
## gives points less than or equal to b[i]
findInterval(b, a)
# [1] 1 3 3 4 5
## for strictly less than, subtract a small tolerance from b;
## sqrt(.Machine$double.eps) is a common choice, and it is safe when distinct
## values differ by more than that (e.g. integers)
findInterval(b - sqrt(.Machine$double.eps), a)
# [1] 0 1 3 4 4
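
In recent versions of R (3.3.0 and later, as far as I know), findInterval() also has a left.open argument that gives strict counts without subtracting a tolerance. The sketch below shows it on the vectors from the question, then uses made-up values (a2, b2) that lie closer together than sqrt(.Machine$double.eps) to show where the tolerance approach can miscount.

```r
a <- sort(c(1, 2, 2, 4, 7))
b <- c(1, 2, 3, 5, 7)

# left.open = TRUE uses intervals (a[i], a[i + 1]], so the result is
# the number of elements of a strictly less than each b[i]
strict <- findInterval(b, a, left.open = TRUE)
strict
# [1] 0 1 3 4 4

# Illustrative values that differ by less than the tolerance
a2 <- c(1, 1 + 1e-10)
b2 <- 1 + 1e-10
direct    <- sum(a2 < b2)                                      # 1
tolerance <- findInterval(b2 - sqrt(.Machine$double.eps), a2)  # 0, undercounts
open_left <- findInterval(b2, a2, left.open = TRUE)            # 1
```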
