Search for the number of elements in one vector that are less than an element in another vector

Say we have a pair of vectors

a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)

For each element b[i]in bI want to find the number of elements in aless b[i]or, equivalently, I want to know b_i's rank c(b[i], a).

There are a couple of naive ways that I can think of, for example. do one of the following length(b)times:

min_rank(c(b[i], a))
sum(a < b[i])

What is the best way to do this if length(a)= length(b)= N, where N is large?

EDIT:

To clarify, I wonder if there is a more computationally efficient way to do this, that is, if I can do better than quadratic time in this case.

Vectorization is always cool though;), thanks @Henrik!

lead time

a <- rpois(100000, 20)
b <- rpois(100000, 10)

system.time(
  result1 <- sapply(b, function(x) sum(a < x))
)
# user  system elapsed 
# 71.15    0.00   71.16

sw <- proc.time()
  bu <- sort(unique(b))
  ab <- sort(c(a, bu))
  ind <- match(bu, ab)
  nbelow <- ind - 1:length(bu)
  result2 <- sapply(b, function(x) nbelow[match(x, bu)])
proc.time() - sw

# user  system elapsed 
# 0.46    0.00    0.48 

sw <- proc.time()
  a1 <- sort(a)
  result3 <- findInterval(b - sqrt(.Machine$double.eps), a1)
proc.time() - sw

# user  system elapsed 
# 0.00    0.00    0.03 

identical(result1, result2) && identical(result2, result3)
# [1] TRUE
+4
3

, a , findInterval:

a <- sort(a)
## gives points less than or equal to b[i]
findInterval(b, a)
# [1] 1 3 3 4 5
## to do strictly less than, subtract a small bit from b
## uses .Machine$double.eps (the smallest distinguishable difference)
findInterval(b - sqrt(.Machine$double.eps), a)
# [1] 0 1 3 4 4
+5

N, b, :

bu <- sort(unique(b))
ab <- sort(c(a, bu))
ind <- match(bu, ab)
nbelow <- ind - 1:length(bu)

a b ab, match , b b, b . , - , match , . nbelow b s

+3

, " ", . sapply () function b.

 sapply(b, function(x) sum(a < x))
 # [1] 0 1 3 4 4
+2
source

Source: https://habr.com/ru/post/1535594/


All Articles