Say we have a pair of vectors
a <- c(1, 2, 2, 4, 7)
b <- c(1, 2, 3, 5, 7)
For each element b[i]in bI want to find the number of elements in aless b[i]or, equivalently, I want to know b_i's rank c(b[i], a).
There are a couple of naive ways that I can think of, for example. do one of the following length(b)times:
min_rank(c(b[i], a))
sum(a < b[i])
What is the best way to do this if length(a)= length(b)= N, where N is large?
EDIT:
To clarify, I wonder if there is a more computationally efficient way to do this, that is, if I can do better than quadratic time in this case.
Vectorization is always cool though;), thanks @Henrik!
lead time
a <- rpois(100000, 20)
b <- rpois(100000, 10)
system.time(
result1 <- sapply(b, function(x) sum(a < x))
)
sw <- proc.time()
bu <- sort(unique(b))
ab <- sort(c(a, bu))
ind <- match(bu, ab)
nbelow <- ind - 1:length(bu)
result2 <- sapply(b, function(x) nbelow[match(x, bu)])
proc.time() - sw
sw <- proc.time()
a1 <- sort(a)
result3 <- findInterval(b - sqrt(.Machine$double.eps), a1)
proc.time() - sw
identical(result1, result2) && identical(result2, result3)