The solutions proposed so far include creating a logical(length(vec)) and performing a full or partial scan. As you noticed, the vector is sorted. We can use this by doing a binary search. I began to think that I would be super-smart and implement it in C even faster, but I had problems debugging the indexing of the algorithm (which is the difficult part!). So I wrote this in R:
f3 <- function(x) { imin <- 1L imax <- length(x) while (imax >= imin) { imid <- as.integer(imin + (imax - imin) / 2) if (x[imid] >= 0) imax <- imid - 1L else imin <- imid + 1L } imax }
For comparison with other offers
f0 <- function(v) length(which(v < 0)) f1 <- function(v) sum(v < 0) f2 <- function(v) which.min(v < 0) - 1L
and for pleasure
library(compiler) f3.c <- cmpfun(f3)
Going to
> vec <- c(seq(-100,-1,length.out=1e6), rep(0,20), seq(1,100,length.out=1e6)) > identical(f0(vec), f1(vec)) [1] TRUE > identical(f0(vec), f2(vec)) [1] TRUE > identical(f0(vec), f3(vec)) [1] TRUE > identical(f0(vec), f3.c(vec)) [1] TRUE > microbenchmark(f0(vec), f1(vec), f2(vec), f3(vec), f3.c(vec)) Unit: microseconds expr min lq median uq max neval f0(vec) 15274.275 15347.870 15406.1430 15605.8470 19890.903 100 f1(vec) 15513.807 15575.229 15651.2970 17064.8830 18326.293 100 f2(vec) 21473.814 21558.989 21679.3210 22733.1710 27435.889 100 f3(vec) 51.715 56.050 75.4495 78.5295 100.730 100 f3.c(vec) 11.612 17.147 28.5570 31.3160 49.781 100
There are probably a few difficult cases where I was wrong! Moving to C, I did
library(inline) f4 <- cfunction(c(x = "numeric"), " int imin = 0, imax = Rf_length(x) - 1, imid; while (imax >= imin) { imid = imin + (imax - imin) / 2; if (REAL(x)[imid] >= 0) imax = imid - 1; else imin = imid + 1; } return ScalarInteger(imax + 1); ")
with
> identical(f3(vec), f4(vec)) [1] TRUE > microbenchmark(f3(vec), f3.c(vec), f4(vec)) Unit: nanoseconds expr min lq median uq max neval f3(vec) 52096 53192.0 54918.5 55539.0 69491 100 f3.c(vec) 10924 12233.5 12869.0 13410.0 20038 100 f4(vec) 553 796.0 893.5 1004.5 2908 100
findInterval appeared when a similar question was asked in R-help . It is slow but safe, checking that the vec actually sorted and dealing with NA values. If someone wants to live on the edge (perhaps no worse than the f3 or f4 implementation), then
f5.i <- function(v) .Internal(findInterval(v, 0 - .Machine$double.neg.eps, FALSE, FALSE))
almost as fast as the C implementation, but probably more robust and vectorized (i.e., finding the vector of values ββin the second argument for simplified calculations in the form of a range).