I was curious about the speed of string comparison in R, prompted by deciding between != and ==, and I wondered how quickly the comparisons can bail out.
Suppose I have a vector with two levels, one common and one rare (to try to amplify the effect I'm after):
# ~5% 'ALICE', ~95% 'HAL90000000000'
x <- sample(c('ALICE', 'HAL90000000000'), size = 1000, replace = TRUE, prob = c(0.05, 0.95))
My conjecture (assuming the comparison short-circuits) was that the operation
x != 'ALICE'
will be much faster than:
x == 'HAL90000000000'
since checking equality in the latter case presumably requires examining every character, whereas the first comparison can be falsified by the first or the last character alone (depending on which end the algorithm works from).
But when I test this, that does not seem to be the case (the results were inconclusive over repeated trials, if anything with a very slight bias towards == being faster?!). Or is this not a fair test?
> microbenchmark(x != 'ALICE', x == 'HAL90000000000')
Unit: microseconds
                  expr   min     lq    mean median     uq    max neval
          x != "ALICE" 4.520 4.5505 4.61831 4.5775 4.6525  4.970   100
x == "HAL90000000000" 3.766 3.8015 4.00386 3.8425 3.9200 13.766   100
Why is this?
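To take the two-level sampling out of the picture, here is a sketch of the more direct test I have in mind, using long strings that differ from a reference only in their first or only in their last character (the strings and variable names are invented purely for illustration; strrep() needs R >= 3.3.0). If the comparison short-circuited at the first mismatch, I would expect the first expression to benchmark clearly faster than the second:

library(microbenchmark)

# reference string plus two strings of the same length:
# one differs in its first character, the other only in its last
ref        <- strrep('A', 1000)
diff_first <- paste0('B', strrep('A', 999))
diff_last  <- paste0(strrep('A', 999), 'B')

x_first <- rep(diff_first, 1000)
x_last  <- rep(diff_last,  1000)

microbenchmark(x_first == ref, x_last == ref)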
EDIT:
I guess this is because it is doing a full string comparison either way, but if so, is there any way to get R to optimise these? I get no benefit from obscuring how long it takes to match long versus short strings, and I am not worried about password-style timing attacks.
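To be concrete about the kind of optimisation I mean: something along the lines of comparing integer codes instead of the strings themselves, for example via a factor (just a sketch, not something I have benchmarked):

# do the string matching once, then compare integer codes element-wise
f        <- factor(x)
hal_code <- match('HAL90000000000', levels(f))

as.integer(f) == hal_code   # same logical result as x == 'HAL90000000000'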