Another option with adist()
:
s <- c("tesr", "teqr", "toar")
s[adist("test", s) < 3]
Or using stringdist
library(stringdist)
s[stringdist("test", s, method = "lv") < 3]
What gives:
#[1] "tesr" "teqr"
Benchmark
x <- rep(s, 10e5)
library(microbenchmark)
mbm <- microbenchmark(
levenshteinDist = x[which(levenshteinDist("test", x) < 3)],
adist = x[adist("test", x) < 3],
stringdist = x[stringdist("test", x, method = "lv") < 3],
times = 10
)
What gives:
Unit: milliseconds
expr min lq mean median uq max neval cld
levenshteinDist 840.7897 1255.1183 1406.8887 1398.4502 1510.5398 1960.4730 10 b
adist 2760.7677 2905.5958 2993.9021 2986.1997 3038.7692 3472.7767 10 c
stringdist 145.8252 155.3228 210.4206 174.5924 294.8686 355.1552 10 a
source
share