EDIT: This bug was found in 32-bit versions of R; it was fixed in R version 2.9.2.
This was sent to me by @leoniedu today, and I don't have an answer for it, so I thought I'd post it here.
I read the documentation for agrep() (fuzzy string matching), and it seems I don't fully understand the max.distance parameter. Here is an example:
pattern <- "Staatssekretar im Bundeskanzleramt"
x <- "Bundeskanzleramt"
agrep(pattern,x,max.distance=18)
agrep(pattern,x,max.distance=19)
This behaves exactly as I expected. The two strings differ by 18 characters, so I would expect that to be the threshold for a match. Here's what bothers me:
agrep(pattern,x,max.distance=30)
agrep(pattern,x,max.distance=31)
agrep(pattern,x,max.distance=32)
agrep(pattern,x,max.distance=33)
Why are 30 and 33 matches, but not 31 and 32? To save you some counting,
> nchar("Staatssekretar im Bundeskanzleramt")
[1] 34
> nchar("Bundeskanzleramt")
[1] 16
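The figure of 18 can also be checked directly rather than counted by hand. This is only a minimal sketch, and it assumes a version of R recent enough to provide adist() in the utils package, which is newer than the R version this question is about. The names len_diff and edit_dist are just illustrative.

```r
pattern <- "Staatssekretar im Bundeskanzleramt"
x <- "Bundeskanzleramt"

# Difference in length between the two strings
len_diff <- nchar(pattern) - nchar(x)

# Generalized Levenshtein (edit) distance between the two full strings.
# adist() returns a 1 x 1 matrix here, so drop() reduces it to a plain number.
# Assumption: adist() is available (it is not present in very old R versions).
edit_dist <- drop(adist(pattern, x))

len_diff
edit_dist
```

Since x is exactly the tail of pattern, both values should come out the same: deleting the leading "Staatssekretar im " is all it takes to turn one string into the other.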