I am wondering if there is a way to do fuzzy string matching in PHP: look for a word in a long string and find a potential match even if it is misspelled, for example a word that is off by a single character because of an OCR error.
I was thinking a regex generator could do this. So, given the input "crazy", it would generate this regular expression:
.*((crazy)|(.+razy)|(c.+azy)|(cr.+zy)|(cra.+y)|(craz.+)).*
Then it will return all matches for the word or variations of the word.
How to create a generator:
I would probably split the search string/word into an array of characters and build the regular expression by looping over that array with foreach, each time replacing the character at the current key (the letter's position in the string) with ".+".
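To make the idea concrete, here is a rough sketch of the generator I have in mind. The function name buildFuzzyPattern is just something I made up for illustration, and it assumes a non-empty, single-byte (ASCII) word; multibyte text would need mb_str_split instead.

```php
<?php
// Hypothetical helper (name invented for this sketch): builds a pattern that
// matches $word exactly, or with any one character position replaced by ".+".
// Assumes a non-empty, single-byte word.
function buildFuzzyPattern(string $word): string
{
    $chars = str_split($word);

    // First alternative: the exact word.
    $alternatives = ['(' . preg_quote($word, '/') . ')'];

    // One alternative per position, with that position turned into a wildcard.
    foreach ($chars as $i => $unused) {
        $parts = [];
        foreach ($chars as $j => $c) {
            $parts[] = ($j === $i) ? '.+' : preg_quote($c, '/');
        }
        $alternatives[] = '(' . implode('', $parts) . ')';
    }

    return '/.*(' . implode('|', $alternatives) . ').*/';
}

$pattern = buildFuzzyPattern('crazy');
var_dump(preg_match($pattern, 'the cr4zy brown fox') === 1);
```

Note that ".+" is greedy and matches any number of characters, so an alternative like c.+azy would also match a "c" and an "azy" that are far apart in the line; using "." instead would restrict it to a single substituted character.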
Is this a good way to do a fuzzy text search, or is there a better way? What about some kind of string comparison that gives me a score based on how close two strings are? I am trying to see whether some badly converted OCR text contains a short word.
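For the string comparison idea, something like the following is what I am picturing, using PHP's built-in levenshtein() to score each word of the text against the search word. The function name findCloseWords and the default threshold of 1 edit are my own assumptions, and splitting on whitespace is a simplification that ignores punctuation stuck to words.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns the words in
// $text whose edit distance to $needle is at most $maxDistance, keyed by word.
function findCloseWords(string $text, string $needle, int $maxDistance = 1): array
{
    $matches = [];

    // Split on whitespace; punctuation handling is left out of this sketch.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // levenshtein() counts the insertions, deletions and substitutions
        // needed to turn one string into the other.
        $distance = levenshtein(strtolower($word), strtolower($needle));
        if ($distance <= $maxDistance) {
            $matches[$word] = $distance;
        }
    }

    return $matches;
}

print_r(findCloseWords('the cr4zy brown fox', 'crazy'));
```

Unlike the regex, this gives a number per candidate word, so the threshold could be tuned to how noisy the OCR output is.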