Levenshtein distance in regular expression

Question

Levenshtein distance in regular expression

Is it possible to include Levenshtein distance in a regex query?

(Except for creating a union between permutations, for example, to search for “hello” with a Levenshtein distance of 1:

.ello | h.llo | he.lo | hel.o | hell.

as this is stupid and unsuitable for long Levenshtein distances.)

+9

regex levenshtein distance

zdenda.online Apr 10 '12 at 9:39

source share

3 answers

You can generate a regular expression programmatically. I will leave this as an exercise for the reader, but to output this hypothetical function (given the input of a “word”) you want something like this line:

 "^(?>word|wodr|wrod|owrd|word.|wor.d|wo.rd|w.ord|.word|wor.?|wo.?d|w.?rd|.?ord)$"

In English, you first try to combine the word with yourself, then on every possible single transposition, then on every possible single insertion, then on every possible single omission or substitution (can be performed simultaneously).

The length of this string given by a word of length n is linear (and especially not exponential) with n.

This is reasonable, I think.

You pass this to the regular expression generator (for example, in Ruby it will be Regexp.new (str)) and bam, you have an assistant for ANY word with a Damerau-Levenshtein distance of 1 from the given word.

(The Damerau-Levenshtein 2 distance is much more difficult.)

Note the use of the construct (?> Non-backtracing, which means the order of the individual expressions in this release.

I could not think of a way to “squeeze” this expression.

EDITOR: I got him a job, at least in Elixir! https://github.com/pmarreck/elixir-snippets/blob/master/damerau_levenshtein_distance_1.exs

I would not recommend this though (except for educational purposes), since it will only lead you to distances of 1; DL's legitimate library will allow you to compute distances> 1. Although, since it is a regular expression, it will probably work pretty quickly after building (note that you should save the “compiled” regular expression somewhere, since this code is currently time restores it with EVERY comparison!)

+7

pmarreck Jul 9 '12 at 22:23

source share

There are a couple of regular expression dialects with an approximate matching function, namely the TRE library and the regex PyPI module for Python.

The approximate TRE mapping syntax is described in the “Rough mapping options” section at https://laurikari.net/tre/documentation/regex-syntax/ . The regular expression TRE for matching things at a Levenshtein 1 distance from hello would be:

 (hello){~1}

The approximate syntax for the regex module is described at https://pypi.org/project/regex/ in the paragraph with a marker that begins with the text Approximate "fuzzy" matching . regex expression for matching things at a Levenshtein 1 distance from hello will be:

 (hello){e<=1}

Perhaps one or the other of these syntaxes will be adopted over time by other regular expression implementations, but at the moment I only know these two.

+1

Mark amery Jul 07 '19 at 18:18

source share

Bart kiers · Accepted Answer · 2012-04-10T09:57:26+0000

is it possible to include levenshtein distance in regex query?

No, not reasonable. Implementing - or using the existing Levenshtein distance algorithm - is the way to go.

Levenshtein distance in regular expression

More articles: