Regular expression to match a phrase while accepting a minimum number of leading characters

I would like a regular expression that matches a phrase starting from the beginning of the text. The exact phrase should match, but so should a shorter input that contains at least a certain minimum number of the phrase's leading characters, provided that any additional characters also match the phrase.

For example, suppose I want to match "San Francisco" but am willing to accept as few as the first five characters, since those are enough to identify it uniquely within my domain:

  • Match: San Francisco
  • Match: San F
  • Match: San Fra
  • Match: San Francisco Blah Blah.
  • No match: Boston
  • No match: San Diego
  • No match: San Fransisko
  • No match: San Franco

This almost works, but it handles the last two cases incorrectly:

^San Fr?a?n?c?i?s?c?o? 

I use .NET regular expressions, but a solution in any language will do.

+4
4 answers

The problem you are facing is one of grouping.

 ^San F(r(a(n(c(i(s(c(o)?)?)?)?)?)?)?)? 

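The parentheses make each letter depend on the one before it: "a" is allowed only if the preceding "r" matched, and so on. The pattern will still match San Fransisko and San Franco, but the match covers only the valid prefix (San Fran and San Franc respectively), similar to your San Francisco Blah Blah case, where only San Francisco is matched.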
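To make the difference between the flat pattern from the question and the nested one concrete, here is a minimal sketch that prints the text each pattern actually consumes. It uses Python's `re` module purely for illustration (the question is about .NET, but both patterns use only syntax common to the two engines). The names `FLAT`, `NESTED` and `matched_text` are made up for this example.

```python
import re

# The question's attempt: every letter after "San F" is independently optional.
FLAT = re.compile(r"^San Fr?a?n?c?i?s?c?o?")
# The nested version: each letter is only tried if the previous one matched.
NESTED = re.compile(r"^San F(r(a(n(c(i(s(c(o)?)?)?)?)?)?)?)?")

def matched_text(pattern, text):
    """Return the part of text consumed by pattern, or None if there is no match."""
    m = pattern.match(text)
    return m.group(0) if m else None

for text in ["San Franco", "San Fransisko", "San Francisco Blah Blah."]:
    print(text, "| flat:", matched_text(FLAT, text),
          "| nested:", matched_text(NESTED, text))
```

The flat pattern consumes all of "San Franco" (it skips the optional "i", "s" and "c") and "San Frans" from "San Fransisko", while the nested pattern stops at "San Franc" and "San Fran". Neither pattern rejects those inputs outright, because nothing anchors the end of the match.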

+3

Does it need to be a regular expression? This is much easier to do with a simple string comparison.

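 // True if input has at least minimumLength characters and agrees with
 // phrase over the characters the two strings have in common.
 bool matches(string input, string phrase, int minimumLength)
 {
     int compareLength = Math.Min(input.Length, phrase.Length);
     return input.Length >= minimumLength
         && input.Substring(0, compareLength) == phrase.Substring(0, compareLength);
 }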

If it must be a regular expression, then:

 "^San F(r(a(n(c(i(s(c(o.*)?)?)?)?)?)?)?)?$" 
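Writing the nested groups by hand gets tedious for longer phrases, so a pattern of this shape could be generated instead. The following is a sketch in Python, used here only for illustration; `prefix_pattern` is a hypothetical helper, and it emits non-capturing groups `(?:...)` rather than the capturing groups shown above, which does not change what is matched.

```python
import re

def prefix_pattern(phrase: str, min_len: int) -> str:
    """Build an anchored pattern that accepts phrase[:min_len], any longer
    prefix of phrase, or the whole phrase followed by arbitrary text."""
    if not 0 < min_len <= len(phrase):
        raise ValueError("min_len must be between 1 and len(phrase)")
    required = re.escape(phrase[:min_len])
    # Build from the inside out: only the complete phrase may be followed by more text.
    tail = ".*"
    for ch in reversed(phrase[min_len:]):
        tail = "(?:" + re.escape(ch) + tail + ")?"
    return "^" + required + tail + "$"

pattern = re.compile(prefix_pattern("San Francisco", 5))
for text in ["San Francisco", "San F", "San Fra", "San Francisco Blah Blah.",
             "Boston", "San Diego", "San Fransisko", "San Franco"]:
    print(text, "->", bool(pattern.match(text)))
```

With a minimum length of 5, the first four inputs are accepted and the last four are rejected, which is the behaviour asked for in the question.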
+4

If it has to be a regex, this will work:

 (^San F)(rancisco.*|rancisc|rancis|ranci|ranc|ran|ra|r)?\b 

Where:

x|y matches either x or y. For example, "z|wood" matches "z" or "wood", and "(z|w)oo" matches "zoo" or "woo".

\b matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
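The two constructs can be checked in isolation. This is a small sketch using Python's `re` module for illustration; alternation and `\b` behave the same way in .NET for these inputs.

```python
import re

# Alternation: the whole input must be one of the alternatives.
print(bool(re.fullmatch(r"z|wood", "z")))     # True
print(bool(re.fullmatch(r"z|wood", "wood")))  # True
print(bool(re.fullmatch(r"(z|w)oo", "zoo")))  # True

# Word boundary: "er" must be followed by a non-word character or the end of the text.
print(bool(re.search(r"er\b", "never")))  # True: "er" ends the word
print(bool(re.search(r"er\b", "verb")))   # False: "er" is followed by "b"
```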

This makes the match cover the whole phrase whenever there is a match, and there are no partial matches for inputs such as San Frano.
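An alternation of this kind can also be generated from the phrase rather than typed out. Below is a hedged sketch in Python (for illustration only); `alternation_pattern` is a hypothetical helper, and it assumes every accepted prefix ends in a word character, since the trailing `\b` relies on that.

```python
import re

def alternation_pattern(phrase: str, min_len: int) -> str:
    """Required prefix, then an optional alternation of ever shorter
    continuations (longest first), then a word boundary."""
    if not 0 < min_len < len(phrase):
        raise ValueError("min_len must be between 1 and len(phrase) - 1")
    required, optional = phrase[:min_len], phrase[min_len:]
    # Only the complete phrase may be followed by arbitrary text.
    alternatives = [re.escape(optional) + ".*"]
    alternatives += [re.escape(optional[:k]) for k in range(len(optional) - 1, 0, -1)]
    return "^" + re.escape(required) + "(?:" + "|".join(alternatives) + ")?" + r"\b"

pattern = re.compile(alternation_pattern("San Francisco", 5))
for text in ["San Francisco", "San F", "San Fra", "San Francisco Blah Blah.",
             "Boston", "San Diego", "San Fransisko", "San Franco"]:
    print(text, "->", bool(pattern.match(text)))
```

One design difference from an end-anchored pattern is worth noting: because `\b` only requires a word boundary, a partial prefix followed by a non-word character (for example "San Fra!") is also accepted. If that is not wanted, replacing `\b` with `$` is one option.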

You can experiment with the example above in Regexr.

0

Perhaps what you need here is not a regex at all, but a way to compute the distance, or similarity, between two given strings?

If so, look at the Levenshtein algorithm, which computes the edit distance between two strings.
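For reference, here is a minimal sketch of the standard dynamic-programming form of Levenshtein distance, written in Python for illustration. The function name `levenshtein` is simply a label for this example, not a library call.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("San Francisco", "San Fransisko"))  # 2
print(levenshtein("San Francisco", "San Franco"))     # 3
```

Note that this is a different criterion from the one in the question: "San Fransisko" is only two edits away from "San Francisco", so a distance threshold would tend to accept inputs that the question wants rejected.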

Does that help?

0

Source: https://habr.com/ru/post/1395793/

