How to use RegEx to select the longest match?

I tried to find an answer to this question but could not find anything, and I hope there is a simple solution. I am using the following code in C#:

    String pattern = ("(hello|hello world)");
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
    var matches = regex.Matches("hello world");

The question is: is there a way for the Matches method to return the longest alternative first? In this case, I want to get “hello world” as my match, not just “hello”. This is just an example; my actual list of patterns contains a fair number of words.

+6
3 answers

If you already know the lengths of the words in advance, put the longest one first. For instance:

 String pattern = ("(hello world|hello)"); 

The longest alternative is then tried first, so it wins whenever it matches. If you do not know the lengths in advance, you cannot fix this order in the pattern itself.

An alternative approach is to store all matches in an array or list and pick the longest one yourself using the language's built-in features.
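The "collect everything and pick the longest" idea could look roughly like the sketch below. The class and method names (LongestMatchDemo, FindLongestMatch) are made up for illustration, and the sketch assumes each candidate is tried as its own pattern and that candidates are compared by match length only, regardless of where in the input they match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestMatchDemo
{
    // Hypothetical helper: tries each candidate pattern separately and
    // returns the longest matched text, or null if no pattern matches.
    public static string FindLongestMatch(string input, params string[] patterns)
    {
        return patterns
            .Select(p => Regex.Match(input, p, RegexOptions.IgnoreCase))
            .Where(m => m.Success)
            .OrderByDescending(m => m.Length)   // longest match first
            .Select(m => m.Value)
            .FirstOrDefault();
    }

    public static void Main()
    {
        // The order of the candidates does not matter here.
        Console.WriteLine(FindLongestMatch("hello world", "hello", "hello world"));
        // prints "hello world"
    }
}
```

This runs one regex per candidate, so it trades some speed for not having to care about the order of the alternatives.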

+5

The regex engine tries the alternatives in a pattern from left to right. If you want to make sure you get the longest match first, you need to reorder your alternatives. The leftmost alternative is tried first. If it matches, the engine goes on to match the rest of the pattern against the rest of the string; the next alternative is tried only if that fails.

 String pattern = ("(hello world|hello wor|hello)"); 
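If the word list is only available at run time, the same longest-first ordering can be produced in code. The following is a minimal sketch under the assumption that the alternatives are literal words rather than sub-patterns; AlternationBuilder and BuildLongestFirst are hypothetical names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationBuilder
{
    // Hypothetical helper: orders literal words longest-first and joins them
    // into one alternation, escaping regex metacharacters in each word.
    public static Regex BuildLongestFirst(params string[] words)
    {
        var alternatives = words
            .OrderByDescending(w => w.Length)
            .Select(Regex.Escape);
        string pattern = "(" + string.Join("|", alternatives) + ")";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Input order is arbitrary; the builder sorts it.
        Regex regex = BuildLongestFirst("hello", "hello world", "hello wor");
        Console.WriteLine(regex.Match("hello world").Value);
        // prints "hello world"
    }
}
```

Sorting by length only helps for literal alternatives; for alternatives that contain quantifiers, the length of the pattern text says little about the length of the match.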
+1

Make two different regular expressions. The first will match your longer option, and if that doesn't work, the second will match your shorter option.

    string input = "hello world";
    string patternFull = "hello world";
    Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);
    var matches = regexFull.Matches(input);

    // Fall back to the shorter pattern only if the longer one found nothing.
    if (matches.Count == 0)
    {
        string patternShort = "hello";
        Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
        matches = regexShort.Matches(input);
    }

In the end, matches holds the result of either the “full” or the “short” pattern; the “full” pattern is checked first, and the “short” one is skipped if “full” matches.

You can wrap this logic in a function if you plan to call it many times. This is what I came up with (but there are many other ways to do this).

    // Returns true as soon as one of the patterns matches, trying them in the given order.
    public bool HasRegexMatchInOrder(string input, params string[] patterns)
    {
        foreach (var pattern in patterns)
        {
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            if (regex.IsMatch(input))
            {
                return true;
            }
        }
        return false;
    }

    string input = "hello world";
    bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello" /* , ... */);
0

Source: https://habr.com/ru/post/970969/

