How do you get the name of a regex group?

I had a regex like:

(?<one-1>cat)|(?<two-2>dog)|(?<three-3>mouse)|(?<four-4>fish) 

When I tried to use this pattern in a .NET application, it failed because the group names contained a "-".

So, as a workaround, I tried using two regular expressions, the first one:

 (?<A>cat)|(?<B>dog)|(?<C>mouse)|(?<D>fish)

would match the cases I was originally looking for, using group names that I control.
Then I intended to use the name of the group that matched in this regex to pick the corresponding entry in the following one:

 (?<A>one-1)|(?<B>two-2)|(?<C>three-3)|(?<D>four-4)

I would do this by finding a string that matches this pattern and checking whether its group name equals the one from the first regex.

I know this seems a bit confusing. Thanks for any help.

4 answers

Something along the lines of the following?

    // Needs: using System; using System.Text.RegularExpressions;
    string[,] patterns = {
        { "one-1", "cat" },
        { "two-2", "dog" },
        { "three-3", "mouse" },
        { "four-4", "fish" },
    };
    var regex = buildRegex(patterns);
    string[] tests = { "foo", "dog", "bar", "fish" };
    foreach (var t in tests)
    {
        var m = regex.Match(t);
        Console.WriteLine("{0}: {1}", t, reportMatch(regex, m));
    }

Output:

    foo: no match
    dog: two-2 = dog
    bar: no match
    fish: four-4 = fish

First, we build a Regex instance by escaping the group names and combining them with the patterns. Every non-word character, and the underscore itself, is replaced with the sequence _nnn_, where nnn is the character's UTF-32 code point in decimal.

    private static Regex buildRegex(string[,] inputs)
    {
        string regex = "";
        for (int i = 0; i <= inputs.GetUpperBound(0); i++)
        {
            // (?<escapedName>pattern), e.g. (?<one_45_1>cat)
            var part = String.Format("(?<{0}>{1})",
                Regex.Replace(inputs[i, 0], @"([\W_])", new MatchEvaluator(escape)),
                inputs[i, 1]);
            // Join the alternatives with "|"
            regex += (regex.Length != 0 ? "|" : "") + part;
        }
        return new Regex(regex);
    }

    private static string escape(Match m)
    {
        // Replace the character with _<code point>_, e.g. "-" becomes _45_
        return "_" + Char.ConvertToUtf32(m.Groups[1].Value, 0) + "_";
    }
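To make the encoding concrete, here is a standalone sketch of the same escape/unescape pair written with lambdas. The class and method names (GroupNameCodec, Escape, Unescape) are illustrative and not part of the answer. Since "-" is U+002D (45 in decimal), "one-1" becomes "one_45_1".

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative standalone version of the answer's escaping scheme.
// Assumes names contain only characters from the Basic Multilingual Plane.
public static class GroupNameCodec
{
    // Every non-word character, and '_' itself, becomes _<decimal code point>_.
    public static string Escape(string name) =>
        Regex.Replace(name, @"[\W_]",
            m => "_" + char.ConvertToUtf32(m.Value, 0) + "_");

    // Turns each _<decimal code point>_ back into the original character.
    public static string Unescape(string escaped) =>
        Regex.Replace(escaped, @"_(\d+)_",
            m => char.ConvertFromUtf32(int.Parse(m.Groups[1].Value)));
}
```

Escaping "_" as well is what keeps Unescape unambiguous: every underscore in an escaped name is a delimiter written by Escape, so _(\d+)_ can only pick up codes that Escape produced. For example, "a_b" escapes to "a_95_b" and unescapes back to "a_b".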

When a match comes back, the .NET library does not give us an easy way to get the name of the group that matched, so we take a detour: for each group name, we check whether that group captured anything, and if so, we unescape its name and report both the name and the captured substring to the caller.

    private static string reportMatch(Regex regex, Match m)
    {
        if (!m.Success)
            return "no match";
        foreach (var name in regex.GetGroupNames())
        {
            // Skip the implicit whole-match group "0"; report the first group that captured text
            if (name != "0" && m.Groups[name].Value.Length > 0)
                return String.Format("{0} = {1}",
                    Regex.Replace(name, @"_(\d+)_", new MatchEvaluator(unescape)),
                    m.Groups[name].Value);
        }
        return null;
    }

    private static string unescape(Match m)
    {
        // Convert the decimal code point captured from _nnn_ back into its character
        return Char.ConvertFromUtf32(int.Parse(m.Groups[1].Value));
    }
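A possible variant of the lookup, sketched under the assumption that you only need the name of the participating group: instead of testing whether a group's Value is non-empty, it asks Group.Success, and it walks group numbers with Regex.GetGroupNumbers and Regex.GroupNameFromNumber. FirstMatchedGroupName is a hypothetical helper name, not something from the answer.

```csharp
using System.Text.RegularExpressions;

public static class MatchedGroup
{
    // Hypothetical helper: returns the name of the first capturing group
    // (other than the implicit group 0) that took part in the match,
    // or null if the match failed.
    public static string FirstMatchedGroupName(Regex regex, Match m)
    {
        if (!m.Success) return null;
        foreach (int number in regex.GetGroupNumbers())
        {
            if (number == 0) continue;   // group 0 is the whole match
            // Success is true even when the group captured an empty string.
            if (m.Groups[number].Success)
                return regex.GroupNameFromNumber(number);
        }
        return null;
    }
}
```

With the answer's escaped names, the returned name (e.g. "two_45_2") can then go through the same unescape step to recover "two-2".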
---

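(?<one-1>...) does not work, because "-" is reserved for the balancing group construct (?<name1-name2>subexpression). The documentation describes it like this: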

Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct lets you use the stack of captures for group name2 as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotation marks instead of angle brackets; for example, (?'name1-name2').

You cannot escape this minus sign, so you must use a different delimiter.
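To show why "-" is taken, here is a small sketch of the construct the quote describes: (?<open>...) pushes a capture, (?<-open>...) pops one (name1 omitted), and (?<inner-open>...) pops while also storing in inner the text between the popped capture and the current one. The class, method, and group names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class BalancingDemo
{
    // (?<open>\() pushes a capture onto "open", (?<-open>\)) pops one and
    // fails when the stack is empty, and (?(open)(?!)) rejects the input
    // if any "(" is left unpopped at the end.
    static readonly Regex Balanced =
        new Regex(@"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);

    // With name1 present, (?<inner-open>\)) pops "open" and stores in "inner"
    // the text between the popped "(" and the current ")".
    static readonly Regex Inner =
        new Regex(@"(?<open>\()[^()]*(?<inner-open>\))");

    public static string InnerText(string s) => Inner.Match(s).Groups["inner"].Value;
}
```

For example, IsBalanced("(a(b)c)") is true, IsBalanced("a)b(") is false, and InnerText("x(abc)y") returns "abc". Note that the second group is named "inner", not "inner-open".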

---

Try using underscores instead of dashes. When I changed the original regex to:

 (?<one_1>cat)|(?<two_2>dog)|(?<three_3>mouse)|(?<four_4>fish) 

I was able to use Groups["one_1"].Value to get the value of the matching group.

EDIT: Example:

    string pattern = "(?<one_1>cat)|(?<two_2>dog)|(?<three_3>mouse)|(?<four_4>fish)";
    string[] inputs = new[]{"cat", "horse", "dog", "dolphin", "mouse", "hamster", "fish"};
    string[] groups = new[]{"one_1", "two_2", "three_3", "four_4"};
    foreach (string input in inputs)
    {
        Match oMatch = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        Console.WriteLine("For input: {0}", input);
        foreach (string group in groups)
        {
            Console.WriteLine("Group {0}:\t{1}", group, oMatch.Groups[group].Value);
        }
        Console.WriteLine("----------");
    }

Using a dash, as in your original pattern, means the group cannot be found under that name. I assume group names follow the same naming conventions as variables in the rest of .NET, so if a name would not be a legal variable name, don't use it as a group name.
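Rather than relying on the variable-naming assumption, you could ask the regex parser directly. The sketch below uses a hypothetical helper, IsUsableGroupName, that declares an empty group with the candidate name and checks that it compiles and appears under exactly that name in GetGroupNames(). A name like "one-1" fails either way: it is read as balancing group syntax, so no group called "one-1" ever exists.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNameCheck
{
    // Hypothetical helper: true if `name` can be used as-is as a regex group name.
    public static bool IsUsableGroupName(string name)
    {
        try
        {
            // Declare an empty group with the candidate name.
            var rx = new Regex("(?<" + name + ">)");
            // The group must also come back under exactly that name;
            // "one-1" is parsed as balancing-group syntax, not as a name.
            return Array.IndexOf(rx.GetGroupNames(), name) >= 0;
        }
        catch (ArgumentException)
        {
            // The parser rejected the pattern outright (e.g. a space in the name).
            return false;
        }
    }
}
```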

---

I'm not sure what your end goal is, but the following maps each matched value to the original group names. From there, you can decide how to proceed.

Try:

    // Needs: using System; using System.Collections.Generic; using System.Linq;
    //        using System.Text.RegularExpressions;
    var map = new Dictionary<string, string>()
    {
        {"A", "one-1"}, {"B", "two-2"}, {"C", "three-3"}, {"D", "four-4"}
    };
    string[] inputs = { "cat", "dog", "mouse", "fish", "bird" };
    string pattern = "(?<A>cat)|(?<B>dog)|(?<C>mouse)|(?<D>fish)";
    Regex rx = new Regex(pattern);
    foreach (string input in inputs)
    {
        Match m = rx.Match(input);
        if (m.Success)
        {
            // Skip the whole-match group "0"; exactly one named group captures text here
            string groupName = rx.GetGroupNames()
                .Where(g => g != "0" && m.Groups[g].Value != "")
                .Single();
            Console.WriteLine("Match: {0} -- Group name: {1} -- Corresponds to: {2}",
                input, groupName, map[groupName]);
        }
        else
        {
            Console.WriteLine("Failed: {0}", input);
        }
    }

The Regex.GetGroupNames method provides an easy way to get the group names from a pattern. If you read the Value of a group that did not take part in the match, you get an empty string. The idea behind this approach is to walk through each group name with LINQ, skipping the default group "0", and check whether that group captured anything. If it did, that is the group that matched.
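One caveat about relying on an empty Value: a group that matched the empty string also has an empty Value, so the Where filter above would skip it (and Single would then throw). Group.Success tells the two cases apart. A minimal sketch, using a hypothetical pattern in which group E can only ever match the empty string:

```csharp
using System.Text.RegularExpressions;

public static class EmptyVsUnmatched
{
    // Hypothetical pattern: group A needs "cat", group E always matches "".
    static readonly Regex Rx = new Regex("(?<A>cat)|(?<E>)");

    // Reports Success and Value for both groups after matching the input.
    public static (bool ASuccess, string AValue, bool ESuccess, string EValue) Inspect(string input)
    {
        Match m = Rx.Match(input);
        Group a = m.Groups["A"];
        Group e = m.Groups["E"];
        return (a.Success, a.Value, e.Success, e.Value);
    }
}
```

For "dog", both groups report an empty Value, but only E has Success == true; for "cat", A captures "cat" and E reports Success == false.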

---

Source: https://habr.com/ru/post/1300073/

