How to match an entire string as one of two formats with one regular expression?

I need to validate values that can have one of two formats, and I am trying to do this with a single regular expression, but I cannot understand why it does not work.

The first format is exactly 17 alphanumeric characters. The expression ^[A-Za-z0-9]{17}$ correctly matches the test value 5UXWX7C56BA123456, but not the shortened value 5UXWX7C56BA12345 or the extended value 5UXWX7C56BA1234569.

The second format is exactly 8 alphanumeric characters, followed by an asterisk or underscore, and then two more alphanumeric characters. The expression ^[A-Za-z0-9]{8}[*_][A-Za-z0-9]{2}$ correctly matches the test value 5UXWX7C5*BA, but not the shortened value 5UXWX7C5*B or the extended value 5UXWX7C5*BA1.

However, when I try to combine the expressions, I get unexpected results that differ depending on which subexpression I put first. The following code snippet demonstrates this:

    var pattern1 = new Regex(@"^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");
    var pattern2 = new Regex(@"^([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})|([A-Za-z0-9]{17})$");
    var values = new string[]
    {
        "5UXWX7C56BA12345",
        "5UXWX7C56BA123456",
        "5UXWX7C56BA1234569",
        "5UXWX7C5*B",
        "5UXWX7C5*BA",
        "5UXWX7C5*BA1"
    };

    Console.WriteLine($"Using {pattern1}\n");
    Console.WriteLine($" {"Value",-20}{"IsMatch",-9}{"Expected",-10}");
    Console.WriteLine($" {new string('-', 37)}");
    values
        .Select(x => new { Value = x, Result = pattern1.IsMatch(x), ExpectedResult = x.Length == 11 || x.Length == 17 })
        .Select(x => $" {x.Value,-20}{x.Result,-9}{x.ExpectedResult} {(x.Result == x.ExpectedResult ? "" : "UNEXPECTED")}")
        .WithEach(Console.WriteLine);

    Console.WriteLine($"\n\nUsing {pattern2}\n");
    Console.WriteLine($" {"Value",-20}{"IsMatch",-9}{"Expected",-10}");
    Console.WriteLine($" {new string('-', 37)}");
    values
        .Select(x => new { Value = x, Result = pattern2.IsMatch(x), ExpectedResult = x.Length == 11 || x.Length == 17 })
        .Select(x => $" {x.Value,-20}{x.Result,-9}{x.ExpectedResult} {(x.Result == x.ExpectedResult ? "" : "UNEXPECTED")}")
        .WithEach(Console.WriteLine);

It produces the following results:

    Using ^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$

     Value               IsMatch  Expected
     -------------------------------------
     5UXWX7C56BA12345    False    False
     5UXWX7C56BA123456   True     True
     5UXWX7C56BA1234569  True     False UNEXPECTED
     5UXWX7C5*B          False    False
     5UXWX7C5*BA         True     True
     5UXWX7C5*BA1        False    False


    Using ^([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})|([A-Za-z0-9]{17})$

     Value               IsMatch  Expected
     -------------------------------------
     5UXWX7C56BA12345    False    False
     5UXWX7C56BA123456   True     True
     5UXWX7C56BA1234569  True     False UNEXPECTED
     5UXWX7C5*B          False    False
     5UXWX7C5*BA         True     True
     5UXWX7C5*BA1        True     False UNEXPECTED

I hope someone can point out the error in my expressions. It seems that although I use ^ and $ to force the whole value to be matched, a match is still reported when the value contains an extra character that I would expect to make the whole value fail.

Although I used LINQPad to run the snippet above, I see the same results on regex101.com.

1 answer

Your regular expressions are not anchored correctly:

    ^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$

Alternation (|) has the lowest precedence in a regular expression, so ^ belongs only to the first alternative and $ only to the second. Here ([A-Za-z0-9]{17}) is anchored only to the start of the string (so anything may follow it), and ([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2}) is anchored only to the end of the string (so anything may precede it).

The same goes for the second pattern; you only swapped the alternatives.
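To make the effect visible, it can help to print what each pattern actually matched rather than only whether it matched. The following is a minimal sketch: the class name AnchorDemo and the helper MatchedText are hypothetical names chosen for illustration, while the two patterns are copied unchanged from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // The two combined patterns from the question, unchanged.
    public static readonly Regex Pattern1 =
        new Regex(@"^([A-Za-z0-9]{17})|([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");
    public static readonly Regex Pattern2 =
        new Regex(@"^([A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})|([A-Za-z0-9]{17})$");

    // Hypothetical helper: returns the substring the regex matched,
    // or a placeholder when there is no match.
    public static string MatchedText(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success ? m.Value : "(no match)";
    }

    public static void Main()
    {
        // Pattern1: the 17-character alternative is tied only to the start,
        // so the first 17 characters match and the trailing "9" is ignored.
        Console.WriteLine(MatchedText(Pattern1, "5UXWX7C56BA1234569"));

        // Pattern2: the 17-character alternative is tied only to the end,
        // so the last 17 characters match and the leading "5" is ignored.
        Console.WriteLine(MatchedText(Pattern2, "5UXWX7C56BA1234569"));

        // Pattern2: the 11-character alternative is tied only to the start,
        // so the trailing "1" is ignored.
        Console.WriteLine(MatchedText(Pattern2, "5UXWX7C5*BA1"));
    }
}
```

In each case the matched text is shorter than the input, which is why IsMatch returns True for the values marked UNEXPECTED.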

Use a non-capturing group around both alternatives so that ^ and $ apply to the whole alternation:

    var pattern1 = new Regex(@"^(?:[A-Za-z0-9]{17}|[A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");

Otherwise, your alternatives are not anchored on both sides.
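As a quick check, the grouped pattern can be run against the six test values from the question. This is a sketch only: the class name FixedPatternDemo and the method IsValid are hypothetical, and the pattern is the one given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedPatternDemo
{
    // Both alternatives sit inside one non-capturing group,
    // so ^ and $ anchor whichever alternative is tried.
    public static readonly Regex Combined =
        new Regex(@"^(?:[A-Za-z0-9]{17}|[A-Za-z0-9]{8}[*_][A-Za-z0-9]{2})$");

    // Hypothetical helper: true only when the whole value has one of the two formats.
    public static bool IsValid(string value)
    {
        return Combined.IsMatch(value);
    }

    public static void Main()
    {
        string[] values =
        {
            "5UXWX7C56BA12345",    // 16 characters
            "5UXWX7C56BA123456",   // 17 characters, first format
            "5UXWX7C56BA1234569",  // 18 characters
            "5UXWX7C5*B",          // 10 characters
            "5UXWX7C5*BA",         // 11 characters, second format
            "5UXWX7C5*BA1"         // 12 characters
        };

        foreach (string value in values)
        {
            Console.WriteLine($"{value,-20}{IsValid(value)}");
        }
    }
}
```

Only the 17-character and 11-character values should report True. One caveat: in .NET, $ without RegexOptions.Multiline also matches just before a final newline, so if a trailing "\n" must be rejected as well, \z is the stricter end anchor.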

See the regex demo.


Source: https://habr.com/ru/post/1275624/

