How is the regular expression OR operator evaluated?

In T-SQL, I generated a UNIQUEIDENTIFIER using NEWID(). For instance:

723952A7-96C6-421F-961F-80E66A4F29D2 

Then all dashes (-) are removed, so it looks like this:

 723952A796C6421F961F80E66A4F29D2 

Now I need to turn the string above back into a valid UNIQUEIDENTIFIER in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx by putting the dashes back.

To do this, I use a SQL CLR implementation of the C# RegexMatches function with the regular expression ^.{8}|.{12}$|.{4}, which gives me the following:

 SELECT * FROM [dbo].[RegexMatches] ('723952A796C6421F961F80E66A4F29D2', '^.{8}|.{12}$|.{4}') 

Using the above, I can easily build the correct UNIQUEIDENTIFIER, but I wonder how the OR operator in the regular expression is evaluated. For example, the following does not work:

 SELECT * FROM [dbo].[RegexMatches] ('723952A796C6421F961F80E66A4F29D2', '^.{8}|.{4}|.{12}$') 

I'm not sure whether the first regular expression first matches the beginning and the end of the string and only then the other values, and whether it always returns the matches in the same order (I would have a problem if, for example, 96C6 were returned after 421F).

2 answers

If you are interested in what happens when you use the | alternation operator, the answer is simple: the regex engine processes both the expression and the input string from left to right.

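Taking your pattern as an example, ^.{8}|.{12}$|.{4} starts at the left of the input string and tries ^.{8}, the first 8 characters. It finds them, and that is a match. The engine then continues from where that match ended and, at each position, tries the alternatives in the order they are written. ^.{8} can no longer match because of the ^ anchor, and .{12}$ only succeeds once exactly 12 characters remain, so .{4} matches the 4-character groups in between and .{12}$ matches the last 12 characters. The matches are therefore returned in the order in which they occur in the string.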
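
To make the order of the matches visible, here is a minimal sketch that uses Python's re module as a stand-in for the .NET engine behind RegexMatches. That substitution is an assumption: both are backtracking engines that try alternatives in the order written, but this is not the CLR function itself. The variable names are illustrative only.

```python
import re

raw = "723952A796C6421F961F80E66A4F29D2"
pattern = r"^.{8}|.{12}$|.{4}"

# finditer scans the string from left to right; at each position it tries
# the alternatives in the order written and keeps the first that succeeds.
matches = [(m.start(), m.group()) for m in re.finditer(pattern, raw)]
for start, text in matches:
    print(start, text)

# The matches come back in string order, so joining them with dashes
# restores the original dashed form.
guid = "-".join(text for _, text in matches)
print(guid)  # 723952A7-96C6-421F-961F-80E66A4F29D2
```

The start offsets printed are 0, 8, 12, 16 and 20, so 96C6 always comes before 421F.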

Then you have ^.{8}|.{4}|.{12}$. The expression is again processed from left to right: the first 8 characters are matched first, but after that only 4-character sequences are matched. .{12}$ never gets a chance to match, because .{4} is tried before it and succeeds every time.
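
The same effect can be reproduced with a short sketch, again using Python's re module as an assumed stand-in for the .NET engine (the variable names are made up for illustration):

```python
import re

raw = "723952A796C6421F961F80E66A4F29D2"

# .{4} is listed before .{12}$, so it wins at every position after the
# first 8 characters and the 12-character alternative is never reached.
reordered = re.findall(r"^.{8}|.{4}|.{12}$", raw)
print(reordered)
# ['723952A7', '96C6', '421F', '961F', '80E6', '6A4F', '29D2']
```

The last 12 characters are split into three 4-character matches, which is why this ordering cannot be used to rebuild the UNIQUEIDENTIFIER.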

+3
source

Your regex ^.{8}|.{12}$|.{4} reads as:

Starting with any character except \n {exactly 8 times}

OR ending with any character except \n {exactly 12 times}

OR any character except \n {exactly 4 times} anywhere in the string

This means that any string of four or more characters will match, because a string of at least 4 characters always contains a run of 4 characters.

1 [false]

12 [false]

123 [false]

1234 [true]

12345 [true]

123456 [true]

1234567 [true]

12345678 [true]

123456789 [true]

1234567890 [true]

12345678901 [true]

123456789012 [true]
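
The list above can be reproduced with a small sketch. It assumes Python's re.search behaves like an unanchored "is there a match anywhere" test in the .NET engine, and the names are illustrative:

```python
import re

pattern = re.compile(r"^.{8}|.{12}$|.{4}")
digits = "123456789012"

# re.search succeeds if any alternative matches anywhere in the string,
# so the unanchored .{4} accepts every input of 4 or more characters.
results = {digits[:n]: bool(pattern.search(digits[:n])) for n in range(1, 13)}
for text, ok in results.items():
    print(text, ok)
```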

You may be looking for:

^.{8}$|^.{12}$|^.{4}$

Which gives you:

1 [false]

12 [false]

123 [false]

1234 [true]

12345 [false]

123456 [false]

1234567 [false]

12345678 [true]

123456789 [false]

1234567890 [false]

12345678901 [false]

123456789012 [true]
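
As a quick check of the fully anchored version, here is a sketch under the same assumption that Python's re module is an acceptable stand-in for the .NET engine:

```python
import re

anchored = re.compile(r"^.{8}$|^.{12}$|^.{4}$")
digits = "123456789012"

# Every alternative is anchored at both ends, so only whole strings of
# exactly 8, 12 or 4 characters can match.
accepted = [n for n in range(1, 13) if anchored.search(digits[:n])]
print(accepted)  # [4, 8, 12]
```

Note that this pattern tests the length of a whole string; it does not split a 32-character value into groups the way the original pattern does.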

Source: https://habr.com/ru/post/988168/

