Workaround for SQL Server regex in T-SQL?

I have SQLCLR code that works with regular expressions. But now that we are migrating to Azure, which does not allow SQLCLR, this is no longer possible. I need to find a way to do regular expression matching in pure T-SQL.

Master Data Services is not available because our dev version of SQL Server is not R2.

Any ideas appreciated, thanks.

Sample regular expressions that need to be handled (collected from regexlib and elsewhere over the past few years):

E-mail address

 ^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$

Dollars

 ^(\$)?(([1-9]\d{0,2}(\,\d{3})*)|([1-9]\d*)|(0))(\.\d{2})?$ 

URI

 ^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$ 

Single digit

 ^\d$ 

Percent

 ^-?[0-9]{0,2}(\.[0-9]{1,2})?$|^-?(100)(\.[0]{1,2})?$ 

Height notation

 ^\d?\d'(\d|1[01])"$ 

Between 1 and 1000

 ^([1-9]|[1-9]\d|1000)$ 

Credit card numbers

 ^((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[4,7]\d{13}$ 

List of years

 ^([1-9]{1}[0-9]{3}[,]?)*([1-9]{1}[0-9]{3})$ 

Days of the week

 ^(Sun|Mon|(T(ues|hurs))|Fri)(day|\.)?$|Wed(\.|nesday)?$|Sat(\.|urday)?$|T((ue?)|(hu?r?))\.?$ 

12-hour time

 (?<Time>^(?:0?[1-9]:[0-5]|1(?=[012])\d:[0-5])\d(?:[ap]m)?) 

24-hour time

 ^(?:(?:(?:0?[13578]|1[02])(\/|-|\.)31)\1|(?:(?:0?[13-9]|1[0-2])(\/|-|\.)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/|-|\.)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/|-|\.)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$ 

US phone numbers

 ^\(?[\d]{3}\)?[\s-]?[\d]{3}[\s-]?[\d]{4}$ 
2 answers

Unfortunately, you cannot migrate your CLR functions to SQL Azure. You will need to either use the usual string functions (PATINDEX, CHARINDEX, LIKE, etc.), or perform these operations outside the database.
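As a rough illustration of what those built-in functions can do, here is a small sketch. The variable @s and its sample value are made up for the example; nothing here comes from the question's actual data.

```sql
DECLARE @s varchar(50) = 'Order #12345 shipped';  -- hypothetical sample value

SELECT
    -- LIKE: wildcard match with a character class; true if any digit is present
    CASE WHEN @s LIKE '%[0-9]%' THEN 1 ELSE 0 END AS ContainsDigit,
    -- PATINDEX: 1-based position of the first pattern match, 0 if there is none
    PATINDEX('%[0-9]%', @s) AS FirstDigitPosition,
    -- CHARINDEX: 1-based position of a literal substring, 0 if there is none
    CHARINDEX('#', @s) AS HashPosition;
```

LIKE and PATINDEX share the same limited wildcard syntax (%, _, [ ], [^ ]); there are no quantifiers, alternation, or groups, which is why the more complex patterns have to be broken into several conditions.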

EDIT: Adding some information for the examples added to the question.

E-mail address

This is always controversial because people do not agree on which version of the RFC they want to support. For example, the original did not support apostrophes (or, at least, people insisted that it did not; I didn't dig it out of the archives and read it myself), and the pattern needs to be extended fairly often for new TLDs (once for four-letter TLDs such as .info, then again for six-letter TLDs such as .museum). I have often heard knowledgeable people claim that perfect e-mail validation is not possible, and having previously worked for an e-mail service provider, I can tell you that it is a constantly moving target. But for the simplest approaches, see the question "TSQL Email Validation (without regex)".
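To give an idea of what a "simplest approach" check might look like, here is a minimal sketch. It is my own illustration, not necessarily what the linked question proposes, and it is far looser than the regex in the question; the variable @s and its value are made up.

```sql
DECLARE @s varchar(320) = 'someone@example.com';  -- hypothetical sample value

SELECT CASE
    WHEN @s LIKE '%_@_%.__%'    -- at least one char before @, one between @ and a dot, two after the dot
     AND @s NOT LIKE '%@%@%'    -- no more than one @
     AND @s NOT LIKE '% %'      -- no embedded spaces
     AND @s NOT LIKE '%..%'     -- no consecutive dots
    THEN 1 ELSE 0
END AS LooksLikeEmail;
```

This only rejects obviously malformed values; it does not attempt RFC compliance or TLD validation.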

Single digit

Probably the easiest one:

 WHERE @s LIKE '[0-9]'; 

Credit card numbers

This assumes you strip out dashes and spaces, which you should do anyway. Please note that this is not an actual check of the credit card number algorithm to make sure that the number itself is valid, only that it matches the general format (AmEx = 15 digits, starting with 3; the rest = 16 digits: Visa starts with 4, MasterCard starts with 5, Discover starts with 6, and I think there is one that starts with 7, although that may only be gift cards):

 WHERE @s + ' ' LIKE '[3-7]'+ REPLICATE('[0-9]', 14) + '[0-9 ]'; 

If you want to be a little more precise at the cost of being more verbose, you can say:

 WHERE (LEN(@s) = 15 AND @s LIKE '3' + REPLICATE('[0-9]', 14)) OR (LEN(@s) = 16 AND @s LIKE '[4-7]' + REPLICATE('[0-9]', 15)); 
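Putting the stripping step and the stricter check together might look something like the sketch below. The @raw variable and its sample value are invented for illustration, and the test is still only a format check, not a checksum validation.

```sql
DECLARE @raw varchar(30) = '4111-1111 1111-1111';  -- hypothetical sample input

-- Remove dashes and spaces before matching.
DECLARE @s varchar(30) = REPLACE(REPLACE(@raw, '-', ''), ' ', '');

SELECT @s AS Cleaned,
       CASE
           WHEN (LEN(@s) = 15 AND @s LIKE '3' + REPLICATE('[0-9]', 14))
             OR (LEN(@s) = 16 AND @s LIKE '[4-7]' + REPLICATE('[0-9]', 15))
           THEN 1 ELSE 0
       END AS LooksLikeCardNumber;
```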

US Phone Numbers

Again, this assumes you first strip out parentheses, dashes, and spaces. I am fairly sure that a US area code cannot begin with 1; if there are other rules, I am not aware of them.

 WHERE @s LIKE '[2-9]' + REPLICATE('[0-9]', 9); 
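As a sketch of the whole sequence, with a made-up @raw input, the cleanup can be done with nested REPLACE calls before applying the pattern above:

```sql
DECLARE @raw varchar(30) = '(212) 555-0123';  -- hypothetical sample input

-- Strip parentheses, dashes, and spaces.
DECLARE @s varchar(30) =
    REPLACE(REPLACE(REPLACE(REPLACE(@raw, '(', ''), ')', ''), '-', ''), ' ', '');

SELECT @s AS Cleaned,
       CASE WHEN @s LIKE '[2-9]' + REPLICATE('[0-9]', 9) THEN 1 ELSE 0 END AS LooksLikeUsPhone;
```

Note that this is more permissive than the original regex about where the punctuation appears, since all of it is removed before the check.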

-----

I am not going to go further, because many of the other expressions you have listed can be extrapolated from the above. Hope this gives you a start. You should be able to Google some of the others to see how other people have replicated the patterns in T-SQL. Some of them (for example, days of the week) can probably just be checked against a table; it seems unnecessary to do elaborate pattern matching for a set of seven possible values. Similarly, for the 1 to 1000 range or the list of years, it will be much easier (and probably more efficient) to check whether the numeric value is in a table than to convert it to a string and see whether it matches some pattern.
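A minimal sketch of the table-based idea follows. The table variable, column name, and sample values are all hypothetical, and only full day names are listed; abbreviations could be added as extra rows. For the numeric range I have used a plain BETWEEN rather than a numbers table, which is a simplification on my part.

```sql
-- Hypothetical lookup table holding the accepted spellings.
DECLARE @DaysOfWeek TABLE (DayName varchar(10) PRIMARY KEY);

INSERT INTO @DaysOfWeek (DayName)
VALUES ('Sunday'), ('Monday'), ('Tuesday'), ('Wednesday'),
       ('Thursday'), ('Friday'), ('Saturday');

DECLARE @day varchar(10) = 'Tuesday';  -- made-up sample inputs
DECLARE @n int = 250;

SELECT
    -- Membership test against the lookup table instead of pattern matching
    CASE WHEN EXISTS (SELECT 1 FROM @DaysOfWeek WHERE DayName = @day)
         THEN 1 ELSE 0 END AS IsDayOfWeek,
    -- Numeric comparison instead of converting to a string
    CASE WHEN @n BETWEEN 1 AND 1000 THEN 1 ELSE 0 END AS IsBetween1And1000;
```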

I will reiterate that much of this will work far better if you can cleanse and validate the data before it gets into the database in the first place. You should strive to do this wherever possible, because without the CLR, you simply cannot do powerful RegEx inside SQL Server.


Ken Henderson wrote about ways to replicate RegEx without the CLR, but they need the sp_OA* procedures, which are even less likely to ever see the light of day in Azure than the CLR. Most of the other articles you will find online use an approach similar to Ken's, or make clever use of the built-in string functions.

What parts of RegEx are you trying to replicate? Can you show an example of the input and output of one of your functions? It may be easy to convert it to get similar results using built-in string functions such as PATINDEX.


Source: https://habr.com/ru/post/1369041/