Regex template inside SQL Replace function?

SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', ''); 

I want to strip any markup between the two parts of a number using the regular expression above, but it doesn't work. I'm not sure the regex syntax is at fault, because I also tried a simpler pattern, '%[^0-9]%', just as a test, and that doesn't work either. Does anyone know how I can achieve this?

+65
regex sql-server
Jan 27 '14 at 10:16
9 answers

You can use PATINDEX to find the index of the first occurrence of a pattern in a string. Then use STUFF to put another string in place of the matched character.
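To see the two functions on a single value before looping over rows, here is a minimal sketch using the string from the question. The variable name @s is hypothetical, and the pattern '%[^0-9.]%' (anything that is not a digit or a dot) is the one used in the loop in this answer.

```sql
DECLARE @s varchar(50) = '<strong>100</strong><b>.00 GB';

-- PATINDEX returns the 1-based position of the first character that is
-- not a digit or a dot, or 0 when there is no such character.
SELECT PATINDEX('%[^0-9.]%', @s) AS first_bad_position;

-- STUFF deletes 1 character at that position and inserts '' in its place.
SELECT STUFF(@s, PATINDEX('%[^0-9.]%', @s), 1, '') AS one_character_removed;

-- Repeat until no illegal character is left.
WHILE PATINDEX('%[^0-9.]%', @s) > 0
    SET @s = STUFF(@s, PATINDEX('%[^0-9.]%', @s), 1, '');

SELECT @s AS cleaned;  -- 100.00
```

Each pass removes exactly one character, which is why the row-based version needs an inner loop.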

Loop through each row and replace every illegal character with whatever you want. In your case, replace each non-numeric character with an empty string. The inner loop is there for rows that have more than one illegal character in the current cell.

    DECLARE @counter int
    SET @counter = 0

    -- Outer loop: visit each ID in turn (<= so the highest ID is processed too)
    WHILE (@counter <= (SELECT MAX(ID_COLUMN) FROM Table))
    BEGIN
        -- Inner loop: remove one illegal character per pass
        WHILE 1 = 1
        BEGIN
            DECLARE @RetVal varchar(50)
            SET @RetVal = (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column), 1, '')
                           FROM Table
                           WHERE ID_COLUMN = @counter)

            -- STUFF returns NULL when PATINDEX finds nothing (position 0)
            IF (@RetVal IS NOT NULL)
                UPDATE Table SET Column = @RetVal WHERE ID_COLUMN = @counter
            ELSE
                BREAK
        END
        SET @counter = @counter + 1
    END

Caution: this is slow! Having a varchar column may affect performance, so using LTRIM and RTRIM may help a bit. Regardless, it is slow.

Credit goes to this answer on Stack Overflow.

EDIT: Credit also goes to @srutzky.

Edit (by @Tmdean): Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates as many times as the maximum number of non-numeric characters in a single row, so it isn't ideal, but I think it should be acceptable in most situations.

    WHILE 1 = 1 BEGIN
        WITH q AS
            (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
            FROM Table)
        UPDATE Table
        SET Column = STUFF(Column, q.n, 1, '')
        FROM q
        WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

        IF @@ROWCOUNT = 0 BREAK;
    END;

You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)

    DECLARE @done bit = 0;
    WHILE @done = 0 BEGIN
        WITH q AS
            (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
            FROM Table
            WHERE COALESCE(Scrubbed_Column, 0) = 0)
        UPDATE Table
        SET Column = STUFF(Column, q.n, 1, ''),
            Scrubbed_Column = 0
        FROM q
        WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

        IF @@ROWCOUNT = 0 SET @done = 1;

        -- if Scrubbed_Column is still NULL, then the PATINDEX
        -- must have given 0
        UPDATE Table
        SET Scrubbed_Column = CASE
            WHEN Scrubbed_Column IS NULL THEN 1
            ELSE NULLIF(Scrubbed_Column, 0)
        END;
    END;

If you don't want to change your schema, this is easy to adapt so that intermediate results are stored in a table variable, which gets applied to the actual table at the end.
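One way the table-variable adaptation could look is sketched below. This is an assumption about the shape of such a solution, not the answerer's code: dbo.YourTable, YourColumn and @work are hypothetical names, and the column is assumed to be varchar(50) with an integer key ID_Column.

```sql
-- Hypothetical names: dbo.YourTable, ID_Column and YourColumn stand in for
-- the real table, its key and the varchar(50) column being scrubbed.
DECLARE @work TABLE (ID_Column int PRIMARY KEY, Val varchar(50));

-- Copy only the rows that contain at least one illegal character.
INSERT INTO @work (ID_Column, Val)
SELECT ID_Column, YourColumn
FROM dbo.YourTable
WHERE YourColumn LIKE '%[^0-9.]%';

-- Scrub the copies, one character per row per pass.
WHILE 1 = 1
BEGIN
    UPDATE @work
    SET Val = STUFF(Val, PATINDEX('%[^0-9.]%', Val), 1, '')
    WHERE Val LIKE '%[^0-9.]%';

    IF @@ROWCOUNT = 0 BREAK;
END;

-- Apply the cleaned values to the real table in a single statement.
UPDATE t
SET t.YourColumn = w.Val
FROM dbo.YourTable AS t
JOIN @work AS w ON w.ID_Column = t.ID_Column;
```

The real table is written only once, and rows that were already clean are never touched, so no extra bit column is needed.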

+54
Apr 11 '14 at 1:23

Generally speaking, SQL Server does not support regular expressions, and you cannot use them in native T-SQL code.
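As a short illustration of why the query in the question returns its input unchanged, the sketch below contrasts REPLACE with the wildcard patterns that PATINDEX and LIKE accept. The variable name @s is hypothetical.

```sql
DECLARE @s varchar(50) = '<strong>100</strong><b>.00 GB';

-- REPLACE treats its second argument as a literal string, not a pattern,
-- so nothing matches and the input comes back unchanged.
SELECT REPLACE(@s, '%[^0-9]%', '') AS replace_result;

-- PATINDEX and LIKE understand only these wildcards:
--   %     any string of zero or more characters
--   _     any single character
--   [ ]   any single character in the set or range
--   [^ ]  any single character not in the set or range
SELECT PATINDEX('%[^0-9]%', @s) AS first_non_digit;  -- 1, the '<'

-- Regex tokens such as \d are not recognised; they are literal text here,
-- so this looks for a backslash followed by 'd' and finds none.
SELECT PATINDEX('%\d%', @s) AS regex_token;  -- 0
```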

You could write a CLR function to do that. See here, for example.

+23
Jan 27 '14 at 10:19

Instead of stripping out the found character by its position alone, using Replace(Column, BadFoundCharacter, '') could be substantially faster. Additionally, instead of replacing only the next bad character found in each column, this replaces every occurrence of that character.

    WHILE 1 = 1 BEGIN
        UPDATE dbo.YourTable
        SET Column = Replace(Column, Substring(Column, PatIndex('%[^0-9.-]%', Column), 1), '')
        WHERE Column LIKE '%[^0-9.-]%'

        If @@RowCount = 0 BREAK;
    END;

I am convinced that this will work better than the accepted answer, if only because it does fewer operations. There are other ways that can be faster, but I don’t have time to research them right now.

+18
Jan 13 '16 at 19:51

Here is a recursive function that I wrote for this, based on previous answers.

    CREATE FUNCTION dbo.RecursiveReplace
    (
        @P_String VARCHAR(MAX),
        @P_Pattern VARCHAR(MAX),
        @P_ReplaceString VARCHAR(MAX),
        @P_ReplaceLength INT = 1
    )
    RETURNS VARCHAR(MAX)
    BEGIN
        DECLARE @Index INT;

        -- Get starting point of pattern
        SET @Index = PATINDEX(@P_Pattern, @P_String);

        IF @Index > 0
        BEGIN
            -- Perform the replace
            SET @P_String = STUFF(@P_String, PATINDEX(@P_Pattern, @P_String), @P_ReplaceLength, @P_ReplaceString);

            -- Recurse
            SET @P_String = dbo.RecursiveReplace(@P_String, @P_Pattern, @P_ReplaceString, @P_ReplaceLength);
        END;

        RETURN @P_String;
    END;

Gist

+3
Dec 17 '17 at 17:05

I stumbled across this post while looking for something else, but thought I'd mention a solution I use that is far more efficient, and that really should be the default implementation of any function used with a set-based query: a table function applied with CROSS APPLY. The topic still seems to be active, so hopefully this is useful to someone.

Example runtimes for a few of the answers so far, whether recursive set-based queries or scalar functions, on a test set of 1 million rows that strips the characters from a random NEWID(), range from 34 s to 2 min 5 s for the WHILE loop examples and from 1 min 3 s to forever for the function examples.

Using a table function with CROSS APPLY achieves the same goal in 10 s. You may need to adjust it to suit your needs, such as the maximum length it handles.

Function:

    CREATE FUNCTION [dbo].[RemoveChars](@InputUnit VARCHAR(40))
    RETURNS TABLE
    AS
    RETURN
    (
        WITH Numbers_prep(Number) AS
        (
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
            UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
        )
        ,Numbers(Number) AS
        (
            SELECT TOP (ISNULL(LEN(@InputUnit),0))
                row_number() OVER (ORDER BY (SELECT NULL))
            FROM Numbers_prep a
            CROSS JOIN Numbers_prep b
        )
        SELECT OutputUnit
        FROM
        (
            SELECT substring(@InputUnit,Number,1)
            FROM Numbers
            WHERE substring(@InputUnit,Number,1) like '%[0-9]%'
            ORDER BY Number
            FOR XML PATH('')
        ) Sub(OutputUnit)
    )

Usage:

    UPDATE t
    SET column = o.OutputUnit
    FROM ##t t
    CROSS APPLY [dbo].[RemoveChars](t.column) o
+3
Jan 31 '18 at 16:23

Wrapping the solution in a SQL function can be useful if you want to reuse it. I even do this at the cell level, which is why I'm posting it as a separate answer:

    CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(300))
    RETURNS VARCHAR(300)
    BEGIN
        DECLARE @str VARCHAR(300) = @string;
        DECLARE @Pattern VARCHAR (20) = '%[^a-zA-Z0-9]%';
        DECLARE @Len INT;
        SELECT @Len = LEN(@string);

        WHILE @Len > 0
        BEGIN
            SET @Len = @Len - 1;
            IF (PATINDEX(@Pattern,@str) > 0)
                BEGIN
                    SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');
                END
            ELSE
                BEGIN
                    BREAK;
                END
        END
        RETURN @str
    END
+2
Apr 26 '17 at 21:19

If you do this only for a parameter that is part of the stored procedure, you can use the following:

    while PatIndex('%[^0-9]%', @Param) > 0
        select @Param = Replace(@Param, Substring(@Param, PatIndex('%[^0-9]%', @Param), 1), '')
+1
May 12 '17 at 16:07

I created this function to clean up a string in a time field that contained non-numeric characters. The time contained question marks when the minutes had not been entered, something like 20:??. The function iterates over each character and replaces each ? with a 0:

    CREATE FUNCTION [dbo].[CleanTime]
    (
        -- Add the parameters for the function here
        @intime nvarchar(10)
    )
    RETURNS nvarchar(5)
    AS
    BEGIN
        -- Declare the return variable here
        DECLARE @ResultVar nvarchar(5)
        DECLARE @char char(1)

        -- Add the T-SQL statements to compute the return value here
        DECLARE @i int = 1
        WHILE @i <= LEN(@intime)
        BEGIN
            SELECT @char = CASE
                               WHEN substring(@intime,@i,1) like '%[0-9:]%' THEN substring(@intime,@i,1)
                               ELSE '0'
                           END
            SELECT @ResultVar = concat(@ResultVar,@char)
            set @i = @i + 1
        END;

        -- Return the result of the function
        RETURN @ResultVar
    END
+1
Jan 24 '19 at 14:18

I think a simpler and faster approach is to iterate over every character of the character set:

    DECLARE @i int
    SET @i = 0
    WHILE(@i < 256)
    BEGIN
        IF char(@i) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')
            UPDATE Table SET Column = replace(Column, char(@i), '')

        SET @i = @i + 1
    END
0
Apr 26 '18 at 15:38