I am preparing a data-extraction task. I need to remove a set of terms; none, some, or all of them may be present in each source record. There are over 100,000 target records. I want to avoid running a separate match/replace pass for each term, because (a) the list of terms to remove is likely to grow, and (b) the time needed to run the match/replace one term at a time is unacceptable.
My question is: how do I change the regex so that every term in the OR list is matched?
REGULAR EXPRESSION
' and | and or | ao | company | co | co | dba | dba '
DESIRED BEHAVIOR
Replace each term found (including prefix and suffix spaces) with a single space.
ACTUAL BEHAVIOR
Only every other term found is replaced (including its leading and trailing spaces) with a single space; the terms in between are left in place.
EXAMPLE
Source string
' MASHABLE LTD DBA THE INFORMATION EXPERTS and and or ao company co co dba dba COPYRIGHT '
Result string (desired behavior)
' MASHABLE LTD THE INFORMATION EXPERTS COPYRIGHT '
Result string (actual behavior)
' MASHABLE LTD THE INFORMATION EXPERTS and or company codba COPYRIGHT '
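The skipping can be reproduced outside SQL Server. The following is a minimal sketch that uses Python's `re` module as a stand-in for VBScript.RegExp; that substitution is an assumption, as are the names `PATTERN` and `strip_terms` and the shortened sample string. It suggests that when two listed terms sit next to each other, the first match consumes the space they share, so the second term no longer has a leading space to match on.

```python
import re

# Assumption: Python's re module stands in for VBScript.RegExp here.
# Both perform a left-to-right global replace over non-overlapping matches.
# The pattern is copied from the question; IGNORECASE mirrors @ignoreCase = 1.
PATTERN = re.compile(
    r" and | and or | ao | company | co | co | dba | dba ",
    re.IGNORECASE,
)


def strip_terms(source: str) -> str:
    """Replace every non-overlapping match of PATTERN with a single space."""
    return PATTERN.sub(" ", source)


# Shortened, hypothetical sample: two adjacent terms share one space.
sample = " X co co Y "
result = strip_terms(sample)

# The first ' co ' match consumes the space that the second 'co' would
# need as its leading space, so the second 'co' is left in place.
print(repr(result))
```

A lone term with its own spaces on both sides, such as `' X DBA Y '`, is removed as expected; only runs of adjacent terms are affected.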
ENVIRONMENT
SQL Server 2005
User-defined function regexReplace, built on VBScript.RegExp (code is at the end of this post)
CODE
set nocount on

declare @source [varchar](800)
declare @regexp [varchar](400)
declare @replace [char](1)
declare @globalReplace [bit]
declare @ignoreCase [bit]
declare @result [varchar](800)

set @globalReplace = 1
set @ignoreCase = 1
set @source = ' MASHABLE LTD DBA THE INFORMATION EXPERTS and and or ao company co co dba dba COPYRIGHT '
set @regexp = ' and | and or | ao | company | co | co | dba | dba '
set @replace = ' '

select @result = master.dbo.regexReplace(@source,@regexp,@replace,@globalReplace,@ignoreCase)
print @result
... result:
MASHABLE LTD THE INFORMATION EXPERTS and or company codba COPYRIGHT
* dbo.regexReplace user-defined function definition *
CREATE FUNCTION [dbo].[regexReplace]
(
    @source varchar(5000),
    @regexp varchar(1000),
    @replace varchar(1000),
    @globalReplace bit = 0,
    @ignoreCase bit = 0
)
RETURNS varchar(5000) -- sized to match @result so long results are not truncated
AS
BEGIN
    DECLARE @hr integer
    DECLARE @objRegExp integer
    DECLARE @result varchar(5000)

    -- Create the VBScript.RegExp COM object
    EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT
    IF @hr <> 0 BEGIN
        EXEC @hr = sp_OADestroy @objRegExp
        RETURN NULL
    END

    -- Set the pattern and the Global / IgnoreCase flags
    EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp
    IF @hr <> 0 BEGIN
        EXEC @hr = sp_OADestroy @objRegExp
        RETURN NULL
    END

    EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace
    IF @hr <> 0 BEGIN
        EXEC @hr = sp_OADestroy @objRegExp
        RETURN NULL
    END

    EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase
    IF @hr <> 0 BEGIN
        EXEC @hr = sp_OADestroy @objRegExp
        RETURN NULL
    END

    -- Run the replacement
    EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace
    IF @hr <> 0 BEGIN
        EXEC @hr = sp_OADestroy @objRegExp
        RETURN NULL
    END

    -- Release the COM object
    EXECUTE @hr = sp_OADestroy @objRegExp
    IF @hr <> 0 BEGIN
        RETURN NULL
    END

    RETURN @result
END