How to recursively parse data in a SQL statement

I am trying to clear the data in the "Name" field in our database and split this data into FirstName, MiddleName and LastName. I am currently using the Case statement to search for various triggers within the text to help me format the output in a specific way.

However, I am starting to notice that I am testing tests inside other tests and have to figure out how to process data recursively. See this example of how I retrieve FirstName.

 Case When CharIndex(' ',LTrim(RTrim(Name))) in (0,1) Then '' --'empty or LName' When Left(Name,3) IN ('MR ','MS ', 'DR ','MRS') Then --'Prefix Titles' Case --'If we found a prefix, run the same "tests" with the prefix removed' When CharIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name))))) in (0,1) Then '' When SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),3,1) = '&' Then SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ', Name)))),1,5) Else Left(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))), CHarIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))-1) End When SubString(LTrim(RTrim(Name)),3,1) = '&' Then SubString(LTrim(RTrim(Name)),1,5) --'Look for initials eg J & A Smith' Else Left(LTrim(RTrim(Name)),CHarIndex(' ',LTrim(RTrim(Name)))-1) End 

So for this to work in more complex situations (e.g. MR JOHN A SMITH JR ), I would need to test it recursively. In imperative programming, I would do something like this if I had a function called GetFirstName :

 GetFirstName('MR JOHN A SMITH JR') //GetFirstName identfies 'MR' and within the function it calls: || \\ ==> GetFirstName('JOHN A SMITH JR') //GetFirstName identifies 'JR' and within the function it calls: || \\ ==> GetFirstName('JOHN A SMITH') //Finally, it returns 'JOHN' 

Ideally, it would be great to do this in direct SQL, but I'm not sure if this is possible. What alternatives do I have if I don't use direct SQL? (I am using SQL Server 2005)

+4
source share
2 answers

I do not think that it is easy and simple to do in direct SQL . You can use regular expressions, although you need to write your own CLR function to provide regular expression functionality.

+1
source

It sounds like a one-time job. Are you sure you cannot do this in several instructions using temporary tables? If this is a one-time job, then most likely you have a higher need for accuracy and ease of code debugging than for performance.

Consider the following:

  CREATE TABLE #MyNames (
   PersonID INT PRIMARY KEY,
   OriginalName VARCHAR (50),
   WorkingName VARCHAR (50),
   CandidateTitle VARCHAR (10),
   CandidateLastName VARCHAR (50),
   CandidateFirstName VARCHAR (50),
   CandidateMiddleName VARCHAR (50)
   --Your other candidate fields .....
 )

 INSERT INTO #MyNames (PersonID, OriginalName)
  SELECT TOP 100 ID, LTRIM (RTRIM (Name)) from SourcePersonTable

 --Possibly add some indexes here for original name

 UPDATE #MyNames
  SET CandidateTitle = LEFT (OriginalName, 3),
      WorkingName = SUBSTRING (OriginalName, 4,9999)
 Where LEFT (OriginalName, 3) IN
     ('MR', 'MRS', 'MS', 'DR')

 - etc ...

Just add steps and edit the [WorkingName] field. When you're done with one skip, just do ...

  UPDATE #MyNames SET WorkingName = OriginalName

... and you are ready to do another sweep.

0
source

Source: https://habr.com/ru/post/1334699/


All Articles