SSIS - Script Component, split a single row into multiple rows (parent/child variation)

Thanks in advance for your help. I need help writing an SSIS Script Component that splits a single row into multiple rows. I found several useful blogs and posts, listed below:

http://beyondrelational.com/ask/public/questions/1324/ssis-script-component-split-single-row-to-multiple-rows-parent-child-variation.aspx

http://bi-polar23.blogspot.com/2008/06/splitting-delimited-column-in-ssis.html

However, to complete the project I need additional coding help. Here is what I want to do.

Input data

     ID  Item Name
     1   Apple01,02, Banana01,02,03
     2   Spoon1,2, Fork1,2,3,4

Output

     ParentID ChildID Item Name
     1 1 Apple01
     1 2 Apple02
     1 3 Banana01
     1 4 Banana02
     1 5 Banana03
     2 1 Spoon1
     2 2 Spoon2
     2 3 Fork1
     2 4 Fork2
     2 5 Fork3
     2 6 Fork4

Below is my attempt at the code, but feel free to rework the whole thing if it is illogical. The Script Component output is set to asynchronous.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim posID As Integer, childID As Integer
        Dim delimiter As String = ","
        Dim txtHolder As String, suffixHolder As String
        Dim itemName As String = Row.ItemName
        Dim keyField As Integer = Row.ID

        If Not (String.IsNullOrEmpty(itemList)) Then
            Dim inputListArray() As String = _
                itemList.Split(New String() {delimiter}, _
                StringSplitOptions.RemoveEmptyEntries)

            For Each item As String In inputListArray
                Output0Buffer.AddRow()
                Output0Buffer.ParentID = keyField

                If item.Length >= 3 Then
                    txtHolder = Trim(item)
                    Output0Buffer.ItemName = txtHolder
                    'when item length is less than 3, it suffix
                Else
                    suffixHolder = Trim(item)
                    txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                        - Len(suffixHolder)) & suffixHolder.ToString()
                    Output0Buffer.ItemName = txtHolder
                End If
            Next
        End If
    End Sub

The current code generates the following output.

     ID Item Name
     1 Apple01
     1 02
     1 Banana01
     1 02
     1 03
     2 Spoon1
     2 2
     2 Fork1
     2 2
     2 3
     2 4

1 answer

If I come across as pedantic in this answer, that is not my intention. Based on the comment “I am new to coding and troubleshooting”, I wanted to walk through my observations and how I arrived at them.

Problem analysis

The goal is to split a single row into multiple output rows based on a delimited field in that row.

Currently, the code generates the appropriate number of rows, so the asynchronous (split) part of the script works, which is a plus. Two things still need to happen: 1) populate the child ID column, and 2) apply the item's prefix to every subsequent element when generating the child rows.

I approach most problems this way. What am I trying to do? What works? What does not work? What needs to be done to make it work? Breaking problems into smaller and smaller problems will eventually lead to something you can do.

Code Observations

Pasting in the supplied code produced an error: itemList was not declared. Based on usage, it appears to be intended as itemName.

After correcting that, you should notice the IDE indicating that you have two unused variables (posID, childID), along with the warning that variable txtHolder is used before it has been assigned a value and that a null reference exception could result at runtime. A colleague of mine often remarks that warnings are errors that have not grown up yet, so my advice to you as a new developer is to pay attention to warnings unless you explicitly expect the compiler to warn you about that scenario.

Getting started

Given the choice between resolving the child ID situation and the name prefix/suffix, I would start with the easy one: the child ID.

Create a surrogate key

What a fancy title. If you search for that phrase, you will get plenty of hits from ssistalk, sqlis, or any of several fabulously smart bloggers. The devil, of course, is in knowing what to search for. Nowhere do you compute or assign the child ID value to the output stream, which of course is why it does not show up there.

We simply need to generate a monotonically increasing number that resets each time the source ID changes. I am assuming the incoming ID is unique in the incoming data, the way a sales invoice number would be unique while we split out the items purchased. However, if those IDs repeat in the dataset, perhaps because they represent a salesperson ID rather than an invoice number, then salesperson 1 could have another row in the batch for selling vegetables. That is a more complex scenario, and we can revisit it if it better describes your source data.

There are two parts to generating our surrogate key (again, break problems into smaller pieces). The first is to make something that counts from 1 to N. You have declared the variable childID to serve this purpose. Initialize this variable (to 1) and then increment it inside your For Each loop.

Now that we are counting, we need to push that value onto the output stream. Putting the two steps together looks like

    childID = 1
    For Each item As String In inputListArray
        Output0Buffer.AddRow()
        Output0Buffer.ParentId = keyField
        Output0Buffer.ChildId = childID
        ' There might be VB shorthand for ++
        childID = childID + 1

Run the package and, success! Cross the generated surrogate key off the list.
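
If the incoming ID can repeat across rows (the more complex scenario mentioned above), resetting the counter per row would hand out duplicate child IDs. One possible approach, sketched here as a standalone console module, is to remember the last child ID issued per parent. The names ChildIdDemo, lastChildId and NextChildId are hypothetical and not part of the original script.

```vbnet
Imports System
Imports System.Collections.Generic

Module ChildIdDemo
    ' Hypothetical helper state: last child ID handed out for each parent ID.
    Private ReadOnly lastChildId As New Dictionary(Of Integer, Integer)()

    ' Returns 1 the first time a parent ID is seen, then 2, 3, ... on later calls.
    Function NextChildId(ByVal parentId As Integer) As Integer
        Dim current As Integer = 0
        ' TryGetValue leaves current at 0 when the parent has not been seen yet.
        lastChildId.TryGetValue(parentId, current)
        current += 1
        lastChildId(parentId) = current
        Return current
    End Function

    Sub Main()
        Console.WriteLine(NextChildId(1)) ' 1
        Console.WriteLine(NextChildId(1)) ' 2
        Console.WriteLine(NextChildId(2)) ' 1
        ' Parent 1 shows up again in a later row: numbering continues.
        Console.WriteLine(NextChildId(1)) ' 3
    End Sub
End Module
```

Inside the Script Component, the assumption is that the dictionary would live as a class-level field so it survives across calls to Input0_ProcessInputRow, and the loop would assign Output0Buffer.ChildId = NextChildId(keyField) instead of using a local counter.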

String mashing

I don’t know what the right term is for the other half of the problem, but I needed some kind of heading for this section. Given the source data, this one may be harder to get right. You have supplied the values Apple01, Banana01, Spoon1, Fork1. It looks like there is a pattern (a name joined with a code), but what is it? Your code indicates that an item shorter than 3 characters is a suffix, but how do you know what the base is? The first row uses a leading 0 and two-digit codes, while the second row does not use a leading zero. This is where you need to understand your data. What is the rule for identifying the "code" part of the first element? Some possible algorithms:

  • Push back on your data providers to supply codes of a consistent length (I think this has worked once in my 13 years, but it never hurts to push back on the source).
  • Assuming the code is always numeric, evaluate each character in reverse order, checking whether it can be cast to an integer (this handles variable-length codes).
  • Assume the second element in the split array gives the code length. This is the approach your code takes, and it does work.
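
The second option (scanning backwards for digits) could look roughly like the sketch below. TrailingCodeDemo and TrailingDigitsStart are hypothetical names, and the sketch assumes the code is always a run of digits at the end of the name. Note that Char.IsDigit accepts any Unicode decimal digit, not only 0-9.

```vbnet
Imports System

Module TrailingCodeDemo
    ' Hypothetical helper: returns the index where the trailing run of digits
    ' starts, or item.Length when the item does not end in a digit.
    Function TrailingDigitsStart(ByVal item As String) As Integer
        Dim pos As Integer = item.Length
        ' Walk backwards while the character just before pos is a digit.
        While pos > 0 AndAlso Char.IsDigit(item(pos - 1))
            pos -= 1
        End While
        Return pos
    End Function

    Sub Main()
        For Each item As String In New String() {"Apple01", "Fork1", "Pear"}
            Dim cut As Integer = TrailingDigitsStart(item)
            Dim baseName As String = item.Substring(0, cut)
            Dim code As String = item.Substring(cut)
            Console.WriteLine("{0} -> base '{1}', code '{2}'", item, baseName, code)
        Next
    End Sub
End Module
```

With the base name isolated this way, each later element in the list could simply be appended to it. The rule breaks down if a name itself legitimately ends in a digit, which is another reason to understand the data first.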

I made no changes to get the generated item name working beyond fixing the itemName/itemList local variable. The final code resolves the warnings by removing posID and initializing txtHolder to an empty string.

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        Dim childID As Integer
        Dim delimiter As String = ","
        Dim txtHolder As String = String.Empty, suffixHolder As String
        Dim itemName As String = Row.ItemName
        Dim keyField As Integer = Row.ID

        If Not (String.IsNullOrEmpty(itemName)) Then
            Dim inputListArray() As String = _
                itemName.Split(New String() {delimiter}, _
                StringSplitOptions.RemoveEmptyEntries)

            ' The inputListArray (our split out field)
            ' needs to generate values from 1 to N
            childID = 1
            For Each item As String In inputListArray
                Output0Buffer.AddRow()
                Output0Buffer.ParentId = keyField
                Output0Buffer.ChildId = childID
                ' There might be VB shorthand for ++
                childID = childID + 1

                If item.Length >= 3 Then
                    txtHolder = Trim(item)
                    Output0Buffer.ItemName = txtHolder
                Else
                    'when item length is less than 3, it suffix
                    suffixHolder = Trim(item)
                    txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                        - Len(suffixHolder)) & suffixHolder.ToString()
                    Output0Buffer.ItemName = txtHolder
                End If
            Next
        End If
    End Sub
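
To check the suffix logic without running the package, the same rule can be exercised in a small console module. This is only a sketch: SplitRowDemo and ExpandItems are hypothetical names, and unlike the script above it trims each element before measuring its length and passes a short element through unchanged when there is no earlier base to attach it to.

```vbnet
Imports System
Imports System.Collections.Generic

Module SplitRowDemo
    ' Hypothetical stand-in for the SSIS row loop: returns the expanded item
    ' names for one delimited ItemName value.
    Function ExpandItems(ByVal itemName As String) As List(Of String)
        Dim result As New List(Of String)()
        If String.IsNullOrEmpty(itemName) Then Return result

        Dim txtHolder As String = String.Empty
        Dim parts() As String = itemName.Split(New String() {","}, _
            StringSplitOptions.RemoveEmptyEntries)

        For Each part As String In parts
            Dim item As String = part.Trim()
            If item.Length >= 3 Then
                ' Long enough to be a full name plus code.
                txtHolder = item
            ElseIf txtHolder.Length >= item.Length Then
                ' Short element: replace the tail of the previous name with it.
                txtHolder = txtHolder.Substring(0, txtHolder.Length - item.Length) & item
            Else
                ' Short element with no usable base: pass it through unchanged.
                txtHolder = item
            End If
            result.Add(txtHolder)
        Next
        Return result
    End Function

    Sub Main()
        Dim childID As Integer = 1
        For Each expanded As String In ExpandItems("Spoon1,2, Fork1,2,3,4")
            ' Prints ParentID, ChildID and Item Name for parent 2.
            Console.WriteLine("2 {0} {1}", childID, expanded)
            childID += 1
        Next
    End Sub
End Module
```

For the second sample row this prints Spoon1, Spoon2, Fork1, Fork2, Fork3, Fork4 with child IDs 1 through 6, matching the desired output in the question.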

Source: https://habr.com/ru/post/1399866/