Thursday, December 10, 2015

SSIS - Add row number to incoming source records

Credibility is critical to the success of a business intelligence / data warehouse project. Problems happen, and sometimes the problem is not the SSIS ETL process, but the incoming data. A business manager questions the validity of a number, and the BI team has to trace back to the source. And not just to the source file, but to the actual source record.

It's OK to tell someone you'll need to review the source file. It's better if you can tell someone that the data came from row # from file #, and then pull up the data quickly and easily. To do this, you need to know the actual source row number. SSIS makes this easy. 

Summary

When you are importing data from a file into your target database work table all you need to do is follow these simple steps.
  • Package variable:  
    • add a variable to capture the row count
  • In your Data Flow:
    • Add a Row Count task
    • Add a Script Task, with an output column
    • Add the new column to your destination export
This example was from a project done with SQL Server 2008, but applies to all the current versions.

The Details

Here is a more complete listing of the steps required in SSIS to add a row number to all of your incoming file records. When you are done, your data flow might look something like this:


Package Variable

As part of the package, create a user variable for the record count. For our example, we used User::RecordCntFile, Scope = Package, DataType = Int32, Value =0.  Naturally, if your maximum row count exceeds the capacity of the Int32 container, use Int64.


SSIS  Add the row number to incoming source records - Package Variable

Row Count Task

Create and then name the Row Count task. We named ours "Row Count."  Using the advanced editor for the Row Count task, make the following change.

  • Component Properties: 
    • In the Custom Properties section, assign your user variable to the custom property - VariableName.
  •  Input Columns:  
    • no adjustment or changes for this section are required.
  •  Input and Output Properties:  
    • no adjustment or changes for this section are required.


Assign user variable in the custom properties
  

Script Task


Add a script task to your data flow and give it a name. Using the Edit Script button, add the following code. The below code happens to be in Visual Basic, but could easily be in C#.

Script Code:
       

Public Class ScriptMain
   Inherits UserComponent
   Private _RowNumber As Integer = 1

   Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
       Row.RowNumber = _RowNumber
         _RowNumber += 1
       End Sub

   End Class

       
 

  • Input columns:  
    • no adjustment or changes for this section are required.
  • Inputs and Outputs:  
    • Add a new output column, here named RowNumber, with a datatype of DT_I4 (four byte signed integer) to match our Int32 variable datatype.  If your variable is something other than Int32, change this as necessary.
  • Connection Manager:  
    • no adjustment or changes for this section are required.


New Output column:  RowNumber


Destination Task

The last step is to include the new column you added in your script task as part of the mappings in your destination task. With this, your data warehouse tables will contain the original source file row number.  And give you a big win the next time someone has to locate and validate the original source data.

No comments: