Credibility is critical to the success of a business
intelligence / data warehouse project. Problems happen, and sometimes the
problem is not the SSIS ETL process, but the incoming data. A business manager
questions the validity of a number, and the BI team has to trace back to the
source. And not just to the source file, but to the actual source record.
It's OK to tell someone you'll need to review the source
file. It's better if you can tell someone that the data came from row # from
file #, and then pull up the data quickly and easily. To do this, you need to
know the actual source row number. SSIS makes this easy.
Summary
When you are importing data from a file into your target database work table all you need to do is follow these simple steps.
- Package variable:
- add a variable to capture the row count
- In your Data Flow:
- Add a Row Count task
- Add a Script Task, with an output column
- Add the new column to your destination export
The Details
Package Variable
As part of the package, create a user variable for the
record count. For our example, we used User::RecordCntFile, Scope = Package,
DataType = Int32, Value =0. Naturally,
if your maximum row count exceeds the capacity of the Int32 container, use
Int64.
Row Count Task
Create and then name the Row Count task. We named ours
"Row Count." Using the
advanced editor for the Row Count task, make the following change.
- Component Properties:
- In the Custom Properties section, assign your user variable to the custom property - VariableName.
- Input Columns:
- no adjustment or changes for this section are required.
- Input and Output Properties:
- no adjustment or changes for this section are required.
Assign user variable in the custom properties |
Script Task
Add a script task to your data flow and give it a name.
Using the Edit Script button, add the following code. The below code happens to
be in Visual Basic, but could easily be in C#.
Script Code:
Public Class ScriptMain
Inherits UserComponent
Private _RowNumber As Integer = 1
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.RowNumber = _RowNumber
_RowNumber += 1
End Sub
End Class
- Input columns:
- no adjustment or changes for this section are required.
- Inputs and Outputs:
- Add a new output column, here named RowNumber, with a datatype of DT_I4 (four byte signed integer) to match our Int32 variable datatype. If your variable is something other than Int32, change this as necessary.
- Connection Manager:
- no adjustment or changes for this section are required.
New Output column: RowNumber |
Destination Task
The last step is to include the new column you added in your script task as part
of the mappings in your destination task. With this, your data warehouse tables will contain the original source file row number. And give you a big win the next time someone has to locate and validate the original source data.
No comments:
Post a Comment