First published on MSDN on Nov 27, 2015
Most of the SSIS data flow components allow customers to log the rows with errors to a dedicated error output, which can help customers to investigate and diagnose the root cause of the error. Error output has two 4-byte int columns (
ErrorCode
and
ErrorColumn
), which indicate the
error code
and the id of the error column.
For almost ten years, customers have been asking for improvement of the
ErrorColumn
in the error output. There has not been any easy way to get the column name according to the int column ID stored in
ErrorColumn
, which makes
ErrorColumn
useless in most cases. In this blog, I'll focus on the improvements that we have made to solve the
ErrorColumn
issue. We now allow customers to easily get the error column name in SSIS designer and execution log. We also provide API so that customers can programmatically get the column name according to an int column ID.
Improvement on SSIS Designer
Advanced Editor
In SSIS data flow, there is a concept named "lineage ID", which is associated with an input or output column. In general, the lineage ID of a column represents the ID of the original upstream output column, here I just call this upstream output column “lineage column”. The lineage ID allows customers to tell where the data of current column comes from. Actually, most of the components will put the lineage id of the error column to its error output. However, it’s very hard for customers to identify or locate the exact lineage column name based on an int lineage ID.
If you open the
Advanced Editor
of a data flow component, you can see column properties such as
ID
,
IdentificationString
,
LineageID
in the
Input and Output Properties
tab, where the property
ID
is associated with the property
IdentificationString
, a human readable string that uniquely identifies an input or output column. An example of identification string looks like: “
OLE DB Source.Outputs[OLE DB Source Output].Columns[UserID]
”. However, currently there is not a similar property that is associated with the property
LineageID
, which makes it hardly to know the upstream lineage column of a column.
To address the problem, a new property
LineageIdentificationString
has been introduced. According to this property, developers can easily identify the lineage column of any input or output columns, which will be useful in pipeline designing.
Data Viewer
Data Viewer
allows customers to see the data rows that come from the upstream output and is often used for debugging errors in SSIS. And if it is enabled on an error output, in addition to all regular columns including
ErrorCode
and
ErrorColumn
, there will be another virtual column
ErrorCode - Description
, which contains the error message translated from the value of the column
ErrorCode
. However, previously there is not a similar virtual column that can translate the value of
ErrorColumn
to the identification string of the error column.
With the error column improvements, we have added a new virtual column
ErrorColumn - Description
in
Data Viewer
. This column contains the identification string of the error column. Thanks to the new introduced virtual column, now customers can know not only “why” the error happens, but also “where” the error happens!
Improvement on Execution Log
Although the improvement on
Data Viewer
can help customers to know the information of error columns in designer, we still need another story on diagnosing pipeline errors after the package has been deployed to SQL Server, because there is no designer to help at that time. Obviously the most common way for diagnosing on SQL Server is leveraging execution log.
Our error column improvement now allows SSIS runtime to write the int column ID to identification string map of all pipelines in XML format to the package level event
DiagnosticEx
, so that customer can find out the exact error column information by looking up the XML document. Here is a sample of the XML document:
The XML document above can be logged via log providers. To achieve that, you need to enable
DiagnosticEx
event in log configuration window of SSIS designer. However, currently there is a known limitation on the built-in
SQL Server log provider
: the content of the XML document will be truncated to 2048 bytes, because the type of the column
message
for
SQL Server log provider
is nvarchar(2048) instead of nvarchar(max). So you’d better NOT choose
SQL Server log provider
if you want to see the entire XML content.
In addition to log providers, if the package has been deployed to SSIS catalog, the XML document above can also be written to SSISDB. You need to set appropriate logging level to make SSIS catalog log the XML document. There are two choices:
-
Simply use the built-in logging level
Verbose
;
-
Create a new
customized logging level
, and at least enable
DiagnosticEx
event.
Please note that
customized logging level
is also a new feature introduced in SQL 2016 CTP 2.3. Later there should be another blog talking about more details about this feature.
Improvement on Public API
We also provide related public APIs so that customers can build their own related solutions on top of the APIs. First, we created a new interface
IDTSComponentMetaData130
that inherits from
IDTSComponentMetaData100
. There is a new method inside this new interface to get the identification string based on the int column ID:
string GetIdentificationStringByID(int lID);
It can be accessed both in the component base class
PipelineComponent
and
Script Component
, so that both 3
rd
components developer and SSIS package developer can leverage this method to get the identification string according to a column ID.
In addition, we created new interfaces
IDTSInputColumn130
and
IDTSOutputColumn130
that inherit from
IDTSInputColumn100
and
IDTSOutputColumn100
respectively, and a new read only string property
LineageIdentificationString
is added to both new interfaces. This property is also exposed in
Advanced Editor
.
Below is an example to showcase how to leverage the new method
GetIdentificationStringByID
in
Script Component
to get the identification string of the error column.
-
Before creating the new
Script Component
, configure an upstream component in the data flow to redirect rows to its error output when an error or truncation occurs. For testing purposes, you may want to configure a component in a manner that ensures that errors will occur. For example, by configuring a
Lookup
transformation between two tables where the lookup will fail.
-
Add a new
Script Component
to the Data Flow designer surface and configure it as a transformation.
-
Connect the error output from the upstream component to the new
Script Component
.
-
In the
Script Transformation Editor
, on the
Input Columns
page, select the
ErrorColumn
column.
-
On the
Inputs and Outputs
page, add a new output column of type String named “ErrorColumnDescription”. Increase the default length of the new column to 255 to support long identification string.
-
Open the
Script Transformation Editor
, and on the Script page, for the
ScriptLanguage
property, select the script language.
-
Click
Edit Script
to open the Microsoft Visual Studio Tools for Applications (VSTA) IDE and add the sample code shown below.
-
Close VSTA.
-
Close the
Script Transformation Editor
.
-
Attach the output of the
Script Component
to a suitable destination. A Flat File destination is the easiest to configure for ad hoc testing.
-
Run the package.
public class ScriptMain : UserComponent
{
……
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
var component130 = this.ComponentMetaData as IDTSComponentMetaData130;
if (component130 != null)
{
Row.ErrorColumnDescription = this.ComponentMetaData.GetIdentificationStringByID(Row.ErrorColumn);
}
}
}
Conclusion
ErrorColumn
is an important information in pipeline error diagnostic. However, in previous versions of SSIS, customers don’t have any easy way to make use of such information. This blog has introduced several improvements we have made on
ErrorColumn
, which have covered design time and runtime diagnostic with
ErrorColumn
, and we have also provided public APIs to help customers easily implement their own scenarios related to
ErrorColumn
.