Handling Oracle CLOBs in Biml

As I've written previously, you can use the Attunity Oracle connector in SSIS packages generated with Biml. If you're creating these packages by hand, the experience is seamless. With Biml, you can employ the OracleSource element, which behaves like the OleDbSource element, in that it handles a lot of the column specifications for you.

However, if you're using Biml without the benefit of the OracleSource element (e.g. if you can't use Mist, or if you need to employ the CustomComponent syntax ), then you might run into data type conversion issues, as you need to specify all source columns and their data types explicitly, and implicit conversions in SSIS source components generally result in errors.

For most data types, it's a relatively straightforward process:
1. Identify the Oracle data type
2. Map the Oracle data type to a SQL Server data type
3. Figure out what SSIS data type maps to that SQL Server data type (and therefore, which Biml data type you need to specify)

Do take special note when using Oracle NUMBER columns; those convert to different SQL Server types depending on their precision and scale. Once you nail down the mapping (I like to use a function in an included file), Biml handles those smoothly.

The CLOB datatype, however, gave me problems on a recent project. In the rest of this post, I'll walk through a reproduction of the issue, the details of the issue itself, and how I worked around it.

What's a CLOB?

CLOBs (Character Large Objects) are a specific member of the general set of large object data types available in Oracle. They're basically used to store large strings or documents.

You might run into them if you're extracting data from an Oracle application (or any application that uses an Oracle data layer) using SSIS rather than an SDK or API.

Setup

Oracle's sample HR schema doesn't have a CLOB column, so I created my own table (HR.EMPLOYEE_COMMENTS) consisting of all the EMPLOYEE_IDs from HR.EMPLOYEES, plus a CLOB column named COMMENTS.

I populated a random sample of the COMMENTS column with some lorem ipsum.

Next, I created a destination table on my machine's instance of SQL Server. Since CLOB maps to VARCHAR(MAX) , I created the destination COMMENTS column as a VARCHAR(MAX).

Finally, I created a simple SSIS package (using the Attunity Oracle source component) by hand to serve as a model for my Biml script.

The Problem

Next, I wrote a Biml script mimicking my hand-created SSIS package. Most importantly, I've specified the AnsiString Biml datatype for the COMMENTS column, since CLOB maps to VARCHAR(MAX), which maps to DT_TEXT, which maps to AnsiString, according to this extremely handy table on Cathrine Wilhelmsen's site.
Note the explicit datatype declarations, and the multiple column specifications.

You'd expect this just to work - after all, I was able to create it by hand. However, executing the package results in an error.
That's weird - why is it telling me that the source component is trying to convert data types? I used the same data type and the same length for all of the COMMENTS columns in my Biml script. And if I go into the advanced editor to view the external and output columns, they also look the same (they're both DT_TEXT).
What gives? Well, for some reason, the column datatypes in the underlying XML of the generated package aren't actually the same.
Note the two highlighted (mismatching) dataType properties.

One column in the source component is a "str", and the other is a "text". Compare this to the XML from the package I created by hand. In that package, the external and output columns have the same underlying data type.
I'm not sure exactly why this happens. My theory is that because AnsiString compiles to more than one SSIS data type, the resulting data type property can vary depending on what element is being emitted by the Biml compiler (external metadata column vs. output column); and the only reason the external and output columns appear to be the same in SSDT is because of the way the package editor interprets the underlying XML.

In any case, this is still an issue in version 5.0.60425 of BimlExpress. Since I didn't want to give up on using Biml for this project, I came up with a workaround. I'll describe that workaround next.

The Solution

Since the problem was (in theory) that a single Biml data type mapped to too many SSIS/SQL Server data types, my workaround was to find a Biml data type that:

  • Maps to only one SSIS data type
  • Maps to an SSIS data type that actually makes sense for my CLOB column (i.e. I can't use Int32)

Looking again at Cathrine's handy table (which is even sortable!), I saw that I could use the Xml data type.

Wait, what? Xml?

I chose Xml because it only maps to a single SSIS data type: DTNTEXT. And although the table on Cathrine's site maps that DTNTEXT to the SQL Server XML data type, I knew from experience that if you specified "DTNTEXT" and the destination column was actually any of the SQL Server data types that map to DTNTEXT, the package would function as expected. These SQL Server data types are:

  • NTEXT
  • XML
  • NVARCHAR(MAX)

NVARCHAR(MAX) is what I used in the end, since it's the closest I can get to VARCHAR(MAX). First, I altered my COMMENTS column to an NVARCHAR(MAX).
Then, I made a couple changes to the Oracle source component in my Biml script:

  1. I added the TO_NCLOB() function to the source query, to convert the CLOB to NCLOB, which maps to NVARCHAR(MAX).
  2. I specified the COMMENTS column elements using Biml's Xml data type

Finally, I regenerated the package and tested it.

Summary

If you're extracting data from Oracle using Biml, and at least one of your tables has a CLOB column, the Biml compiler can emit mismatching data types between the external and output columns in the Oracle source component. This will result in a "datatype conversion is not supported" error.

In order to get around this, do the following:

Convert the CLOB to an NCLOB in your source query, using the TONCLOB() function
Change the destination column to an NVARCHAR(MAX)
Specify the NCLOB column using Biml's "Xml" data type, which always compiles to DT
NTEXT
This is absolutely a workaround (and one that can potentially have performance impacts), and I'm hoping this gets resolved in future versions.