Tuesday, July 26, 2016

EXASOL from SQL Server: Data types & Conversion

We are working with EXASOL and that requires loading data using SSIS (SQL Server Integration Services). For this test, we are using the current version of SQL Server Data Tools 2016 along with Visual Studio 2015. EXASOL only has a limited set of data types, so creating destination tables requires a bit of planning.  And with the tables created in EXASOL, importing data from SQL Server requires some transformations.

EXASOL Data Types

From the EXAsolution user manual, section 2.3.1, Table 2.2 Overview of EXASolution data types:

Data Type
Notes
BOOLEAN

CHAR(n)
1 ≤ n ≤ 2,000;   ascii and utf8
DATE

DECIMAL(p,s)
 s ≤ p ≤ 36
 p ≥ 1; s ≥ 0
DOUBLE PRECISION

GEOMETRY[(srid)]
 srid defines the coordinate system (see also EXA_SPATIAL_REF_SYS)
INTERVAL DAY [(p)] TO SECOND [(fp)]
 1 ≤ p ≤ 9, 0 ≤ fp ≤ 9, accuracy precise to a millisecond
INTERVAL YEAR [(p)] TO MONTH  
1 ≤ p ≤ 9
TIMESTAMP
Timestamp with accuracy precise to a millisecond
TIMESTAMP WITH LOCAL TIME ZONE
Timestamp which considers the session time zone
VARCHAR(n)
1 ≤ n ≤ 2,000,000;   ascii and utf8

It's a short list, but DECIMAL, CHAR and VARCHAR end up handling the missing data types we need. At least for now. And EXASOL is an analytic engine - not storage for blobs or pictures. The following table outlines a few of the SQL Server data types, and the corresponding EXASOL data types.  Plus, a note about whether EXASOL will handle the implicit conversion, and what T-SQL or SSIS Data Conversions we needed, or used to load the data.

All and all, the overall process was fairly logical, except for dates. As we noted, we are using SSIS, and the EXASOL ADO.NET driver with the SSIS ADO.NET Destination task.

 Data Types:  SQL Server and EXASOL:  Equivalents and Conversion Steps



SQL Server
EXASOL
Implicit (1)
T-SQL
( view or SQL Command)
SSIS
Data Conversion Task
bit
BOOLEAN
yes
Not reviewed
DT_BOOL
int
DECIMAL(18,0)
yes
cast( mycol as Decimal(18,0))
[DT_NUMERIC] 
Precision 18; Scale 0, or DT_DECIMAL, with scale of 0

smallint
DECIMAL(9,0)
yes
cast( mycol as Decimal(9,0))
[DT_NUMERIC] 
Precision 9; Scale 0, or DT_DECIMAL, with scale of 0

tinyint
DECIMAL (3,0)
no (2)
cast( mycol as Decimal(3,0))
[DT_NUMERIC]
 Precision 3; Scale 0  (1), or DT_DECIMAL, with scale of 0

float
DOUBLE PRECISION
yes
Not required
Not required





money
DECIMAL(18,4)
yes
cast( mycol as Decimal(18,4))
[DT_NUMERIC] 
Precision 18; Scale 4 , or DT_DECIMAL, with scale of 4
Decimal(x,y)
DECIMAL(x,y)
n/a
None required
None required
Nvarchar(x)
VARCHAR(x) utf8
n/a
n/a
n/a
Varchar(x)
VARCHAR(x) ascii
n/a
n/a
n/a
datetime
DATE
no (3)
convert(varchar, mycol, 101)
(6)
datetime
TIMESTAMP
no (4)
convert(varchar, mycol, 101)
[Time defaults to 00:00:00.000]
(6)
datetime
TIMESTAMP
no (4)
convert(varchar, mycolumn, 101) +  ' ' +   STUFF (convert(varchar, mycolumn, 114), 9,1,'.')

(6)
Varbinary(max)
No support
no




Notes

(1)  No Cast or Convert required, but the optional T-SQL or SSIS Data Conversions will minimize conversion risk.

(2) tinyint conversion SSIS error message:  [ADO NET Destination [2]] Error: An exception has occurred during data insertion, the message returned from the provider is: No conversion for parameters with this type System.Byte

(3) datetime implicit conversion SSIS error message:
[ADO NET Destination [2]] Error: An exception has occurred during data insertion, the message returned from the provider is: data exception - invalid character value for cast; Value: '01.06.1996 00:00:00' Format: 'MM/DD/YYYY' in write of column FACTTESTSALES.ORDERDATE (Session: 1540926262556380170)

(4) datetime implicit conversion SSIS error message:
[ADO NET Destination [2]] Error: An exception has occurred during data insertion, the message returned from the provider is: data exception - invalid character value for cast; Value: '01.06.1996 00:00:00' Format: 'MM/DD/YYYY HH24:MI:SS.FF6' in write of column FACTTESTSALES.ORDERDATETS (Session: 1540926166351754250)

(5) VARCHAR(x) ascii equates to a conventional varchar().  VARCHAR(x) utf8 equates to nvarchar(x).

(6) Consider using a Derived task.  But, doing the conversion in the database engine using T-SQL will be far more efficient. You can do this in the OLE DB Source task by using a SQL Command, rather than using a "Table or View" data access mode. If you do use the "SQL Command" option, and you later modify your SELECT such that is changes a data type [ for example: convert(varchar, mycol, 101) ], expect problems and see our note below: SQL Command Data Type Mismatch - Error.


Dates

Importing date data into EXASOL using SSIS is a bit more challenging.  And so far, we have found only a few approaches that work, and many that do not. If we look over the error messages returned back from SSIS from some failed import attempts, we see that the EXASOL ADO.NET driver is looking for this format:  



                'MM/DD/YYYY HH24:MI:SS.FF6'


To replicate the MM/DD/YYYY, in SQL Server use T-SQL:

  convert(varchar, mycolumn, 101) 

This does not return any time value, but will be accepted both by the DATE and the TIMESTAMP data types in EXASOL. To add the time value, it requires this T-SQL: 


  convert(varchar, mycolumn, 101) + ' ' +               STUFF(convert(varchar, mycolumn, 114), 9,1,'.')


STUFF?

Look closely at the time section of the required format.  "...MI:SS.FF6'"  Notice the period, and not the colon between SS and FF6.  The T-SQL CONVERT( , ,114) statement returns a colon between SS:mmmm, while the desired value requires a period. Very unfortunate. Also notice that one of the T-SQL CONVERT() ODBC formats (121) and the ISO formats (126, 127) have the desired millisecond format. Unfortunately, the date format may not work. Possible reasons to consider an ODBC connection rather than an ADO.NET connection - at least for processes with a small number of records.


SQL Command Data Type Mismatch - Error

You are using the SQL Command option in the OLE DB source task,  And the error message still tells us our date is formatted incorrectly.  Check the Metadata in the Data Flow Path Editor (see below).

When you use the SQL Command option, and you CHANGE the date type in your SELECT, the source task does not change the outgoing data type (bug).  Say, when you need to add a CONVERT statement to a datetime value - as we outlined above. Let's call it an SSIS bug. (note, we are using SQL Server Data Tools from SQL 2016 - earlier versions may not have this problem).

Fix: delete the OLE DB Source Task, create a new task, add back your updated SELECT, with the corrected data types, and connect it to your work flow. And verify that the outgoing data types are what you need.





Implicit (or direct) Date conversions that do not work, at least so far:

Naturally, you will be using one of the above CONVERT statements to convert your SQL Server date data into EXASOL. Or perhaps you have worked out one of your own. 

The following just lists our the various direct or implicit conversions that we have tried and do not seem to work. For reference, here are the date conversions that we have found that do not work, at least with SSDT/SSIS 2016 and EXASOL 5.0.15.


  
SQL Server
EXASOL
SSIS Conversion/Import
Datetime
DATE
No
Datetime
TIMESTAMP
No
Date
DATE
No
Date
TIMESTAMP
No
Datetime2
DATE
No
Datetime2
TIMESTAMP
No
SmallDatetime
DATE
No
SmallDatetime
TIMESTAMP
No
Datetimeoffset
DATE
No
Datetimeoffset
TIMESTAMP
No

And, these CONVERT statements did not work either.

ISO:         convert(varchar, OrderDate, 112)
ANSI:        convert(varchar, OrderDate, 102)
Others:      convert(varchar, OrderDate, 104)
             convert(varchar, OrderDate, 100)
             convert(varchar, OrderDate, 113)


Next Steps: EXASOL - Using SSIS to load our Data Warehouse





2 comments:

jose said...

You could use

OnConnect="alter session set NLS_TIMESTAMP_FORMAT='DD.MM.YYYY HH24:MI:SS.FF6'" in the connection string property

Andrew Peterson said...

Great thought. I'll have to give that a try!