First published on MSDN on Sep 01, 2017
This topic came up on a recent support case I had worked on. The customer had stored the data in an Azure SQL Data Warehouse database, but needed to move it into Azure blob storage for consolidating data from different sources. This also required to write the datetime values in a specific format.
The idea was to use PolyBase and CETAS (Create External Table As Select) for exporting the data into an external table, with the external table pointing to Azure blob storage. The CETAS was configured with an External File Format specifying a DATE_FORMAT = N'yyyy-MM-dd HH:mm:ss'.
The export finished without any errors. But when selecting the same exported data from the same external table again, it failed with a conversion error:
Expected data type: DATETIME, Offending value: 2017-03-31 17:16:15.000 (Column Conversion Error), Error: Conversion failed when converting the NVARCHAR value '2017-03-31 17:16:15.000' to data type DATETIME.
It turned out that the datetime values were written in the default format of 'yyyy-MM-dd HH:mm:ss.fff', e.g. "2016-07-27 21:29:03.000" instead of the intended "2016-07-27 21:29:03". This means that PolyBase is unable to read data that it had written itself before.
This behaviour is by design.
PolyBase indeed does not honour DATE_FORMAT for exporting to blob storage. Please note the following sentence in the middle of the
CREATE EXTERNAL FILE FORMAT (Transact-SQL)
documentation:
"PolyBase only uses the custom date format for importing the data. It does not use the custom format for writing data to an external file."
To work around this limitation, you should export the data on a different way. For example, you may use SSIS for exporting to a custom CSV format, then copy the CSV files to Azure Blob storage.
Sample Queries to reproduce the problem
-- Create a local table with sample rows
CREATE TABLE orders_local (id INT, dt1 DATETIME);
INSERT INTO orders_local VALUES (1, '2017-03-31 17:16:15');
INSERT INTO orders_local VALUES (2, '2017-04-30 17:16:15');
-- Create the external file format
CREATE EXTERNAL FILE FORMAT csv_customdate
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = N',',
STRING_DELIMITER = N'0x22',
DATE_FORMAT = N'yyyy-MM-dd HH:mm:ss',
USE_TYPE_DEFAULT = False
));
-- Create the external table - CETAS
CREATE EXTERNAL TABLE Orders
WITH (
LOCATION='/orders/orders.tbl',
DATA_SOURCE = external_table_blob_storage,
FILE_FORMAT = csv_customdate)
AS (SELECT id, c1, dt1 FROM orders_local);
-- Check the result
SELECT ID, dt1 FROM Orders;
-- the SELECT command is failing with:
Msg 107090, Level 16, State 1, Line 37
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/orders/orders.tbl/QID6002_20170516_133739_0.txt)Column ordinal: 2, Expected data type: DATETIME, Offending value: 2017-03-31 17:16:15.000 (Column Conversion Error), Error: Conversion failed when converting the NVARCHAR value '2017-03-31 17:16:15.000' to data type DATETIME.
References