Forum Discussion
COPY INTO from parquets can't push timestamps before year 1970
Question 1 - COPY INTO with pre-1970 year failing
=================================
COPY INTO uses the Polybase feature for ingesting Parquet/ORC files, by default (not specifying AUTO_CREATE_TABLE=ON). Polybase uses an older version of parquet-mr (v1.6) where datetime2 timestamps were stored/read as INT96. This preserved full 100 nanosecond accuracy of SQL datetime2(7). Newer versions of parquet-mr, used by Spark 3.x as you are using, have deprecated the use of INT96 in favor of storing them as INT64 instead. This lost the nanosecond accuracy, but was more practical in most real-life cases. This causes conversion issues for Polybase when interpreting dates before 1970, which are signed. Polybase does support the full range of SQL datetime2's when using the older INT96 representation, including pre-1970.
For better compatibility with your newer Spark-generated parquet files, you may use COPY INTO with the AUTO_CREATE_TABLE=ON option, which uses a more modern implementation that correctly interprets the new TIMESTAMP_MILLIS parquet type represented as signed INT64.
Regarding the 2-hour offset you are observing, the parquet files were generated with a timezone set as "Africa/Cairo" which is GMT+2, as you expected (org.apache.spark.timeZone = Africa/Cairo). I inspected your supplied files using a tool called "parquet-tools".
Question 2 - DATEADD() failures
====================
The reason DATEADD() is failing for older years (i.e., 1852 and 1972) is because the the second parameter to this function is expected to be an integer data type and the signed milliseconds offset values for these old years (e.g. -3723760800000 and 63076357000, respectively) exceeds the bounds of an int and results in an Arithmetic overflow error. Here is the doc : https://docs.microsoft.com/en-us/sql/t-sql/functions/dateadd-transact-sql?view=sql-server-ver15#arguments.
A simple work-around is to convert the ms argument into minutes then call DATEADD() with this smaller value:
Instead of: SELECT dateadd(millisecond, [ts], '1970-01-01 00:00:00') AS 'CONVERTED' FROM [dbo].[TS_TESTING_2]
Do: SELECT DATEADD(minute, [ts] / 60000, '1970-01-01 00:00:00') AS 'CONVERTED' FROM [dbo].[TS_TESTING_2]