Azure Synapse Analytics Blog

ADF throws error: Unexpected error encountered filling record reader buffer ClassCastException

Liliam_C_Leme
May 19, 2020

Today we want to write about a very interesting case that our team (Prathibha Vemulapalli, Charl Roux, and I) worked on this very week.

 

A Parquet file with more than 100 columns was created to be imported into Azure SQL DW using Azure Data Factory. The file had diverse data types. So, before I go more deeply into the error and the troubleshooting process, let me walk through the environment step by step.

 

Put simply, this environment has a Parquet file as the source and SQL DW as the destination, running on ADF, as figures 1, 2, and 3 show:

 

Figure 1: Define Blob Storage

Figure 2: Define Parquet file

Figure 3: Define Source file

 

The destination is SQL DW, as figure 4 shows:

 

Figure 4: Define data destination

 

Once the table was defined on the SQL DW connector, a pipeline must be created with the source (Parquet) and the sink (SQL DW), as figure 5 shows:

 

Figure 5: Pipeline

 

Publish all changes and rename the pipeline to a suitable name.

 

You can find more details about Data Factory here: https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal

 

Our environment is ready to debug. Once I run the pipeline (Debug option), it hits the condition I mentioned before:

 

 

Figure 6: ADF Cast error

 

 

The table was defined with data types like int, datetime, nvarchar, float, and real, across more than 100 columns, and the error was:

"Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class java.lang.Integer cannot be cast to class parquet.io.api.Binary (java.lang.Integer is in module java.base of loader 'bootstrap'; parquet.io.api.Binary is in unnamed module of loader 'app'),Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer"

 

Discussing it within our team, this error was spotted: ClassCastException: class java.lang.Integer cannot be cast to class parquet.io.api.Binary

But which one of the 100+ columns was responsible?

 

Since the main table accepts null values, you can test by excluding all columns of a given data type from the mapping. For example, exclude all the int columns from your transformation and see if the copy works, and so on, until you find the one to blame, as figure 7 shows:

 

 

 

Figure 7: Delete column to be imported
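As an alternative to trial and error, you can inspect the physical types recorded in the Parquet file itself. Here is a minimal sketch using pyarrow (the file name is hypothetical); a date column stored as INT32 stands out next to the BYTE_ARRAY (binary) columns the destination expects as strings:

import pyarrow.parquet as pq

# Print the Parquet schema, including each column's physical type
# (INT32, INT64, BYTE_ARRAY, ...), to spot the mismatched column.
parquet_file = pq.ParquetFile("myfile.parquet")  # hypothetical file name
print(parquet_file.schema)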

 

Note that in ADF, once you click on the transformation, you can check the schema and also preview the source data. If you changed a data type, you can clear and then import the schema again to refresh it, as figure 8 shows:

 

 

Figure 8: ADF Metadata

 

In our case, the solution found by the customer was to recreate the Parquet file using INT96 instead of INT32 for the date columns, which were raising the error.
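If the file is produced with Spark, the spark.sql.parquet.outputTimestampType setting controls the physical type used for timestamp columns. This is only a sketch, assuming Spark 2.3 or later (the demo frame and output path are made up for illustration); note that a date column would still need to be cast to timestamp first, as in the update below:

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()

# Ask Spark to store timestamp columns using the INT96 physical type,
# the representation that loaded successfully in this case.
spark.conf.set("spark.sql.parquet.outputTimestampType", "INT96")

# Tiny demo frame with a timestamp column; the output path is hypothetical.
df = spark.range(3).withColumn("event_ts", current_timestamp())
df.write.mode("overwrite").parquet("/data/int96_demo.parquet")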

 

Update: Another customer changed the way the Parquet file was created so that it would use timestamp instead of date, like this:

.withColumn("columns_date", col("columns_date").cast("timestamp"))
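In context, that cast might look roughly like this (a minimal PySpark sketch; the paths are hypothetical and the columns_date name is taken from the snippet above):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("/data/original.parquet")  # hypothetical input path
# Cast the date column to timestamp before rewriting the file.
df = df.withColumn("columns_date", col("columns_date").cast("timestamp"))
df.write.mode("overwrite").parquet("/data/fixed.parquet")  # hypothetical output path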

 

 

Once you spot this error, you will need to review the data types on the table and reconcile them with the Parquet file.

 

That was great CSS teamwork!

 

Liliam C Leme

Updated Aug 04, 2020
Version 5.0

2 Comments

joanteixi (Copper Contributor):

Hi,

Thanks for the feedback... in our experience, it is also better to work with doubleType than decimal when you need to copy to a decimal type in Synapse DWH... but it would be nice to have a better error experience... 🙂

Reply from the author:

Hi, sorry for the delay. I saw this today. I would have to test to compare and answer properly.