Using SQL Spark to troubleshooting SQL On Demand conversion overflow

Published Aug 28 2020 10:15 AM 1,200 Views
Microsoft

This case started as an issue on SQL on-demand while reading a parquet file. The error was conversion overflows.

 Error: Conversion overflows.

So my objective on this post is to describe a possible workaround to read this file in the Spark as also to give some ideas on how to troubleshoot this with the notebook.

Basically I loaded the file on Spark using Scala and it completed successfully:

 

 

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Create a DataFrame 
val df = sqlContext.read.parquet("abfss://<storage>.dfs.core.windows.net/folder/file.parquet")
Display (df.limit(10))

 

 

No big deal here. After that, I wanted to check the metadata:

 

 

// Print the schema in a tree format
df.printSchema()

 

 

You should have a result like the following:

metadata.png

 

More information you can find here: 

https://spark.apache.org/docs/preview/sql-programming-guide.html

 

Once I found the metadata I used the column name information to run the query against SQL on-demand. With this information, I found the issue was related to a decimal column.

Here some SQL On-demand query tips for conversion issues:

1) You can use schema inferring while querying and by using it you can convert the column datatype. For example, I will convert the decimal column named as column_dec_converted to varchar.

 

 

SELECT *	  
    FROM OPENROWSET(
        BULK 'https://<storage>.dfs.core.windows.net/folder/file.parquet',
        FORMAT='PARQUET'  )     WITH ( column_dec_converted varchar(100) ) as rows

 

2) You can actually apply the convert on the Select Clause as you would do on standard SQL Server:

    SELECT cast (column_dec_converted as varchar(100))                     
    FROM OPENROWSET(
        BULK https://<storage>.dfs.core.windows.net/folder/file.parquet',
        FORMAT='PARQUET'  )    as Result
 
SQL On Demand does not support DML but it fully Supports Select syntax.
 
Another option would be:  You can also convert the file that you read in parquet into a SQL DW table. The steps are here: https://techcommunity.microsoft.com/t5/azure-synapse-analytics/arithmetic-overflow-error-converting-...
 

That is it!

Liliam C Leme

UK Engineer

1 Comment

Thank you for this article Liliam! 

%3CLINGO-SUB%20id%3D%22lingo-sub-1619655%22%20slang%3D%22en-US%22%3EUsing%20SQL%20Spark%20to%20troubleshooting%20SQL%20On%20Demand%20conversion%20overflow%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-1619655%22%20slang%3D%22en-US%22%3E%3CP%3EThis%20case%20started%20as%20an%20issue%20on%20SQL%20on-demand%20while%20reading%20a%20parquet%20file.%20The%20error%20was%20conversion%20overflows.%3C%2FP%3E%0A%3CP%3E%3CFONT%20color%3D%22%23FF0000%22%3E%26nbsp%3B%3CSPAN%3EError%3A%20Conversion%20overflows.%3C%2FSPAN%3E%3C%2FFONT%3E%3C%2FP%3E%0A%3CP%3ESo%20my%20objective%20on%20this%20post%20is%20to%20describe%20a%20possible%20workaround%20to%20read%20this%20file%20in%20the%20Spark%20as%20also%20to%20give%20some%20ideas%20on%20how%20to%20troubleshoot%20this%20with%20the%20notebook.%3C%2FP%3E%0A%3CP%3EBasically%20I%20loaded%20the%20file%20on%20Spark%20using%20Scala%20and%20it%20completed%20successfully%3A%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-applescript%22%3E%3CCODE%3Eval%26nbsp%3BsqlContext%26nbsp%3B%3D%26nbsp%3Bnew%26nbsp%3Borg.apache.spark.sql.SQLContext(sc)%0A%2F%2F%26nbsp%3BCreate%26nbsp%3Ba%20DataFrame%20%0Aval%26nbsp%3Bdf%26nbsp%3B%3D%26nbsp%3BsqlContext.read.parquet(%22abfss%3A%2F%2F%3CSTORAGE%3E.dfs.core.windows.net%2Ffolder%2Ffile.parquet%22)%0ADisplay%26nbsp%3B(df.limit(10))%0A%0A%3C%2FSTORAGE%3E%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3ENo%20big%20deal%20here.%20After%20that%2C%20I%20wanted%20to%20check%20the%20metadata%3A%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-applescript%22%3E%3CCODE%3E%2F%2F%20Print%20the%20schema%20in%20a%20tree%20format%0Adf.printSchema()%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3EYou%20should%20have%20a%20result%20like%20the%20following%3A%3C%2FP%3E%0A%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20image-alt%3D%22metadata.png%22%20style%3D%22width%3A%20412px%3B%22%3E%3CIMG%20src%3D%22https%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F215266i7226030B3CD48C25%2Fimage-size%2Flarge%3Fv%3D1.0%26amp%3Bpx%3D999%22%20title%3D%22metadata.png%22%20alt%3D%22metadata.png%22%20%2F%3E%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3EMore%20information%20you%20can%20find%20here%3A%26nbsp%3B%3C%2FP%3E%0A%3CP%20style%3D%22margin%3A%200in%3B%20font-family%3A%20Calibri%3B%20font-size%3A%2011.0pt%3B%22%3E%3CA%20href%3D%22https%3A%2F%2Fspark.apache.org%2Fdocs%2Fpreview%2Fsql-programming-guide.html%22%20target%3D%22_blank%22%20rel%3D%22noopener%20nofollow%20noreferrer%22%3Ehttps%3A%2F%2Fspark.apache.org%2Fdocs%2Fpreview%2Fsql-programming-guide.html%3C%2FA%3E%3C%2FP%3E%0A%3CP%20style%3D%22margin%3A%200in%3B%20font-family%3A%20Calibri%3B%20font-size%3A%2011.0pt%3B%22%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%20style%3D%22margin%3A%200in%3B%20font-family%3A%20Calibri%3B%20font-size%3A%2011.0pt%3B%22%3EOnce%20I%20found%20the%20metadata%20I%20used%20the%20column%20name%20information%20to%20run%20the%20query%20against%20SQL%20on-demand.%20With%20this%20information%2C%20I%20found%20the%20issue%20was%20related%20to%20a%20decimal%20column.%3C%2FP%3E%0A%3CP%20style%3D%22margin%3A%200in%3B%20font-family%3A%20Calibri%3B%20font-size%3A%2011.0pt%3B%22%3EHere%20some%20SQL%20On-demand%20query%20tips%20for%20conversion%20issues%3A%3C%2FP%3E%0A%3CP%20style%3D%22margin%3A%200in%3B%20font-family%3A%20Calibri%3B%20font-size%3A%2011.0pt%3B%22%3E1)%20You%20can%20use%20schema%20inferring%20while%20querying%20and%20by%20using%20it%20you%20can%20convert%20the%20column%20datatype.%20For%20example%2C%20I%20will%20convert%20the%20decimal%20column%20named%20as%26nbsp%3Bcolumn_dec_converted%20to%20varchar.%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-applescript%22%3E%3CCODE%3ESELECT%20*%20%20%20%0A%20%20%20%20FROM%20OPENROWSET(%0A%20%20%20%20%20%20%20%20BULK%20'https%3A%2F%2F%3CSTORAGE%3E.dfs.core.windows.net%2Ffolder%2Ffile.parquet'%2C%0A%20%20%20%20%20%20%20%20FORMAT%3D'PARQUET'%20%20)%20%20%20%20%20WITH%20(%20column_dec_converted%20varchar(100)%20)%20as%20rows%0A%0A%3C%2FSTORAGE%3E%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3E2)%20You%20can%20actually%20apply%20the%20convert%20on%20the%20Select%20Clause%20as%20you%20would%20do%20on%20standard%20SQL%20Server%3A%3C%2FP%3E%0A%3CDIV%3E%0A%3CDIV%3E%3CPRE%20class%3D%22lia-code-sample%20language-applescript%22%3E%3CCODE%3E%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3BSELECT%26nbsp%3Bcast%26nbsp%3B(column_dec_converted%20as%26nbsp%3Bvarchar(100))%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%0A%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3BFROM%26nbsp%3BOPENROWSET(%0A%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3BBULK%20https%3A%2F%2F%3CSTORAGE%3E.dfs.core.windows.net%2Ffolder%2Ffile.parquet'%2C%0A%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3BFORMAT%3D'PARQUET'%26nbsp%3B%26nbsp%3B)%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3Bas%26nbsp%3BResult%3C%2FSTORAGE%3E%3C%2FCODE%3E%3C%2FPRE%3E%3C%2FDIV%3E%0A%3CDIV%3EMore%20about%20SQL%20OD%20parquet%20query%3A%26nbsp%3B%3CA%20href%3D%22https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fazure%2Fsynapse-analytics%2Fsql%2Fquery-parquet-files%22%20target%3D%22_blank%22%20rel%3D%22noopener%20noreferrer%22%3Ehttps%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fazure%2Fsynapse-analytics%2Fsql%2Fquery-parquet-files%3C%2FA%3E%3C%2FDIV%3E%0A%3CDIV%3E%26nbsp%3B%3C%2FDIV%3E%0A%3CDIV%3ESQL%20On%20Demand%20does%20not%20support%20DML%20but%20it%20fully%20Supports%20Select%20syntax.%3C%2FDIV%3E%0A%3CDIV%3E%26nbsp%3B%3C%2FDIV%3E%0A%3CDIV%3EAnother%20option%20would%20be%3A%26nbsp%3B%20You%20can%20also%20convert%20the%20file%20that%20you%20read%20in%20parquet%20into%20a%20SQL%20DW%20table.%20The%20steps%20are%20here%3A%26nbsp%3B%3CA%20href%3D%22https%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fazure-synapse-analytics%2Farithmetic-overflow-error-converting-double-to-data-type-float%2Fba-p%2F1593643%22%20target%3D%22_blank%22%3Ehttps%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fazure-synapse-analytics%2Farithmetic-overflow-error-converting-double-to-data-type-float%2Fba-p%2F1593643%3C%2FA%3E%3C%2FDIV%3E%0A%3CDIV%3E%26nbsp%3B%3C%2FDIV%3E%0A%3C%2FDIV%3E%0A%3CP%3EThat%20is%20it!%3C%2FP%3E%0A%3CP%3ELiliam%20C%20Leme%3C%2FP%3E%0A%3CP%3EUK%20Engineer%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-TEASER%20id%3D%22lingo-teaser-1619655%22%20slang%3D%22en-US%22%3E%3CP%3EThis%20case%20started%20as%20an%20issue%20on%20SQL%20on-demand%20while%20reading%20a%20parquet%20file.%20The%20error%20was%20conversion%20overflows.%20But%20which%20column%20was%20failing%20between%20229%3F%3C%2FP%3E%0A%3CP%3ESo%20to%20get%20more%20details%20of%20the%20error%20I%20used%20Spark.%26nbsp%3B%3C%2FP%3E%3C%2FLINGO-TEASER%3E%3CLINGO-LABS%20id%3D%22lingo-labs-1619655%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3ESynapse%20Spark%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3ESynapse%20SQL%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3ESynapse%20Support%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-1620301%22%20slang%3D%22en-US%22%3ERe%3A%20Using%20SQL%20Spark%20to%20troubleshooting%20SQL%20On%20Demand%20conversion%20overflow%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-1620301%22%20slang%3D%22en-US%22%3E%3CP%3EThank%20you%20for%20this%20article%20Liliam!%26nbsp%3B%3C%2FP%3E%3C%2FLINGO-BODY%3E
Version history
Last update:
‎Aug 28 2020 10:23 AM
Updated by: