Troubleshoot / Debug Package failures in Apache Spark for Azure Synapse

Published Dec 03 2021 02:39 AM
Microsoft

We might encounter failures while installing packages in a serverless Apache Spark pool. In this article, I will walk through the basic steps for troubleshooting package installation failures in an Apache Spark pool.

 

Before we discuss them, please take a look at the page here, which provides the steps for updating packages/libraries (jar/whl/etc.) in a Spark pool.
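As a quick refresher (assuming the pool-level Python library workflow that page describes), Python packages for a Spark pool are commonly declared in a requirements.txt file uploaded under the pool's packages settings. The package names and versions below are illustrative only:

```text
# requirements.txt uploaded to the Spark pool (illustrative example)
# Pin versions that are compatible with the pool's Python runtime.
pandas==1.2.3
numpy>=1.19
```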

 

For this article, I will take the example of installing a wheel file under Spark packages. I uploaded the whl file and started execution, as shown in the snapshot below:

 

Mukund_Bhashkar_0-1631248840365.png

 

The settings are being applied:

 

Mukund_Bhashkar_2-1631250147657.png

 

And, unexpectedly, it failed. If you go through "View details", you might not be able to identify the exact error that would point you toward a resolution.

 

To find driver and executor errors, you need to go to the Monitor tab. By default, in a Spark pool, packages are installed by an application called SystemReservedJob-LibraryManagement. So we will go to the Monitoring tab as shown below and check the application that failed as part of the package installation.

 

Mukund_Bhashkar_6-1631250846860.png

 

Let's proceed with the steps below to collect the driver stdout/stderr logs:

 

Click on the application that failed --> Click on Spark History Server -->

 

Mukund_Bhashkar_0-1631250974802.png

 

Next, click on attempt 1 or 2 of the application and go to the Executors tab:

 

Mukund_Bhashkar_1-1631251823553.png

Mukund_Bhashkar_2-1631251864729.png

 

Then switch to the driver's stdout/stderr (under the Executor ID column) to find the error. I found the error under stdout, which helped me resolve the issue:

 

Mukund_Bhashkar_1-1631252028222.png

 

The error found was:

 

Mukund_Bhashkar_2-1631252207980.png

 

This clearly indicates that the wheel file needs to be rebuilt against a newer Pandas version. After rebuilding it with the upgraded Pandas version, the installation should succeed.
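Before uploading, you can check which dependency versions a wheel declares by reading the Requires-Dist lines from its METADATA file. This is a minimal sketch using only the Python standard library; the tiny wheel built at the end is a hypothetical stand-in, not the actual file from this article:

```python
import zipfile

def required_dists(wheel_path):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    reqs = []
    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            if name.endswith(".dist-info/METADATA"):
                metadata = whl.read(name).decode("utf-8")
                for line in metadata.splitlines():
                    if line.startswith("Requires-Dist:"):
                        reqs.append(line.split(":", 1)[1].strip())
    return reqs

# Demo: build a tiny in-memory wheel (hypothetical package) and inspect it.
demo = "wheel_example-0.1-py3-none-any.whl"
with zipfile.ZipFile(demo, "w") as whl:
    whl.writestr(
        "wheel_example-0.1.dist-info/METADATA",
        "Metadata-Version: 2.1\n"
        "Name: wheel_example\n"
        "Version: 0.1\n"
        "Requires-Dist: pandas (>=1.2)\n",
    )

print(required_dists(demo))  # → ['pandas (>=1.2)']
```

If the declared Pandas constraint conflicts with the version available in the pool's runtime, rebuild the wheel against a compatible version before uploading.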

 

Another such error found was:

 

In most other cases you should consider using 'safe_load(stream)'
  data = yaml.load(yamlstr)
ERROR: wheel_example.whl is not a valid wheel file
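The "not a valid wheel file" error usually means the file is not a well-formed wheel at all, for example a renamed zip or a corrupted upload. A rough local sanity check, approximating a small subset of what an installer verifies (filename convention, zip integrity, presence of a .dist-info directory), might look like this; the checks and the sample file are illustrative, not pip's actual implementation:

```python
import re
import zipfile

# Wheel filenames follow the convention:
# {distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_NAME_RE = re.compile(
    r"^[\w.]+-[\w.!+]+(-\d[\w.]*)?-[\w.]+-[\w.]+-[\w.]+\.whl$"
)

def looks_like_valid_wheel(path):
    """Cheap local sanity checks before uploading a wheel to the Spark pool."""
    filename = path.rsplit("/", 1)[-1]
    if not WHEEL_NAME_RE.match(filename):
        return False, "filename does not match the wheel naming convention"
    if not zipfile.is_zipfile(path):
        return False, "file is not a valid zip archive"
    with zipfile.ZipFile(path) as whl:
        if not any(name.split("/")[0].endswith(".dist-info")
                   for name in whl.namelist()):
            return False, "no .dist-info directory inside the archive"
    return True, "ok"

# Demo with a hypothetical, deliberately misnamed file:
with open("wheel_example.whl", "wb") as f:
    f.write(b"not really a wheel")

print(looks_like_valid_wheel("wheel_example.whl"))
# → (False, 'filename does not match the wheel naming convention')
```

Running a check like this locally can catch a bad artifact before you spend time on an upload and a failed pool run.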
 
Alternatively, you can find the driver stdout on the application's front page. However, it is recommended to go into the Spark UI and check the error using the steps provided earlier in this article.
 