Using both custom python wheels and requirements.txt file for packages in spark pools


Until last week, we were able to use both a requirements.txt file for publicly available Spark packages and '.whl' files uploaded to our Spark pools for our custom packages.

 


However, since last week (Wednesday), we have been facing issues where the custom Python wheel packages we upload are not recognized by the Spark pool at runtime.

 

We could work around it by adding all the generally available packages as workspace packages and removing requirements.txt, but this would be cumbersome going forward. Do you know of an alternative, or why this has stopped working all of a sudden?

1 Reply

@athuljcde you can address this issue by uploading your custom .whl files to the storage account linked to your Spark pool. This ensures that your custom packages are accessible at runtime.

 

You can also create a script that identifies the dependencies listed in your requirements.txt and prints the names of the wheel files or dependencies needed for your input library requirements. You can use this information to verify that your custom packages are correctly installed in the Spark pool.

 

Note: Verify that the storage account permissions are correctly set for accessing custom .whl files.

I also recommend investigating why this behavior changed suddenly. Potential factors include updates or changes to the environment, such as the Spark runtime version or pool configuration, that might have affected package recognition.

 

Kindly check the logs and diagnostics to identify any specific errors or warnings related to package installation. 

 

You can read more about package management on Microsoft Learn.

--

If this post is helpful, please give my response a thumbs up! You can also mark it as the solution to help others find it easily.

 

Thanks