Add / Manage Libraries in a Spark Pool After Deployment

Published May 25 2020

The official documentation provides guidance on adding / managing libraries in an Apache Spark pool during deployment.

In this article, we discuss how library versions can be upgraded after the Apache Spark pool has been deployed.

 

  • To list the currently installed library versions, you can use the code below -
          import pip  # needed to use the pip functions
          # print every distribution installed in the environment
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
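Note: pip.get_installed_distributions was removed from pip's public API in newer pip releases, so the snippet above may fail outside this environment. A minimal alternative sketch using the pkg_resources module (bundled with setuptools) should produce an equivalent listing:

          import pkg_resources  # bundled with setuptools
          # iterate over every distribution visible to the current environment
          for dist in pkg_resources.working_set:
              print(dist.project_name, dist.version)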

 

To change / upgrade a library version, click on the Spark pool in the Azure Synapse Analytics workspace.

 

Image 1.png

 

Navigate to "Packages", where you can see whether any configuration files have already been uploaded.

 

Image 2.png

 

Upload the "requirement.txt" file (in this example, we will be upgrading only the "pandas" version).

Default version (pandas 0.23.4)
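The uploaded file follows the standard pip requirements format. Assuming the target is the "pandas 1.0.1" version shown in the verification step below, the "requirement.txt" for this walkthrough would contain a single line:

          pandas==1.0.1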

 

Image 4.png

Image 5.png

 

By checking the box "Force new settings on the Apache Spark pool (will immediately stop running Apache Spark applications)", the configuration is applied to the Spark pool immediately, stopping any currently running applications.

 

Confirm that the file has been uploaded -

 

Image 6.png

 

Once the upload is complete, run the code below in the Spark pool to verify the versions -

 

          import pip  # needed to use the pip functions
          # print every distribution installed in the environment
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
 

The version has been upgraded to "pandas 1.0.1" successfully -

 
 
 

Image 7.png
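As an additional quick check, you can import pandas directly in a notebook cell and print its version attribute (a minimal sketch; the expected value below is the one from this walkthrough):

          import pandas as pd
          # confirm the upgraded version is active in this session
          print(pd.__version__)  # expected: 1.0.1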

 

The change can also be verified via the "Activity Log": select the latest operation related to "Create or Update Spark pools" and check its change history.
 
Image 8.png
 
Click on the operation name, select "Change history (preview)", and then select "requirement.txt" (this will show the changes that occurred).
 
Image 9.png
Image 10.png
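Alternatively, the uploaded library requirements can be read programmatically. Below is a rough sketch using the azure-mgmt-synapse management SDK; the client and attribute names (big_data_pools.get, library_requirements) mirror the ARM resource model but should be treated as assumptions, and the resource names are placeholders:

          from azure.identity import DefaultAzureCredential
          from azure.mgmt.synapse import SynapseManagementClient

          # placeholders: replace with your own subscription / resource identifiers
          client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")
          pool = client.big_data_pools.get("<resource-group>", "<workspace-name>", "<spark-pool-name>")
          # library_requirements holds the uploaded requirement file's name and content
          print(pool.library_requirements.filename)
          print(pool.library_requirements.content)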
 