Add / Manage libraries in Spark Pool After the Deployment
Published May 25 2020 10:56 AM
Microsoft

The official documentation provides guidance on adding and managing libraries in an Apache Spark pool during deployment.

In this article, we discuss how library versions can be upgraded after the Apache Spark pool has been deployed.

 

  • To list the currently installed library versions, you can use the code below:
          import pip  # needed to use the pip functions
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
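Note that pip.get_installed_distributions has been removed in newer pip releases, so the snippet above may fail on recent runtimes. As an alternative sketch (assuming Python 3.8+), the standard-library importlib.metadata module can list the same information:

```python
# Alternative to pip.get_installed_distributions, which was removed in
# newer pip releases. Uses importlib.metadata (standard library, Python 3.8+).
from importlib.metadata import distributions

for dist in distributions():
    # dist.metadata["Name"] is the package name, dist.version its version
    print(dist.metadata["Name"], dist.version)
```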

 

To change or upgrade a library version, click on the Spark pool in the Azure Synapse Analytics workspace.

 


 

Navigate to "Packages", where you can see whether any configuration files have already been uploaded.

 


 

Upload the "requirements.txt" file (in this example, we will upgrade only the "pandas" version).

Default version: pandas 0.23.4
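For reference, the requirements file for this example could contain a single pinned line per package, in standard pip requirements format (the version here matches the upgrade target of this walkthrough):

```text
pandas==1.0.1
```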

 


 

By checking the box "Force new settings on the Apache Spark pool (will immediately stop running Apache Spark applications)", the configuration is applied to the Spark pool immediately by stopping any currently running applications.

 

Confirm that the file has been uploaded.

 


 

Once the upload is complete, run the code below in the Spark pool to verify the versions:

 

          import pip  # needed to use the pip functions
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
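To check a single package rather than scanning the full list, you can also print its version attribute directly (a minimal sketch; pandas is the package upgraded in this example):

```python
# Verify the pandas version directly from the running session.
import pandas as pd

print(pd.__version__)  # prints the version the current session is using
```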
 

The version was upgraded to pandas 1.0.1 successfully.

 
 
 


 

The change can also be verified in the "Activity Log": select the latest operation related to "Create or Update Spark pools" and check its change history.
 
 
Click on the operation name, select "Change history (preview)", and then select "requirements.txt"
(this will show the changes that occurred).
 
 
2 Comments
Copper Contributor

@CharithCaldera  Excellent post!

Copper Contributor

I've performed these exact steps:

 

in cmd, py -m pip freeze > requirements.txt

generates requirements.txt in the format:

 

pandas==1.0.4

 

Navigate to workspace>spark pools>packages>upload packages>select requirements.txt>select force update>upload successful

Navigate back to notebook

Run

 

          import pip  # needed to use the pip functions
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
 
Note that pandas is still at 0.23.4
 
I've tried many other methods to try and force pandas to update, but it's not updating. I've tried 1.01, 1.02, 1.03, and a bunch of older versions. None of them gets pandas to change from 0.23.4 to anything else.
 
Thoughts?