Azure Synapse Analytics Blog

Add / Manage libraries in Spark Pool After the Deployment

CharithCaldera
May 25, 2020

The official documentation provides guidance on adding and managing libraries in an Apache Spark pool during deployment.

In this article, we discuss how library versions can be upgraded after the Apache Spark pool has been deployed.

 

  • To list the currently installed library versions, you can use the code below:
          import pip  # needed to use the pip functions
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
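Note that pip.get_installed_distributions was removed in newer pip releases, so the snippet above only works on pools that ship an older pip. On a pool with a newer pip, a minimal equivalent sketch using pkg_resources (bundled with setuptools) would be:

          import pkg_resources  # bundled with setuptools
          for dist in pkg_resources.working_set:  # iterate over installed distributions
              print(dist.project_name, dist.version)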

 

To change or upgrade a library version, click on the Spark pool in the Azure Synapse Analytics workspace.

Navigate to "Packages", where you can see whether any configuration files have already been uploaded.

Upload the "requirements.txt" file (in this example, we will be upgrading only the "pandas" version).

Default version (pandas 0.23.4)
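Since only pandas is being upgraded here, the uploaded requirements.txt needs just a single pinned line; a minimal sketch of its contents:

          pandas==1.0.1

Each line uses the standard pip requirement specifier syntax, so additional libraries can be pinned in the same file.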

By checking the box "Force new settings on the Apache Spark pool (will immediately stop running Apache Spark applications)", the new configuration is applied to the Spark pool immediately, stopping any currently running applications.

 

Confirm that the file has been uploaded.

Once the upload is complete, run the code below in the Spark pool to verify the versions:

 

          import pip  # needed to use the pip functions
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
 

Version upgraded to "pandas 1.0.1" successfully.
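As a quicker spot check for a single library, you can also print the module's own version attribute from a notebook cell, for example:

          import pandas  # the library that was just upgraded
          print(pandas.__version__)  # should print 1.0.1 after the update takes effect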

The change can also be verified in the "Activity Log" by selecting the latest operation related to "Create or Update Spark pools" and checking its change history.
 
 
Click on the operation name, select "Change history (preview)", and then select "requirements.txt" (this will show the changes that occurred).
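If you prefer the command line, the same Activity Log entries can be listed with the Azure CLI. This is a rough sketch, assuming the Azure CLI is installed and that Spark pool updates are logged under the Microsoft.Synapse/workspaces/bigDataPools resource type; replace <your-resource-group> with your own resource group name:

          az monitor activity-log list \
              --resource-group <your-resource-group> \
              --offset 7d \
              --query "[?contains(operationName.value, 'bigDataPools')].{operation:operationName.localizedValue, timestamp:eventTimestamp}" \
              --output table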
 
 
Updated May 25, 2020
Version 2.0

2 Comments

  • jordancgy (Copper Contributor)

    I've performed these exact steps:

     

    in cmd, py -m pip freeze > requirements.txt

    generates a requirements.txt in the format

     

    pandas==1.0.4

     

    Navigate to workspace>spark pools>packages>upload packages>select requirements.txt>select force update>upload successful

    Navigate back to notebook

    Run

     

     import pip  # needed to use the pip functions
     for i in pip.get_installed_distributions(local_only=True):
         print(i)
     
    Note that pandas is still at 0.23.4
     
    I've tried many other methods to try and force pandas to update, but it's not updating. I've tried 1.0.1, 1.0.2, 1.0.3, and a bunch of older versions. None of them gets pandas to change from 0.23.4 to anything else.
     
    Thoughts?