Add / Manage Libraries in a Spark Pool After Deployment

Published May 25 2020

The official documentation provides guidance on adding / managing libraries in an Apache Spark pool during deployment.

In this article, we discuss how library versions can be upgraded after the Apache Spark pool has been deployed.

 

  • To list the currently installed library versions, you can use the code below -
          import pip  # needed to use the pip functions
          # print every distribution installed in the environment
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
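Note: pip.get_installed_distributions was removed from pip's public API in newer pip releases, so the snippet above may fail outside this environment. A minimal alternative sketch using the pkg_resources module (bundled with setuptools) should produce an equivalent listing:

          import pkg_resources  # bundled with setuptools
          # iterate over every distribution visible to the current environment
          for dist in pkg_resources.working_set:
              print(dist.project_name, dist.version)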

 

To change / upgrade a library version, click on the Spark pool in the Azure Synapse Analytics workspace.

 

Image 1.png

 

Navigate to "Packages", where you can see whether any configuration files have already been uploaded.

 

Image 2.png

 

Upload the "requirement.txt" file (in this example, we will be upgrading only the "pandas" version).

Default version (pandas 0.23.4)
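The uploaded file follows the standard pip requirements format. Assuming the target is the "pandas 1.0.1" version shown in the verification step below, the "requirement.txt" for this walkthrough would contain a single line:

          pandas==1.0.1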

 

Image 4.png

Image 5.png

 

By checking the box "Force new settings on the Apache Spark pool (will immediately stop running Apache Spark applications)", the configuration is applied to the Spark pool immediately, stopping any currently running applications.

 

Confirm that the file has been uploaded -

 

Image 6.png

 

Once the upload is complete, run the code below in the Spark pool to verify the versions -

 

          import pip  # needed to use the pip functions
          # print every distribution installed in the environment
          for i in pip.get_installed_distributions(local_only=True):
              print(i)
 

The version has been upgraded to "pandas 1.0.1" successfully -

 
 
 

Image 7.png
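As an additional quick check, you can import pandas directly in a notebook cell and print its version attribute (a minimal sketch; the expected value below is the one from this walkthrough):

          import pandas as pd
          # confirm the upgraded version is active in this session
          print(pd.__version__)  # expected: 1.0.1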

 

The change can also be verified via the "Activity Log": select the latest operation related to "Create or Update Spark pools" and check its change history.
 
Image 8.png
 
Click on the operation name, select "Change history (preview)", and then select "requirement.txt" (this will show the changes that occurred).
 
Image 9.png
Image 10.png
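Alternatively, the uploaded library requirements can be read programmatically. Below is a rough sketch using the azure-mgmt-synapse management SDK; the client and attribute names (big_data_pools.get, library_requirements) mirror the ARM resource model but should be treated as assumptions, and the resource names are placeholders:

          from azure.identity import DefaultAzureCredential
          from azure.mgmt.synapse import SynapseManagementClient

          # placeholders: replace with your own subscription / resource identifiers
          client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")
          pool = client.big_data_pools.get("<resource-group>", "<workspace-name>", "<spark-pool-name>")
          # library_requirements holds the uploaded requirement file's name and content
          print(pool.library_requirements.filename)
          print(pool.library_requirements.content)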
 