Orchestrate and operationalize Synapse Notebooks and Spark Job Definitions from Azure Data Factory

Microsoft

Jan 25, 2023

Today, we are introducing support for orchestrating Synapse notebooks and Synapse Spark job definitions (SJDs) natively from Azure Data Factory pipelines. This helps customers who have invested in both ADF and Synapse Spark, because they can now orchestrate Synapse notebooks and SJDs without switching to Synapse Pipelines.

NOTE: Previously, Synapse notebook and SJD activities were available only in Synapse Pipelines.

One of the key benefits of Synapse notebooks is the ability to use both Spark SQL and PySpark for data transformations. You can pick the best tool for each job: SQL for simple data cleaning tasks, or PySpark for more complex data processing.

How to get started with Synapse Notebooks in ADF?

1. Add a Synapse Notebook activity to a Data Factory pipeline

2. Create a connection to the Synapse workspace through a new compute linked service (Azure Synapse Analytics (Artifacts))

3. Choose an existing notebook to operationalize

Note: If you do not specify 'Spark pool', 'Executor size', etc., the activity uses the values specified in the notebook. These properties are optional; they only let you override the notebook's Spark configuration for the operational run.

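4. Grant the ADF managed identity the "Synapse Compute Operator" role so it can execute a notebook / SJD in the Synapse workspace, along with "Synapse Artifact User" (or "Synapse Administrator") so it can read the workspace artifacts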

When you create the Azure Synapse Analytics (Artifacts) linked service in step 2, the page shows the managed identity name of the data factory; that is the identity that must be granted permission to run a notebook / SJD.

5. Monitor the notebook run by opening the activity output, which contains "sparkApplicationStudioUrl", a link that takes you to the Synapse workspace for detailed run monitoring. The notebook "exitValue" is also available in the output and can be referenced in downstream activities.
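If you manage your factory as code, steps 1 to 3 can be sketched as the JSON that Data Factory stores for the linked service and the activity. The Python below only builds and prints those documents. The property names (`AzureSynapseArtifacts`, `SynapseNotebook`, `NotebookReference`, `BigDataPoolReference`, `executorSize`) reflect our understanding of the ADF JSON schema and should be checked against the documentation under Resources; every resource name is hypothetical.

```python
import json
from typing import Optional

# Hypothetical resource names, for illustration only.
WORKSPACE = "myworkspace"
LINKED_SERVICE = "AzureSynapseArtifacts1"
NOTEBOOK = "MyNotebook"


def artifacts_linked_service(name: str, workspace: str) -> dict:
    """Step 2: linked service that connects with the factory's managed identity."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSynapseArtifacts",
            "typeProperties": {
                # Development endpoint of the Synapse workspace.
                "endpoint": f"https://{workspace}.dev.azuresynapse.net",
                "authentication": "MSI",
            },
        },
    }


def notebook_activity(
    name: str,
    linked_service: str,
    notebook: str,
    spark_pool: Optional[str] = None,
    executor_size: Optional[str] = None,
) -> dict:
    """Steps 1 and 3: a Synapse Notebook activity that runs an existing notebook.

    Overrides are emitted only when given; when omitted, the run uses
    the settings stored in the notebook itself.
    """
    type_properties: dict = {
        "notebook": {"referenceName": notebook, "type": "NotebookReference"},
    }
    if spark_pool is not None:
        type_properties["sparkPool"] = {
            "referenceName": spark_pool,
            "type": "BigDataPoolReference",
        }
    if executor_size is not None:
        type_properties["executorSize"] = executor_size
    return {
        "name": name,
        "type": "SynapseNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": type_properties,
    }


if __name__ == "__main__":
    pipeline = {
        "name": "RunSynapseNotebook",
        "properties": {
            "activities": [notebook_activity("RunNotebook", LINKED_SERVICE, NOTEBOOK)]
        },
    }
    print(json.dumps(artifacts_linked_service(LINKED_SERVICE, WORKSPACE), indent=2))
    print(json.dumps(pipeline, indent=2))
```

Leaving `spark_pool` and `executor_size` unset mirrors the note under step 3: the notebook's own settings apply.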
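The permission grant in step 4 can also be scripted. The sketch below assumes the Azure CLI command `az synapse role assignment create` and only builds the command lines; it does not run them. It includes "Synapse Artifact User" because the author's reply in the comments reports that Synapse Compute Operator alone is not enough. The workspace name and object ID are placeholders.

```python
import shlex
from typing import List, Sequence

# "Synapse Compute Operator" comes from step 4; "Synapse Artifact User" is
# added per the comment thread (assumption: confirm in the permissions docs).
REQUIRED_ROLES = ("Synapse Compute Operator", "Synapse Artifact User")


def role_assignment_commands(
    workspace: str,
    assignee_object_id: str,
    roles: Sequence[str] = REQUIRED_ROLES,
) -> List[List[str]]:
    """Return one `az synapse role assignment create` argv per role."""
    return [
        [
            "az", "synapse", "role", "assignment", "create",
            "--workspace-name", workspace,
            "--role", role,
            "--assignee", assignee_object_id,
        ]
        for role in roles
    ]


if __name__ == "__main__":
    # Placeholders: your workspace name and the factory's managed identity object ID.
    for argv in role_assignment_commands("myworkspace", "<managed-identity-object-id>"):
        # Print a shell-quoted command; subprocess.run(argv, check=True) would execute it.
        print(shlex.join(argv))
```

Building argv lists rather than one string keeps role names with spaces intact whether you print them or pass them to `subprocess.run`.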
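The exact nesting of the activity output is not spelled out in step 5, so the sketch below avoids assuming a fixed path: it walks a saved copy of the output JSON and collects every `sparkApplicationStudioUrl` and `exitValue` it finds. The expression string is the path documented for the notebook activity in Synapse Pipelines and is assumed, not confirmed here, to carry over to ADF; `RunNotebook` is a hypothetical activity name.

```python
import json
import sys
from typing import Any, Iterator

# Assumed expression for referencing the exit value in a downstream activity.
EXIT_VALUE_EXPRESSION = "@activity('RunNotebook').output.status.Output.result.exitValue"


def find_key(obj: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in parsed JSON."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear at deeper levels.
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


if __name__ == "__main__":
    # Usage: python find_output.py < activity_output.json
    output = json.load(sys.stdin)
    for name in ("sparkApplicationStudioUrl", "exitValue"):
        print(name, list(find_key(output, name)))
```

Inside the notebook, the exit value is the string passed to `mssparkutils.notebook.exit()`.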

Resources

Documentation: Synapse Notebook activity in ADF
Documentation: Synapse SJD (Spark job definition) activity in ADF
Documentation: Azure Synapse Analytics (Artifact) Linked Service in ADF
Permissions required for running Synapse notebooks and SJDs

We are always open to feedback, so please let us know your thoughts in the comments below or add to our Ideas forum.

Updated Jan 28, 2023

Version 2.0

Azure Data Factory

Azure Synapse Analytics

Big Data Analytics

Abhishek Narain

Microsoft


Azure Data Factory Blog


KevinSlm
Copper Contributor
Mar 07, 2023
Hello Abhishek Narain,

Do you know if the Azure Synapse Analytics (Artifacts) Linked Service will support Managed Private Endpoints anytime soon?

Thank you!
Abhishek Narain
Microsoft
Jan 27, 2023
ryomaru0825 That is correct. Thank you!
We also need to add 'Synapse Artifact User' or 'Synapse Admin'. We will update the blog.
versydney
Iron Contributor
Jan 27, 2023
Hi Abhishek Narain ,

We have a Synapse workspace with public access disabled (it does have private endpoints for the SQL pools and for 'Dev'). When I try to create the 'Artifacts' linked service from Data Factory, it gives me this error:
{"code":"PublicNetworkAccessDenied","message":"The public network interface on this Workspace is not accessible. To connect to this Workspace, use the Private Endpoint from inside your virtual network or enable public network access for this workspace."} Processed HTTP request failed.
I'm not a networking expert so not sure where to start with this. I can confirm that enabling public access on the Synapse workspace also enables us to successfully create the linked service in ADF.
Abhishek Narain
Microsoft
Jan 27, 2023
versydney Can you share more details on the error you are facing?
versydney
Iron Contributor
Jan 27, 2023
Thanks ryomaru0825. Yeah, I got excited too soon, but unfortunately could not use it for now. Hoping for a future update!
ryomaru0825
Copper Contributor
Jan 27, 2023
Thank you for this great update!

I actually tried it, and it seems that Synapse Compute Operator does not have enough privileges.
Data Factory also requires roles such as Synapse Artifact User, which includes "Microsoft.Synapse/workspaces/artifacts/read".

versydney
In that case, I expected Data Factory to provision a Managed Private Endpoint to Synapse Analytics and be successful, but at this time, the Azure Synapse Analytics (Artifacts) Linked Service did not support Managed Private Endpoints.
The Azure IR on Data Factory's Managed Virtual Network needs to be updated to connect privately to Synapse Artifacts by configuring the sub-resource as 'Dev'.

Managed Private Endpoints are not displayed as shown in the following image:
versydney
Iron Contributor
Jan 26, 2023
Thank you for this! Exactly what we need at this point as we've yet to fully transition into our Synapse workspace.

On a side note, our Synapse workspace does not have "Public network access to workspace endpoints" enabled and uses private links, so trying to add the linked service in ADF results in "PublicNetworkAccessDenied". I'm baffled by Azure Private Link in general, as I thought no changes were supposed to be required on the client side. Any help is appreciated.
