Synapse Spark mssparkutils runMultiple and Log Analytics

Hi all,


Context:

We have developed a solution using Synapse Notebooks, as Fabric hasn't been approved for production yet.

The initial approach was to orchestrate each notebook with data pipelines, but it took roughly 2-3 minutes to spin up a Spark session for each one (46 notebooks). We wanted this approach so that Log Analytics would capture the logs for each individual notebook. The whole solution took an average of 1 h 40 min, which isn't ideal when the expectation is to refresh the data every 2 hours.

Reviewing the Microsoft documentation (Introduction to Microsoft Spark utilities - Azure Synapse Analytics | Microsoft Learn), one suggested way to reduce runtime is the mssparkutils.notebook.runMultiple() function, which lets us run multiple notebooks within the same Spark session and share compute resources.

With this new approach, we designed a DAG that reduced our execution time to 45 min, much more in line with the refresh schedule we had in mind.
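For reference, the DAG we pass to runMultiple looks roughly like this. The notebook names, timeouts, and parameters below are illustrative placeholders, not our real ones:

```python
# mssparkutils is available by default in a Synapse notebook;
# the explicit import also works.
from notebookutils import mssparkutils

DAG = {
    "activities": [
        {
            "name": "LoadDimCustomers",           # activity name, must be unique
            "path": "LoadDimCustomers",           # notebook to run
            "timeoutPerCellInSeconds": 300,       # per-cell timeout
            "args": {"run_date": "2024-01-01"},   # notebook parameters
        },
        {
            "name": "LoadFactSales",
            "path": "LoadFactSales",
            "timeoutPerCellInSeconds": 600,
            "dependencies": ["LoadDimCustomers"],  # runs after the dimension load
        },
    ],
    "timeoutInSeconds": 3600,  # overall timeout for the whole DAG
    "concurrency": 10,         # max notebooks running at the same time
}

mssparkutils.notebook.runMultiple(DAG)
```

Everything runs inside one Spark session, which is where the time saving comes from.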


Problem:

With the new implementation, we orchestrate the notebooks with mssparkutils from a single pipeline containing one notebook and a trigger. This means we have lost the ability to monitor individual notebooks with Log Analytics, since it only monitors the main pipeline/Spark session/notebook and not all the executions within it.

Has anyone faced a similar issue? Is there a way to send information to Log Analytics about each notebook running inside the runMultiple DAG?

We want to monitor start time, end time, and status (queued, in progress, succeeded, failed), and capture errors if they occur.
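For illustration, one workaround we could imagine is having each child notebook (wrapped in a try/except) post a custom record with exactly those fields to the Log Analytics workspace via the (now legacy) HTTP Data Collector API. A rough sketch, where the workspace ID, shared key, and custom table name are placeholders and log_notebook_run is a hypothetical helper:

```python
import base64
import datetime
import hashlib
import hmac
import json

import requests

WORKSPACE_ID = "<log-analytics-workspace-id>"   # placeholder
SHARED_KEY = "<log-analytics-primary-key>"      # placeholder
LOG_TYPE = "NotebookRun"                        # lands in a custom table, NotebookRun_CL


def _build_signature(date: str, content_length: int) -> str:
    # HMAC-SHA256 signature required by the HTTP Data Collector API.
    string_to_hash = (
        f"POST\n{content_length}\napplication/json\n"
        f"x-ms-date:{date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(SHARED_KEY),
        string_to_hash.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).digest()
    return f"SharedKey {WORKSPACE_ID}:{base64.b64encode(digest).decode()}"


def log_notebook_run(notebook, status, start, end, error=""):
    # One record per notebook execution: name, status, timings, error text.
    body = json.dumps([{
        "Notebook": notebook,
        "Status": status,           # queued / in progress / succeeded / failed
        "StartTime": start.isoformat(),
        "EndTime": end.isoformat(),
        "Error": error,
    }])
    rfc1123 = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")
    headers = {
        "Content-Type": "application/json",
        "Authorization": _build_signature(rfc1123, len(body)),
        "Log-Type": LOG_TYPE,
        "x-ms-date": rfc1123,
    }
    uri = (
        f"https://{WORKSPACE_ID}.ods.opinsights.azure.com"
        "/api/logs?api-version=2016-04-01"
    )
    requests.post(uri, data=body, headers=headers).raise_for_status()
```

That said, it would be much nicer if the built-in Log Analytics integration captured the per-notebook runs inside the DAG directly, hence the question.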


Thank you.

1 Reply
Let's see if someone has faced it, Victor.