Explore the Monitor Hub in Synapse Studio to keep track of all activities in your Synapse workspace

Mission Control: Monitor Current and Historical Activities for SQL, Apache Spark, and Pipelines.

The Monitor hub in Azure Synapse Analytics helps you keep track of all activities in your Synapse workspace, including the ones still running. In this post, we will follow a Synapse pipeline through its lifecycle, review SQL requests that ran in the workspace, and finally monitor a Spark application's progress.

 

Using a Sample Pipeline from Azure Synapse Knowledge Center

 

To get our hands on a pipeline, we will use the Azure Synapse Knowledge Center, where we can find various pipeline templates. Inside the Synapse workspace, choose the Integrate option from the left menu to open the Integrate hub. Select the "+" Add new resource command and then Browse gallery to navigate to the gallery.

saveenr_0-1607996284032.png

Integrate Hub is open. The plus button to add new artifacts is selected. Browse Gallery from the list of resource options is highlighted.

 

Once in the gallery, make sure the Pipelines page is selected. From the list of available sample pipelines, choose Delete files older than 30 days and select Continue.

saveenr_1-1607996284061.png

Knowledge Center pipeline gallery is open. Delete files older than 30 days pipeline is selected. Continue button is highlighted.

 

On the next screen, you can see the pipeline's details. We have to pick a linked service for both BinaryDatasetForDeleteActivity and DeleteFiles. BinaryDatasetForDeleteActivity points to the location of the source files; the pipeline looks in this location for files older than 30 days and deletes them. DeleteFiles is the location where the pipeline stores its log files. In our case, we picked the default Azure Data Lake Storage Gen2 account of our Synapse workspace for both. Once both values are set, select Open pipeline to continue.

saveenr_2-1607996284076.png

Delete files older than 30 days pipeline creation window is up. BinaryDatasetForDeleteActivity and DeleteFiles are set to the default workspace storage of the Synapse workspace. Open pipeline button is highlighted.
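
For readers who prefer code, here is a minimal Python sketch of the same "delete files older than 30 days" logic using the azure-storage-file-datalake SDK. The account name, file system, and folder below are placeholders, and the pipeline itself implements this with a Delete activity rather than the SDK.

```python
# Illustrative sketch only: the sample pipeline's Delete activity does this for you.
# Assumes the azure-identity and azure-storage-file-datalake packages; the account,
# file system, and folder names are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("default-fs")

cutoff = datetime.now(timezone.utc) - timedelta(days=30)
for path in fs.get_paths(path="sample-directory"):
    # Delete files (not folders) whose last modification is older than 30 days.
    if not path.is_directory and path.last_modified < cutoff:
        fs.delete_file(path.name)
```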

 

We are ready to publish our pipeline and save it as part of our Synapse workspace. Select Publish all to create the pipeline and the required datasets.

saveenr_3-1607996284082.png

Delete files older than 30 days pipeline is open. Publish all button is highlighted.

 

Monitoring Pipelines in Azure Synapse Analytics

 

It’s time to run our pipeline for the first time. Select Add trigger and hit Trigger now to execute the pipeline.

saveenr_4-1607996284088.png

Delete files older than 30 days pipeline is open. Add trigger is selected, and Trigger now command is highlighted.

 

The pipeline will ask for some input parameters. For now, leave those as they are and select OK to continue.

saveenr_5-1607996284091.png

Pipeline run windows shows a list of pipeline parameters with their default values. The OK button is highlighted.

 

Switch to the Monitor hub by selecting the Monitor section from the left menu in your Synapse workspace. Select Pipeline runs to open a page where you can see a list of past and current pipeline activities. As you can see, our pipeline has failed. Clicking on the parameters link shows us the parameter values we passed in.

saveenr_6-1607996284104.png

Monitor Hub is open. Pipeline runs section is selected. The latest pipeline run for Delete files older than 30 days pipeline is shown to be failed. Pipeline’s parameters collection is presented.

 

The default value of the SourceDirectory parameter refers to a folder named subfolder that does not exist in our Azure Data Lake Storage Gen2 account. Moreover, the SourceFolder parameter's default value does not map to an existing file system either. Let's fix those errors now.

Switch to the Data hub and navigate into the Azure Data Lake Storage Gen2 account you assigned to your pipeline. Open the default file system, in our case default-fs, and select New folder from the top menu to create a new folder. Name it sample-directory and select Create to create the folder.

saveenr_7-1607996284115.png

Data hub is open. Workspace’s default ADLS Gen2 location is selected, and its default-fs file system is open. The new folder command from the top menu is selected. The new folder name is set to sample-directory, and the Create button is highlighted.
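
If you would rather create the folder from code, a short sketch with the azure-storage-file-datalake SDK looks like this; the storage account name is a placeholder for the workspace's default ADLS Gen2 account.

```python
# A minimal sketch of creating the sample-directory folder programmatically.
# The account name is a placeholder; default-fs is the file system used in this walkthrough.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
service.get_file_system_client("default-fs").create_directory("sample-directory")
```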

 

Go back to your pipeline and trigger it again. This time we will set SourceFolder to default-fs, SourceDirectory to sample-directory, and leave the LoggingPath with its default value.

saveenr_8-1607996284118.png

Pipeline run parameters are set. The OK button is highlighted.
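
The same run can also be triggered programmatically. Here is a hedged sketch assuming the azure-synapse-artifacts Python package; the workspace endpoint and pipeline name below are placeholders that should match your workspace.

```python
# A sketch of triggering the pipeline with the corrected parameter values.
# Assumes the azure-identity and azure-synapse-artifacts packages; the endpoint
# and pipeline name are placeholders for your workspace.
from azure.identity import DefaultAzureCredential
from azure.synapse.artifacts import ArtifactsClient

client = ArtifactsClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<workspace-name>.dev.azuresynapse.net",
)

run = client.pipeline.create_pipeline_run(
    "Delete files older than 30 days",
    parameters={
        "SourceFolder": "default-fs",
        "SourceDirectory": "sample-directory",
        # LoggingPath keeps its default value in this walkthrough.
    },
)
print(run.run_id)
```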

 

Going back to the Monitor hub, we can see that this run of the pipeline succeeded now that its parameters point to an existing file system and folder.

saveenr_9-1607996284132.png

Monitor hub is open. Pipeline runs section is selected. The latest run of the Delete files older than 30 days pipeline is shown as Succeeded.
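
Continuing the sketch above, the run's status, the same information the Pipeline runs page shows, can be polled with the pipeline_run operations (again assuming the azure-synapse-artifacts package).

```python
# Poll the run started in the previous sketch until it reaches a terminal state.
import time

while True:
    pipeline_run = client.pipeline_run.get_pipeline_run(run.run_id)
    if pipeline_run.status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(15)

print(f"Pipeline run finished with status: {pipeline_run.status}")
```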

 

Using SQL Scripts from Azure Synapse Knowledge Center

 

To try out SQL script monitoring in Azure Synapse, we need a SQL script, and we can grab a good one from the Azure Synapse Knowledge Center. Inside the Synapse workspace, choose the Develop option from the left menu to open the Develop hub. Select the "+" Add new resource command and then Browse gallery to navigate to the gallery.

saveenr_10-1607996284136.png

Develop Hub is open. The plus button to add new artifacts is selected. Browse Gallery from the list of resource options is highlighted.

 

Once in the gallery, make sure the SQL scripts page is selected. From the list of available sample scripts, choose Query CSV files and select Continue.

saveenr_11-1607996284155.png

Knowledge Center SQL Script gallery is open. Query CSV files script is selected. Continue button is highlighted.

 

You can preview the script and select Open script once you are ready to move back to the Develop hub.

saveenr_12-1607996284173.png

Query CSV files script is displayed. Open script button is highlighted.
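
The gallery script runs directly in Synapse Studio, but the same kind of serverless SQL query can also be submitted from Python. Below is a rough sketch using pyodbc; the workspace name, sign-in, and CSV location are placeholders, not the values from the sample script.

```python
# A hedged sketch: querying CSV files through the serverless SQL endpoint with pyodbc.
# Workspace name, user, and storage path are placeholders; requires the Microsoft
# ODBC Driver for SQL Server.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<workspace-name>-ondemand.sql.azuresynapse.net;"
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
    "UID=<user@contoso.com>;"
)

query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/<folder>/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS rows;
"""

for row in conn.cursor().execute(query):
    print(row)
```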

 

Monitor SQL Request Activity in Synapse Analytics

 

If you want to save the script to your workspace, select Publish all. You can also run the script without publishing it by choosing Run from the top menu.

saveenr_13-1607996284187.png

Query CSV files script is displayed. Publish all button and Run command are highlighted.

 

Now, let’s switch to the Monitor hub and select SQL requests from the Activities section. This will show you all the SQL script activity in your workspace. We can see the script’s duration, the submitter, and the data processed while running the script.

saveenr_14-1607996284203.png

Monitor hub is open. SQL requests section is selected. The latest SQL query is highlighted.

 

If you select Request content, you can see the full query.

saveenr_15-1607996284207.png

SQL Script body is displayed. The Close button is highlighted.

 

Using Notebooks from Azure Synapse Knowledge Center

 

To try out Apache Spark application monitoring in Azure Synapse, we will grab a notebook from the Azure Synapse Knowledge Center. Inside the Synapse workspace, choose the Develop option from the left menu to open the Develop hub. Select the "+" Add new resource command and then Browse gallery to navigate to the gallery.

saveenr_16-1607996284213.png

Develop Hub is open. The plus button to add new artifacts is selected. Browse Gallery from the list of resource options is highlighted.

 

Once in the gallery, make sure the Notebooks page is selected. From the list of available sample notebooks, choose Getting Started with Delta Lake and select Continue.

saveenr_17-1607996284226.png

Knowledge Center Notebook gallery is open. Getting Started with Delta Lake notebook is selected. Continue button is highlighted.

 

You can preview the notebook and select Open notebook once you are ready to move back to the Develop hub.

saveenr_18-1607996284236.png

Getting Started with Delta Lake notebook is displayed. Open notebook button is highlighted.

 

When you are back in the Develop hub, a pop-up dialog will prompt you to create an Apache Spark pool if you do not already have one. Select Create pool to continue.

saveenr_19-1607996284284.png

No spark pool available dialog is shown. Create pool button is highlighted.

 

Provisioning your new Apache Spark pool can take a minute. You can select Show notifications from the top menu to monitor the deployment status.

saveenr_20-1607996284297.png

Azure Portal Notifications are open. Deployment status for the Spark pool is highlighted.

 

Once the deployment is complete, you can select Run all to run all the cells in the current notebook in sequence. The first time you run the notebook, a new Spark session is created. With its default settings, the session will time out after 30 minutes of inactivity. If you want to save the notebook to your Synapse workspace, make sure you select Publish all.

saveenr_21-1607996284312.png

Getting Started with Delta Lake notebook is displayed. Publish all and Run all commands are highlighted.
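
As a rough idea of what the notebook's cells do, here is a minimal Delta Lake sketch; the path is hypothetical, and `spark` is the session Synapse creates for the notebook.

```python
# A minimal Delta Lake example in a Synapse notebook cell; the path is a placeholder.
delta_path = "/tmp/delta/sample-table"

# Write a small DataFrame in Delta format, then read it back.
df = spark.range(0, 5)
df.write.format("delta").mode("overwrite").save(delta_path)

spark.read.format("delta").load(delta_path).show()
```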

 

Monitoring Spark Applications in Synapse Analytics

 

Once you run the notebook, navigate to the Monitor hub and select the Apache Spark applications section to see a list of activities. Focus on the latest run and select its application name to open the details page for the session.

saveenr_22-1607996284322.png

Monitor Hub is open. Apache Spark applications section is selected. The latest run of the Getting Started with Delta Lake notebook is highlighted with the Submitting status.
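
The same list can be retrieved programmatically. The sketch below assumes the azure-synapse-spark Python package; the endpoint and pool name are placeholders, and the method and attribute names follow the published SDK, so check the package documentation before relying on them.

```python
# A hedged sketch of listing Spark sessions (notebook runs) on a pool.
# Assumes the azure-identity and azure-synapse-spark packages; the endpoint and
# pool name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.synapse.spark import SparkClient

spark_client = SparkClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<workspace-name>.dev.azuresynapse.net",
    spark_pool_name="<spark-pool-name>",
)

for session in spark_client.spark_session.get_spark_sessions().sessions:
    print(session.name, session.state)
```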

 

From the details page, you can select Spark UI to open the Apache Spark user interface.

saveenr_23-1607996284345.png

Details for the Getting Started with Delta Lake notebook is displayed. Spark UI button is highlighted.

 

Here you can see the Spark application user interface.

saveenr_24-1607996284361.png

Spark UI is displayed.

 

Keeping Things Clean and Tidy

 

While we had fun monitoring pipelines, SQL Scripts, and notebooks, we created a list of artifacts in our Synapse workspace. Let’s do some cleaning before we leave.

 

From the Integrate hub, we will delete the pipeline we created. Open the Actions menu for the Delete files older than 30 days pipeline and select Delete. Make sure you select Publish all to publish the delete operation.

saveenr_25-1607996284367.png

Integrate Hub is open. Actions menu for Delete files older than 30 days pipeline is selected. Delete command is highlighted.

 

From the Develop hub, we will delete the SQL script and the notebook we created. Open the Actions menu for Query CSV files and select Delete. Do the same for the Getting Started with Delta Lake notebook. Make sure you select Publish all to publish the delete operations.

saveenr_26-1607996284372.png

Develop Hub is open. Actions menu for Getting Started with Delta Lake notebook is selected. Delete command is highlighted.

 

From the Data hub, we will delete the dataset we created. Open the Actions menu for BinaryDatasetForDeleteActivity and select Delete. Make sure you select Publish all to publish the delete operation.

saveenr_27-1607996284379.png

Data Hub is open. Actions menu for BinaryDatasetForDeleteActivity dataset is selected. Delete command is highlighted.

 

Navigate to the Manage hub and open the Linked services list. From the list, select Delete for bing-covid-19-data.

saveenr_28-1607996284387.png

Manage Hub is open. The linked services section is selected. Delete command for bing-covid-19-data is highlighted.

 

Within the Manage hub, switch to the Apache Spark pools list. Select More for the Spark pool you created and select Delete to remove it.

saveenr_29-1607996284398.png

Manage Hub is open. Apache spark pools section is selected. Delete command for SampleSpark is highlighted.

 

With that, we have deleted everything we created.

 

Conclusion

 

Whatever happens in Synapse can be monitored in the Monitor hub. During our short journey, we ran and monitored a pipeline, a SQL script, and an Apache Spark notebook. Having the Knowledge Center with sample datasets, pipelines, and notebooks makes a test run easy.

 

Go ahead, try out this tutorial yourself today by creating an Azure Synapse workspace.

saveenr_30-1607996284408.jpeg

 

 
