Ensure your Stream Analytics query runs at peak performance using Job Simulation
Published May 23 2023 04:47 PM
Microsoft

To get the best performance and scalability from an Azure Stream Analytics (ASA) job, you need to optimize its query logic. The Job Simulation feature in the ASA extension for Visual Studio Code can help with this. It allows you to:

  • Visualize the topology of your Stream Analytics job and assess its parallelism. 
  • Gain a clear understanding of how the computing resources (streaming units) are allocated within your job. 
  • Obtain valuable insights into the execution of your job at both the node and processor levels. 
  • Receive editing suggestions that help you restructure the query so the job runs in parallel.

In this blog post, we walk through a real-world example, a Stream Analytics job called RETAIL_INVENTORY, to show how to use Job Simulation to make the job more efficient. The RETAIL_INVENTORY job ingests data streams from an Event Hub with 8 partitions and uses 12 streaming units (SUs) to process data.

 

Step 1: Simulate job and analyze the job topology 

After exporting your job to Visual Studio Code, open the query script and select Simulate job in the CodeLens to open the job simulation. The simulation shows that the RETAIL_INVENTORY job runs on 2 streaming nodes (one streaming node equals six SUs) and that it is not a parallel job. streamingNode0 processes 5 of the Event Hub partitions and streamingNode1 processes the other 3.
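The node count follows directly from the SU setting. A minimal Python sketch of that arithmetic, assuming the six-SUs-per-node ratio stated above (the function name `streaming_node_count` is hypothetical and not part of any ASA tooling):

```python
import math

SUS_PER_NODE = 6  # from the article: one streaming node equals six SUs


def streaming_node_count(streaming_units: int, sus_per_node: int = SUS_PER_NODE) -> int:
    """Return how many streaming nodes a job with the given SUs would use."""
    if streaming_units <= 0:
        raise ValueError("streaming_units must be positive")
    return math.ceil(streaming_units / sus_per_node)


# Partition counts per node as reported by the simulation for RETAIL_INVENTORY.
partitions_per_node = {"streamingNode0": 5, "streamingNode1": 3}

nodes = streaming_node_count(12)
total_partitions = sum(partitions_per_node.values())
print(nodes, total_partitions)  # prints "2 8"
```

The 5/3 split of partitions is simply what the simulation reported for this job; the sketch does not try to reproduce how partitions are assigned to nodes.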


 

Step 2: Observe the processor-level diagram 

Select streamingNode0 to expand it and view the processor-level diagram. The diagram shows how the data in each partition is processed by the Input, Computing, and Marshaller processors. The Marshaller processors come into play when an operation, such as a data aggregation, spans streaming nodes.
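To see why an aggregation that spans nodes needs an extra step, consider this conceptual Python sketch. It is not how ASA is implemented internally; the event fields, device names, and function names are all hypothetical. Each node computes a partial result over the events it has seen, and a separate merge step has to combine those partial results before a final answer exists.

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[str, float]  # (device_id, temperature); hypothetical fields


def partial_max(events: Iterable[Event]) -> Dict[str, float]:
    """Compute the per-device maximum over the events one node has seen."""
    result: Dict[str, float] = {}
    for device_id, temperature in events:
        result[device_id] = max(temperature, result.get(device_id, float("-inf")))
    return result


def merge_partials(*partials: Dict[str, float]) -> Dict[str, float]:
    """Combine per-node results; this stands in for the cross-node step."""
    merged: Dict[str, float] = {}
    for partial in partials:
        for device_id, value in partial.items():
            merged[device_id] = max(value, merged.get(device_id, float("-inf")))
    return merged


# Made-up sample data: the same device reports to both nodes.
node0_events = [("dev-a", 21.5), ("dev-b", 19.0)]
node1_events = [("dev-a", 23.0), ("dev-c", 18.2)]

merged = merge_partials(partial_max(node0_events), partial_max(node1_events))
print(merged)
```

Because "dev-a" appears on both nodes, neither node can produce its maximum alone. That data exchange between nodes is the kind of work a non-parallel job has to do.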


 

To locate the query step behind a processor, double-click the Marshaller processors in the diagram. This highlights the corresponding query step in the editor, so you can pinpoint the exact logic being executed. The query shows that the RETAIL_INVENTORY job uses a MAX() function along with a sliding window to determine the maximum temperature of the devices.
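As a rough illustration of what a maximum over a sliding window computes, here is a simplified Python sketch. It assumes events arrive in time order and only evaluates the window when an event arrives, so it does not reproduce every detail of ASA's sliding window behavior; the function name, the 60-second window, and the sample readings are made up for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp_seconds, temperature); hypothetical fields


def sliding_max(events: Iterable[Reading], window_seconds: float) -> Iterator[Reading]:
    """Yield (timestamp, max temperature over the trailing window) per event."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    window: Deque[Reading] = deque()  # temperatures kept in decreasing order
    for timestamp, temperature in events:
        # Drop older candidates that can no longer be the maximum.
        while window and window[-1][1] <= temperature:
            window.pop()
        window.append((timestamp, temperature))
        # Drop candidates that fell out of the window (timestamp - w, timestamp].
        while window[0][0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, window[0][1]


readings = [(0, 20.0), (10, 25.0), (20, 22.0), (75, 21.0)]
print(list(sliding_max(readings, window_seconds=60)))
# prints [(0, 20.0), (10, 25.0), (20, 25.0), (75, 22.0)]
```

At timestamp 75 the reading of 25.0 from timestamp 10 is more than 60 seconds old, so the maximum drops to 22.0.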


To learn about the function of the different processor types, see "Stream Analytics job diagram (preview) in Azure portal" on Microsoft Learn.

 

Step 3: Edit the query based on Enhancements

Select the Enhancements option to find out why the job is not parallel and to receive suggestions on how to improve the query. In this example, the suggestion is to partition every query step by the “PartitionId” partition key.


 

After the query is partitioned by “PartitionId”, the job runs in parallel across its 12 streaming units. The number of partitions is now aligned across the input, query, and output.
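The benefit of aligned partitions can be pictured with a small Python sketch. This is a conceptual model under stated assumptions, not ASA's execution engine: the event fields and function names are hypothetical. Once every event carries a partition ID and each step only looks at its own partition, the partitions can be processed independently with no data exchanged between them.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Event = Tuple[int, str, float]  # (partition_id, device_id, temperature); hypothetical fields


def max_per_device(events: List[Event]) -> Dict[str, float]:
    """Aggregate one partition's events without looking at any other partition."""
    result: Dict[str, float] = {}
    for _, device_id, temperature in events:
        result[device_id] = max(temperature, result.get(device_id, float("-inf")))
    return result


def process_partitioned(events: List[Event]) -> Dict[int, Dict[str, float]]:
    """Group events by partition ID and process every partition independently."""
    by_partition: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        by_partition[event[0]].append(event)
    partition_ids = sorted(by_partition)
    with ThreadPoolExecutor() as pool:
        # Each partition is handed to a worker on its own; no merge step follows.
        results = list(pool.map(max_per_device, [by_partition[p] for p in partition_ids]))
    return dict(zip(partition_ids, results))


sample = [(0, "dev-a", 21.5), (0, "dev-a", 23.0), (1, "dev-b", 19.0)]
print(process_partitioned(sample))
```

The output stays keyed by partition ID, which mirrors the alignment between input, query, and output partitions: each partition's result can be written out without waiting for the others.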


 

In conclusion, Job Simulation in the ASA extension for Visual Studio Code can help you optimize your Stream Analytics job for performance and scalability. It shows how your job processes data at the processor level and how computing resources are allocated, which helps you spot a non-parallel query and fix it before it turns into a performance problem.
