Maximizing the performance and scalability of your Stream Analytics job requires optimizing your query logic. One tool that can help is the Job Simulation feature in the Azure Stream Analytics (ASA) extension for Visual Studio Code. It allows you to:
- Visualize the topology of your Stream Analytics job and assess its parallelism.
- Gain a clear understanding of how the computing resources (streaming units) are allocated within your job.
- Obtain valuable insights into the execution of your job at both the node and processor levels.
- Receive helpful editing suggestions to enhance your query and achieve a parallel job.
In this blog post, we'll walk through a real-world example of a Stream Analytics job called RETAIL_INVENTORY and show how to use Job Simulation to make the job more efficient. The RETAIL_INVENTORY job ingests data streams from an Event Hub with 8 partitions and uses 12 streaming units (SUs) to process data.
Step 1: Simulate job and analyze the job topology
After exporting your job to Visual Studio Code, open the query script and select Simulate job in the CodeLens to open the job simulation. The simulation shows that the RETAIL_INVENTORY job has 2 streaming nodes (one streaming node equals six SUs) and that it is not a parallel job. The streamingNode0 node processes 5 partitions from the Event Hub, and streamingNode1 processes the other 3.
Step 2: Observe the processor-level diagram
Select streamingNode0 to expand it and view the processor-level diagram. The diagram shows how the data in each partition is processed by the Input, Computing, and Marshaller processors. The Marshaller processors come into play when an operation spans streaming nodes, such as data aggregation.
To locate the query steps, double-click on the Marshaller processors on the diagram. This action will highlight the corresponding query in the editor, allowing you to pinpoint the exact logic executed. From the query, we can see that the RETAIL_INVENTORY job uses a MAX() function along with a sliding window to determine the maximum temperature of the devices.
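The post does not reproduce the query text, so the following is only a minimal sketch of what a non-partitioned step of this kind might look like. The input and output aliases, the DeviceId and Temperature columns, and the 5-minute window length are all hypothetical placeholders, not values taken from the RETAIL_INVENTORY job.

```sql
-- Hypothetical sketch: names and window size are assumptions, not the job's actual query.
-- No PARTITION BY clause, so the aggregation is not tied to the Event Hub partitions
-- and data may have to move between streaming nodes.
SELECT
    DeviceId,
    MAX(Temperature) AS MaxTemperature,   -- maximum temperature per device
    System.Timestamp() AS WindowEnd       -- end time of the window that produced the row
INTO
    [output]
FROM
    [input]
GROUP BY
    DeviceId,
    SlidingWindow(minute, 5)              -- assumed 5-minute sliding window
```

Because nothing in this step states that events can be grouped per input partition, an aggregation like this is the kind of step where Marshaller processors would be expected to appear in the diagram.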
To learn about the function of the different processor types, see "Stream Analytics job diagram (preview) in Azure portal" on Microsoft Learn.
Step 3: Edit the query based on Enhancements
Select the Enhancements option to find out why the job is not parallel and receive suggestions on how to enhance the query. In this example, the suggestion advises partitioning the query using the “PartitionId” partition key for each query step.
After you partition the query using “PartitionId”, the job executes in parallel with 12 streaming units. The number of partitions is now aligned between the input, query, and output.
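As a rough illustration of that change, a partitioned step might look like the sketch below. As before, the input and output aliases, the DeviceId and Temperature columns, and the 5-minute window are hypothetical placeholders; only the use of PartitionId as the partition key comes from the suggestion described above.

```sql
-- Hypothetical sketch: names and window size are assumptions, not the job's actual query.
-- PARTITION BY PartitionId keeps each Event Hub partition's events in their own
-- partition of the query step, so the step can run in parallel.
SELECT
    DeviceId,
    MAX(Temperature) AS MaxTemperature,
    System.Timestamp() AS WindowEnd
INTO
    [output]
FROM
    [input] PARTITION BY PartitionId
GROUP BY
    PartitionId,                          -- the partition key is also part of the grouping
    DeviceId,
    SlidingWindow(minute, 5)              -- assumed 5-minute sliding window
```

If the job has several query steps, each one would need the same partition key for the whole job to stay parallel, which matches the suggestion to partition "each query step".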
In conclusion, Job Simulation in the Visual Studio Code extension is a powerful tool for optimizing the performance and scalability of your Stream Analytics job. It gives you insight into how your job processes data at the processor level and how computing resources are allocated, so you can find and fix parallelism bottlenecks before they turn into performance problems.