Update: The feature is available in Microsoft Azure. The availability date for Microsoft Azure Government and Microsoft Azure in China will be announced later.
Analyzing I/O problems just got easier for SQL Server on Azure VMs
It is not easy to understand what's going on when you run into an I/O related performance problem on an Azure Virtual Machine. It is a common, but complex problem. What you need is to figure out what's happening at both the host level and your SQL Server instance where often, correlating host metrics with SQL Server workloads can be a challenge.
We developed a new experience that helps you do exactly that.
When you visit the Storage blade of your SQL virtual machine resource page on Azure portal, you will see two new tabs:
- I/O analysis
- I/O related best practices
The I/O analysis tab will tell you if you are having a performance issue stemming from IOPS and/or throughput throttling, caused by exceeding virtual machine or data disks limits. It will further show you the exact metric(s) and time where this issue shows itself down to the disk or VM level. Once you identify the problem on this tab, you might want to go to our documentation by following the "Learn more" link on the page as shown in the image above. The documentation details each scenario, what might have caused the problem, and provides guidance on how to resolve it.
How does it determine if there is a problem or not? It uses Azure metrics to understand what is going on in the system. I/O analysis checks metric health data for the last 24 hours. You will see the Azure metric charts on the page.
If you click on these charts, they will take you directly to the Azure metrics page as it is simply leveraging what's available in Azure already.
It first looks to see if there is disk latency above a certain threshold. Throttling might occur but it is not considered problematic unless it results in a latency condition. Once latency is detected, it then analyzes Azure Metrics and shows you which one(s) demonstrate the problem.
The Azure metrics are:
- VM Cached IOPS Consumed Percentage
- VM Cached Bandwidth Consumed Percentage
- VM Uncached IOPS Consumed Percentage
- VM Uncached Bandwidth Consumed Percentage
- Data Disk IOPS Consumed Percentage
- Data Disk Bandwidth Consumed Percentage
You can find detailed information about the metrics and the algorithms used in the documentation.
Detecting Latency
In the example below, you see that it detected an issue (disk latency was over the threshold for a certain amount of time). The problem occurred on May 20th at 12:56pm UTC. If there are multiple spikes in the chart, I/O Analysis helps you pinpoint the issue. Two metrics show why the latency occurred.
In this case, it is a throughput problem both at VM level and disk level. The disk related charts have a chart line for each disk you have in your virtual machine, labeled with the LUN number. In the below graphic, you can see there is an I/O latency issue due to throughput for ‘VM Uncached Bandwidth Consumed Percentage’ and ‘Data Disk Bandwidth Consumed Percentage’.
You can then explore the details further by expanding the VM level metrics and / or disk level metrics sections below the Disk Latency chart.
For this scenario, expanding the VM level metrics reveals the following data on cached and uncached IOPS and throughput health, where you again can go to the metrics data, for further analysis, by clicking the chart.
You may also want to explore the disk level, by expanding the disk level metrics section as shown below.
I/O related best practices tab checks to make sure your system is following the configuration best practices relating to I/O on a SQL Server on Azure VM. Poorly configured systems tend to lead to performance problems, which often get exposed under workload pressure. Running an assessment will give you recommendations with various severity ratings based on risk and impact.
We recommend you implement them starting with the highest severity.
PowerShell script
If you prefer scripting to using Azure portal, you can also use the I/O Analysis PowerShell script to analyze the I/O performance of your SQL Server VM.
We would love to hear your feedback. Please feel free to comment here.