Hidden Treasure Part 1: Additional Performance Insights in DISKSPD XML

Former Employee

Mar 18, 2021

Written by Jason Yi, PM on the Azure Edge & Platform team at Microsoft.

Acknowledgements: Dan Lovinger

Imagine this: you have an Azure Stack HCI cluster set up and ready to go. But there is one lingering question: what is your cluster’s storage performance potential? In such cases, you can rely on micro-benchmarking tools such as DiskSpd. If you are not familiar with it, DiskSpd lets you customize and configure your own synthetic workloads by tweaking built-in parameters. For more information, you can read about it here.

“Visible” and Clean Data

Most folks who already have experience with DiskSpd are likely familiar with the txt output option, which is also what is displayed in the terminal. The purpose of this output is to present the data in a human-readable format. We also aggregated some of the finer details to generate practical metrics for users, which means that we decided which metrics would be considered valuable. But did you know that there is an option to output XML, which reveals additional, more granular data, such as the number of IOs completed in each sampling interval rather than only the average over the whole run?

Let’s first take a few moments to review the txt output. As you may know, this output is split into four different sections:

- Input settings
- CPU utilization details
- Total IO performance metrics
- Latency percentile analysis (-L parameter)

This output gives a detailed view of several performance metrics. That’s great, but what if you are interested in other data insights? If you did not read the DiskSpd wiki page carefully, you may have missed the fact that there is a “hidden feature”: another output format that generates XML. You can request it with the -Rxml parameter and redirect the output into an XML file with your preferred file name (for example, > output.xml). But wait, there’s more! If you peek into the XML file, you will notice that it contains more data than the txt output, such as the IO counts for each sampling interval of the test. In other words, the XML output reveals the granular data, as opposed to the aggregated data intended for human eyes. If you wish to take a look, be warned: your eyes will burn from the squinting.
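If you would rather script the run than type it, the following is a minimal Python sketch of an -Rxml invocation. It assumes diskspd.exe is on your PATH. The helper names (build_diskspd_args, run_to_xml), the target path, and all parameter values are illustrative, not recommendations. -L and -D are included on the assumption that the XML only carries latency percentiles and per-interval IOPS data when those statistics are requested.

```python
import subprocess


def build_diskspd_args(target, file_size="1G", duration_s=60, queue_depth=32,
                       threads=4, block_size="4K", iops_interval_ms=1000):
    """Return a DiskSpd argument list (hypothetical helper) that requests XML results."""
    return [
        "diskspd.exe",
        f"-c{file_size}",         # create the target file with this size
        f"-b{block_size}",        # block size
        f"-d{duration_s}",        # test duration in seconds
        f"-o{queue_depth}",       # outstanding I/Os per thread per target (queue depth)
        f"-t{threads}",           # threads per target
        "-r",                     # random I/O
        "-L",                     # measure latency statistics
        f"-D{iops_interval_ms}",  # IOPS statistics sampled every iops_interval_ms
        "-Rxml",                  # write results to stdout as XML instead of text
        target,
    ]


def run_to_xml(args, out_path):
    """Run DiskSpd and redirect its XML report (written to stdout) into out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(args, stdout=out, check=True)
```

For example, run_to_xml(build_diskspd_args(r"C:\ClusterStorage\Volume1\test.dat"), "output.xml") would be the scripted equivalent of redirecting the -Rxml output to output.xml yourself. The path in that example is hypothetical.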

Table of Contents: XML

Before your eyes burn, let’s create a brief table of contents for the XML file.

<System> Under this element, you have some basic information regarding the system itself, such as the server/VM name, DiskSpd version, number of processors, etc.

<Profile> Under this element, you will find the input parameters you used when you ran DiskSpd. These include the queue depth, thread count, warm-up time, test duration, and more. There are quite a few sub-elements within this section. Luckily, most of them are self-explanatory, so let us focus on a few of them.

<TimeSpans> Under this element, you will find <TimeSpan> elements. Each of those <TimeSpan> elements represents one DiskSpd test run. As you may have guessed, the content within each <TimeSpan> is the set of parameters that you, the user, specify. For example, in our run the <RequestCount> element is set to 32 because we set the queue depth to 32 (-o32) when we ran DiskSpd. You can think of this section as analogous to the “input settings” section of the txt output.
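To make the <Profile> layout concrete, here is a small Python sketch that reads back the requested queue depth for each input <TimeSpan>. The sample document is a hypothetical, heavily trimmed stand-in for real DiskSpd output. The code assumes the nesting Results/Profile/TimeSpans/TimeSpan and collects every <RequestCount> beneath each <TimeSpan>, because that element may appear once per target.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed results file; real DiskSpd output has many more elements.
SAMPLE = """<Results>
  <Profile>
    <TimeSpans>
      <TimeSpan>
        <Targets>
          <Target><Path>test.dat</Path><RequestCount>32</RequestCount></Target>
        </Targets>
      </TimeSpan>
    </TimeSpans>
  </Profile>
</Results>"""


def requested_queue_depths(results_xml):
    """For each input <TimeSpan> under <Profile>, list the RequestCount of every target."""
    root = ET.fromstring(results_xml)
    depths = []
    for span in root.findall("./Profile/TimeSpans/TimeSpan"):
        depths.append([int(rc.text) for rc in span.iter("RequestCount")])
    return depths


print(requested_queue_depths(SAMPLE))  # [[32]]
```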

<TimeSpan> This element, a direct child of the root <Results> element, is not to be confused with the <TimeSpan> elements under <Profile><TimeSpans>. This section contains the results of your DiskSpd test. It is similar to the data presented in the txt output, but with added granular data. More specifically, you can view the CPU usage, IOPS statistics (collected when DiskSpd is run with -D), and latency statistics (average total milliseconds, standard deviation, etc.) in their respective sub-elements:

<CpuUtilization>
- The CPU data is broken down per core.
<Latency>
- The latency data is broken down into separate “buckets” where each bucket corresponds to 1 percentile rank, in ascending order from 0 to 100%.
<Iops>
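- The IOPS data is broken down into separate “buckets,” where each bucket holds the IO counts for one sampling interval. The interval is set by the -D parameter (1000 milliseconds by default), and each bucket’s SampleMillisecond attribute marks the end of its interval.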
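The Python sketch below turns the <Iops> buckets into a time series. Like the bonus script at the end of this post, it assumes that the run-wide buckets live at Results/TimeSpan/Iops/Bucket with SampleMillisecond and Total attributes. It additionally assumes that Total counts the IOs completed in that bucket’s interval, so dividing by the interval length yields IOs per second. The sample values are invented for illustration. They use a 500 ms interval to show why the division matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed results with a 500 ms sampling interval (values invented).
SAMPLE = """<Results>
  <Profile><TimeSpans><TimeSpan/></TimeSpans></Profile>
  <TimeSpan>
    <Iops>
      <Bucket SampleMillisecond="500" Total="100"/>
      <Bucket SampleMillisecond="1000" Total="120"/>
      <Bucket SampleMillisecond="1500" Total="90"/>
    </Iops>
  </TimeSpan>
</Results>"""


def iops_series(results_xml):
    """Return (end of interval in seconds, IOs per second) for each run-wide bucket."""
    root = ET.fromstring(results_xml)
    series = []
    previous_ms = 0
    # "./TimeSpan" matches only direct children of <Results>, so the input
    # <TimeSpan> elements under <Profile><TimeSpans> are ignored.
    for bucket in root.findall("./TimeSpan/Iops/Bucket"):
        end_ms = int(bucket.get("SampleMillisecond"))
        interval_s = (end_ms - previous_ms) / 1000
        series.append((end_ms / 1000, int(bucket.get("Total")) / interval_s))
        previous_ms = end_ms
    return series


print(iops_series(SAMPLE))  # [(0.5, 200.0), (1.0, 240.0), (1.5, 180.0)]
```

With the default 1000 ms interval, each interval is one second long, so the computed rate equals the bucket’s Total.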

This may give rise to the question: can you modify the contents of this XML file and feed it back into DiskSpd? Yes, you absolutely can! In fact, there is a parameter precisely for this purpose (-X), which is great for batch testing. Here are the steps to get you started:

1. Before using the -X parameter, make sure you preserve the contents of the <Profile> element; any other data in the XML file may be discarded. If you plan to run the DiskSpd test with modified input parameters, make the appropriate changes in the <Profile> section.
2. (Optional) If you plan to run multiple DiskSpd tests, add more <TimeSpan> elements under <TimeSpans> within <Profile>, each with your desired input parameters.
3. Run DiskSpd with the -X parameter, which takes the XML file path as input and outputs a new XML (or txt) file with the newly generated results.
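The first two steps can be automated. The following Python sketch keeps only the <Profile> element of a results file. It then appends copies of the first input <TimeSpan>, each with a different <RequestCount>, which gives a batch profile with one test per queue depth. The function name and sample document are hypothetical. The sketch also assumes that DiskSpd accepts a file whose root element is <Profile>.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, trimmed results file: only <Profile> is kept by the function.
SAMPLE = """<Results>
  <System/>
  <Profile>
    <TimeSpans>
      <TimeSpan>
        <Targets>
          <Target><Path>test.dat</Path><RequestCount>32</RequestCount></Target>
        </Targets>
      </TimeSpan>
    </TimeSpans>
  </Profile>
  <TimeSpan/>
</Results>"""


def profile_from_results(results_xml, extra_queue_depths=()):
    """Return the <Profile> element as XML text, with one extra TimeSpan per queue depth."""
    root = ET.fromstring(results_xml)
    profile = root.find("Profile")
    if profile is None:
        raise ValueError("no <Profile> element found")
    time_spans = profile.find("TimeSpans")
    template = time_spans.find("TimeSpan")
    for depth in extra_queue_depths:
        span = copy.deepcopy(template)      # reuse every other input parameter
        for rc in span.iter("RequestCount"):
            rc.text = str(depth)            # only the queue depth changes
        time_spans.append(span)
    return ET.tostring(profile, encoding="unicode")
```

You could then save the returned string, for example to profile.xml, and pass it to DiskSpd with -X. Following DiskSpd’s usual convention of appending the value directly to the flag, this would presumably be -Xprofile.xml.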

Bonus: Script to Extract IOPS

In case you want a starting point, I’ve included a short script that takes a DiskSpd XML output file named “output.xml” and extracts the total IOs achieved per second into a neat CSV file for you to view (make sure the script and output.xml are in the same directory). This might be a good place to start if you want more insights from the IOPS data. **Foreshadowing**

Final Remarks

Hopefully, this helps in those situations where you wanted more detailed data or needed to run DiskSpd batch tests. You can also imagine the many ways you could manipulate the XML output with PowerShell scripts. Alas, that is a topic for another day.

Script Below

# Written by Jason Yi, PM

# This script takes the output XML file from DISKSPD, extracts the IOPS and time (seconds), and neatly organizes them into a CSV file.
# Ensure that your XML output file (output.xml) is in the same directory as this script, and run it as a .ps1 file.

# create path, input file, and node variables
# $PSScriptRoot is the folder containing this script (Get-Location would return the current working directory instead)
$path = $PSScriptRoot
$file = [xml] (Get-Content "$path\output.xml")

# select the run-wide IOPS buckets (one bucket per sampling interval)
$nodelist = $file.SelectNodes("/Results/TimeSpan/Iops/Bucket")

# select the objects you want in the csv file, converting SampleMillisecond to seconds
# Total is the IO count for the bucket, which equals IOPS with the default 1000 ms interval
$nodelist |
    Select-Object @{n='Time (s)';e={[double]$_.SampleMillisecond / 1000}},
                  @{n='Total IOPS';e={[long]$_.Total}} |
    Export-Csv "$path\time_v_iops.csv" -NoTypeInformation -Encoding UTF8 -Force # Have to force encoding to be UTF8 or data is in one column (UCS-2)

Updated Apr 15, 2021

Version 2.0

Azure Stack HCI


Azure Stack Blog
