Hidden Treasure Part 2: Mining Additional Insights

Published Apr 15 2021 10:11 AM
Microsoft

Written by Jason Yi, PM on the Azure Edge & Platform team at Microsoft. 

Acknowledgements: Dan Lovinger, Principal Software Engineer

 

In the last episode of discovering hidden treasure, we took a closer look at the kinds of data that lie within the DiskSpd XML output. Today, we will walk through an example of how to use that data to create new and practical insights.

 

DiskSpd on Azure

Let’s say that we are using Azure VMs to simulate some workload using DiskSpd. To visualize the data, let’s go ahead and use a short script that takes the XML output and extracts the total IOs per bucket into a CSV file for a more graphical view.
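As a rough idea of what such an extraction can look like, here is a minimal sketch. The XML below is a made-up, heavily trimmed stand-in for real DiskSpd output (only the SampleMillisecond and Total attributes are kept), and the column names are my own choice, so treat it as an illustration rather than the exact script used for the graph.

```powershell
# Hypothetical, trimmed-down DiskSpd result; real output contains many more elements.
[xml]$results = @'
<Results>
  <TimeSpan>
    <Iops>
      <Bucket SampleMillisecond="1000" Total="1900" />
      <Bucket SampleMillisecond="2000" Total="1950" />
      <Bucket SampleMillisecond="3000" Total="1920" />
    </Iops>
  </TimeSpan>
</Results>
'@

# Turn every bucket into one row: elapsed seconds and the total IOs counted in that bucket.
$rows = $results.SelectNodes('/Results/TimeSpan/Iops/Bucket') | ForEach-Object {
    [pscustomobject]@{
        'Time (s)'  = [int]$_.SampleMillisecond / 1000
        'Total IOs' = [int]$_.Total
    }
}

# Show the rows as CSV text; pipe to Export-Csv instead to write a file.
$rows | ConvertTo-Csv -NoTypeInformation
```

Swapping the inline XML for [xml](Get-Content .\output.xml) would point the same logic at a real result file.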

Picture1.png

 

As you can see, the IOPS are relatively constant, with an occasional bump. The reason is that we are maxing out the total number of IOPS that our Azure environment (a 3-node cluster using Standard B2ms VMs) can handle. Azure also artificially throttles IOPS based on your VM size and drive type. In our case, the VM limit is 1920 IOPS, and you can see that our peak is ~1950 IOPS. The occasional spike and drop in IOPS is likely due to Azure attempting to rebalance itself and locate the throttle limit.

Using Azure VMs, we can see that the IOPS values are relatively constant, but that is neither very interesting nor representative of a real workload. Workloads in the real world are much messier and more random. Perhaps there is a way to replicate random IO activity that resembles typical day-to-day activity. Well, you are in luck, because there is a script for that. Let’s try it!

 

Randomize IOPS experiment

Note: The IOPS variance is purely artificial and for educational purposes only. By no means does this replicate any real-world IO scenario.

 

To help demonstrate this experiment, I’ve written a short script called “iops_randomizer.ps1” to simulate random IO activity. The script uses a set of parameters to run DiskSpd in short, one-second bursts. The IO values are randomized each second by using the (-g) parameter to throttle the throughput, which in turn caps the IOPS for that second. Here are the parameters for the script:

  • -d (mandatory) = The number of DiskSpd tests. Because each test run corresponds to one second, you can think of this as the total duration of the script.
  • -path (mandatory) = the path to the test file.
  • -rw_flag = Either 0 or 1. With 0, the read/write ratio comes from the -w parameter; with 1, the script randomizes the read/write ratio and no -w value is needed. The default is 0, and if no -w value is provided, the script uses -w 0 (100% read).
  • -g_min = The lower bound of the range used when randomizing the throughput. The default value is 0 bytes per millisecond.
  • -g_max = The upper bound of the range used when randomizing the throughput. The default value is 8000 bytes per millisecond.
  • -b = The block size in bytes. The default is 4096 bytes (4KiB).
  • -r = Random I/O, aligned to the specified size in bytes. The default is 4096 bytes (4KiB).
  • -o = The outstanding IO requests per target per thread. The default is 32.
  • -t = The number of threads per target file. The default is 4.
  • -w = The percentage of operations that are write requests. The default is 0% writes, 100% reads.

 

Note: You may find that your IOPS values are ridiculously small. This is because the default parameters are not tuned for a more powerful environment. Consequently, you may need to experiment with the (-g) parameter range. Remember that because the values are in bytes per millisecond, you will need to perform some unit conversion to confirm that the range you are randomizing over is meaningful.

 

Here is the conversion I used:

Picture2.png
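For readers who prefer code to a picture, here is a minimal sketch of one way to do this arithmetic in PowerShell; it is not necessarily identical to the conversion pictured. The function name ConvertTo-ApproxIops and its parameters are hypothetical. It also assumes every IO is exactly one block in size, and the ThreadCount multiplier is only relevant if the throttle is applied per thread, which is worth checking against the DiskSpd documentation for your version.

```powershell
# Convert a throughput throttle in bytes per millisecond into an approximate IOPS ceiling.
# ConvertTo-ApproxIops is a hypothetical helper, not part of DiskSpd or the scripts in this post.
function ConvertTo-ApproxIops {
    param(
        [double]$BytesPerMs,
        [int]$BlockSizeBytes = 4096,
        [int]$ThreadCount = 1   # assumption: raise this only if the throttle applies per thread
    )
    # bytes/ms * 1000 ms/s = bytes/s; dividing by the block size gives IOs per second
    return ($BytesPerMs * 1000 / $BlockSizeBytes) * $ThreadCount
}

# The default g_max of 8000 bytes/ms with 4 KiB blocks:
ConvertTo-ApproxIops -BytesPerMs 8000    # 1953.125

# Going the other way: the bytes/ms needed to allow 1920 IOPS at 4 KiB
1920 * 4096 / 1000                       # 7864.32
```

Running the numbers in both directions is a quick sanity check that the -g_min and -g_max range actually spans the IOPS values you want to see.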

 

Let’s now try running the following script:

Picture3.PNG

 

After about 120 seconds, you should see 3 files in your current directory.

  • expand_profile.xml : This file is created when the script is first run and contains all the DiskSpd test runs with their respective parameters. This is later fed into DiskSpd as an input. As a result, the file only contains the <Profile> element. You may use this file to modify any parameters you desire and feed it back into DiskSpd.
  • output.xml : This is the finalized output file that is created after the DiskSpd test is complete.
  • iops_stat_seconds.csv : This file contains the clean data for the number of IOs for each second the DiskSpd test was run.

Now that we have the csv output, we can create a graph that plots total IO vs time (seconds). We now have some variance in the number of IOs!

Picture4.png

 

IO Percentiles

As you’ve just seen, there is potential in experimenting with the XML output. Perhaps you wish to derive other data that may be valuable for your situation. For example, maybe we want to examine the percentile values of the IO operations. Let’s try it: a second script called “get_iops_percentile.ps1” takes the iops_stat_seconds.csv file and calculates the percentile scores for the IO values. After running the script, you should see a file called iops_percentiles.csv as well as a copy of the output in the PowerShell terminal.

Picture5.png

 

These percentile values can help us understand how the per-second IO values are distributed, gauge the typical IO output for each second, and identify trends. In our example, we can see that 99% of the per-second IO values are less than ~1635.
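To make the calculation concrete, here is a minimal sketch of a linear-interpolation percentile that uses the same rank formula as the percentile script at the end of this post, (n - 1) * p + 1. The function name Get-Percentile and the sample values are made up for illustration.

```powershell
# Linear-interpolation percentile over an ascending-sorted array.
# Get-Percentile is a hypothetical helper; $P is a fraction such as 0.5 or 0.99.
function Get-Percentile {
    param([double[]]$Sorted, [double]$P)

    $rank  = ($Sorted.Length - 1) * $P + 1   # 1-based fractional rank
    $lower = [int][Math]::Floor($rank)       # whole part of the rank
    $frac  = $rank - $lower                  # fractional part of the rank

    # At or beyond the last rank there is nothing to interpolate towards.
    if ($lower -ge $Sorted.Length) { return $Sorted[-1] }

    # Interpolate between the two neighbouring samples.
    return $Sorted[$lower - 1] + $frac * ($Sorted[$lower] - $Sorted[$lower - 1])
}

$sample = 100, 200, 300, 400             # made-up per-second IO totals, already sorted
Get-Percentile -Sorted $sample -P 0.5    # 250
Get-Percentile -Sorted $sample -P 1.0    # 400
```

With four samples, the 50th percentile lands at rank 2.5, halfway between the second and third values, which is why the result is 250 rather than one of the samples.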

 

Bonus: rw_flag

This section provides more information on the rw_flag to clear up any potential confusion. You may be wondering what the difference is between using 0 and 1.

 

The main difference is that with an rw_flag of 0, you, the user, can provide a write percentage through the (-w) parameter. For example, if you provide 30, this means 30% of the IO will be writes and 70% of the IO will be reads. This also means that every DiskSpd test will use the same 30% write share, producing a consistent split between read IOs and write IOs in the long run.

 

However, with an rw_flag of 1, the user does not need to specify any read/write ratio. Instead, the ratio is randomized each second between 0% and 100%.

 

Using the performance monitor within Windows Admin Center, the result may look something like this: (left side uses rw_flag=0, right side uses rw_flag=1)

Picture6.png

Final remarks

Today’s experiment was one example of deriving new data from the XML output. If you believe DiskSpd is not giving you a specific metric and wish to infer other data, this may be one method of manually discovering new “treasures.” Have fun!

 

*Script 1: iops_randomizer*

# Written by Jason Yi, PM

<#
.PARAMETER d
integer number of diskspd runs (can consider it as duration since each run is one second long)
.PARAMETER path
the path to the test file
.PARAMETER rw_flag
the default is 0. 0 represents that the user wants to input their custom read/write ratio whereas 1 represents that the user wants a randomized read/write ratio
.PARAMETER g_min
the minimum g parameter (g parameter is the throughput threshold)
.PARAMETER g_max
the maximum g parameter (g parameter is the throughput threshold)
.PARAMETER b
the block size in bytes
.PARAMETER r
random IO aligned to specified size in bytes
.PARAMETER o
the queue depth
.PARAMETER t
the number of threads
.PARAMETER w
the percentage of IO operations that are writes (the rest are reads)
#>
Param (
[Parameter(Position=0,mandatory=$true)][int]$d,
[Parameter(Position=2,mandatory=$true)][string]$path, # C:\ClusterStorage\CSV01\IO.dat
[int]$rw_flag = 0,
[int]$g_min = 0,
[int]$g_max = 8000,
[int]$b = 4096,
[int]$r = 4096,
[int]$o = 32,
[int]$t = 4,
[int]$w = 0)

Function Create-Timespans{
<#
.DESCRIPTION
This function takes the input number of diskspd runs (or duration) and lasts for that input number of seconds while randomizing
the throughput threshold within a specified range. Includes same parameters initially passed in by user.
#>
Param (
[int]$d,
[string]$path,
[int]$g_min,
[int]$g_max,
[int]$b,
[int]$r,
[int]$o,
[int]$t,
[int]$w,
[int]$rw_flag
)



[xml]$xml=@"
<Profile>
<Progress>0</Progress>
<ResultFormat>xml</ResultFormat>
<Verbose>false</Verbose>
<TimeSpans>
<TimeSpan>
<CompletionRoutines>false</CompletionRoutines>
<MeasureLatency>true</MeasureLatency>
<CalculateIopsStdDev>true</CalculateIopsStdDev>
<DisableAffinity>false</DisableAffinity>
<Duration>1</Duration>
<Warmup>0</Warmup>
<Cooldown>0</Cooldown>
<ThreadCount>0</ThreadCount>
<RequestCount>0</RequestCount>
<IoBucketDuration>1000</IoBucketDuration>
<RandSeed>0</RandSeed>
<Targets>
<Target>
<Path>$path</Path>
<BlockSize>$b</BlockSize>
<BaseFileOffset>0</BaseFileOffset>
<SequentialScan>false</SequentialScan>
<RandomAccess>false</RandomAccess>
<TemporaryFile>false</TemporaryFile>
<UseLargePages>false</UseLargePages>
<DisableOSCache>true</DisableOSCache>
<WriteThrough>true</WriteThrough>
<WriteBufferContent>
<Pattern>sequential</Pattern>
</WriteBufferContent>
<ParallelAsyncIO>false</ParallelAsyncIO>
<FileSize>1073741824</FileSize>
<Random>$r</Random>
<ThreadStride>0</ThreadStride>
<MaxFileSize>0</MaxFileSize>
<RequestCount>$o</RequestCount>
<WriteRatio>$w</WriteRatio>
<Throughput>0</Throughput>
<ThreadsPerFile>$t</ThreadsPerFile>
<IOPriority>3</IOPriority>
<Weight>1</Weight>
</Target>
</Targets>
</TimeSpan>
</TimeSpans>
</Profile>
"@


# 1 flag means that the user wishes to randomize the rw ratio
# 0 flag means that the user wishes to control the rw ratio
# Basically, throw an error when the flag is not 0 or 1
if ( ($rw_flag -ne 1) -and ($rw_flag -ne 0) ){
throw "Invalid rw_flag value. Please choose 0 to provide your own rw ratio, or 1 to randomize the rw ratio."
}

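# from here on, $path is the working directory where the profile is saved
# (the target file path was already embedded in $xml above)
$path = Get-Location
# loop up until the number of runs (duration) and add new timespan elements
# (the template already holds the first timespan, so d-1 more are appended)
for($i = 1; $i -lt $d; $i++){

# Get-Random treats -Maximum as exclusive, so add 1 to make both upper bounds reachable
$g_param = Get-Random -Minimum $g_min -Maximum ($g_max + 1)
$true_w = Get-Random -Minimum 0 -Maximum 101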

# clone the first timespan element, modify it, and append it as a child
$new_t = $xml.SelectSingleNode('/Profile/TimeSpans/TimeSpan').Clone()
$new_t.Targets.Target.Throughput = "$g_param"
if ($rw_flag -eq 1){
$new_t.Targets.Target.WriteRatio = "$true_w"
}
$null = $xml.Profile.TimeSpans.AppendChild($new_t)
}

# show updated result
$xml.Profile.Timespans.Timespan
# save into xml file
$xml.Save("$path\expand_profile.xml")

}
#########################
##### SCRIPT BEGINS #####
#########################


# create the xml file with diskspd parameters
Create-Timespans -d $d -g_min $g_min -g_max $g_max -path $path -b $b -r $r -o $o -t $t -w $w -rw_flag $rw_flag


# create path, input file, and node variables
$path = Get-Location
# feed profile xml to DISKSPD with -X parameter (Running DISKSPD)
& .\diskspd.exe "-X$path\expand_profile.xml" > "$path\output.xml"

$file = [xml] (Get-Content "$path\output.xml")


# select every IOPS bucket from the results (one per one-second timespan)
$nodelist = $file.SelectNodes("/Results/TimeSpan/Iops/Bucket")

# replace the millisecond values with a running second count (1, 2, 3, ...)
for ($i = 0; $i -lt $nodelist.Count; $i++){
$nodelist[$i].SampleMillisecond = "$($i + 1)"
}

# select the objects you want in the csv file
$nodelist |
Select-Object @{n='Time (s)';e={[int]$_.SampleMillisecond}},
@{n='Total IOs';e={[int]$_.Total}} |
Export-Csv "$path\iops_stat_seconds.csv" -NoTypeInformation -Encoding UTF8 -Force # Have to force encoding to be UTF8 or data is in one column (UCS-2)

# import modified csv once more
# wrap in @() so that a single-row CSV is still an array and rows can be appended below
$fileContent = @(Import-Csv "$path\iops_stat_seconds.csv")

# if duration is less than 7 (number of percentile ranks), then add empty rows to fill that gap
if ($d -lt 7 ) {
for($i=$d; $i -lt 7; $i++) {
# add new row of values that are empty
$newRow = New-Object PsObject -Property @{ "Time (s)" = '' }
$fileContent += $newRow
}
}

# show output in the terminal
$fileContent | Format-Table -AutoSize

# export to a final csv file
$fileContent | Export-Csv "$path\iops_stat_seconds.csv" -NoTypeInformation -Encoding UTF8 -Force

 

*Script 2: get_iops_percentiles*

# Written by Jason Yi, PM

Function Get-IopsPercentiles{
<#
.DESCRIPTION
This function expects an array of sorted iops, length of the iops array, and an array of percentiles. For the given array of percentiles,
it returns the calculated percentile value for the set of iops numbers.

.PARAMETER sort_iops
array of sorted iops values from the input file
.PARAMETER iops_len
length of the sort_iops array
.PARAMETER percentiles
array of the percentiles you wish to find
#>
Param (
[array]$sort_iops,
[int]$iops_len,
[array]$percentiles)

$new_iops = New-Object System.Collections.ArrayList($null)
# loop through the percentiles array
foreach ($k in $percentiles) {

[Double]$num = ($iops_len - 1) * $k + 1

# if num is equal to 1 then add the first element to array
if ($num -eq 1) {

[void]$new_iops.Add( $sort_iops[0])
}

# if num is equal to the length of array then add the last element to array
elseif ($num -eq $iops_len) {
[void]$new_iops.Add( $sort_iops[$iops_len-1])
}

else {
$val = [Math]::Floor($Num)

#get decimal portion of the num
[Double]$dec = $num - $val

[void]$new_iops.Add( $sort_iops[$val - 1] + $dec * ($sort_iops[$val] - $sort_iops[$val - 1]))
}

}
return $new_iops

}


# Set path and import the csv file
$path = Get-Location
$file = Import-Csv "$path\iops_stat_seconds.csv"

#$sort_iops = $file."Total IOPS" | Sort-Object -Property {$_ -as [decimal]}


# sort the values in IOPS column in ascending order
$sort_iops = [decimal[]] $file."Total IOs"
[Array]::Sort($sort_iops)

# remove the empty or 0 values
$sort_iops = @($sort_iops) -ne '0'

$iops_len = $sort_iops.Length
#$percentiles = (1,25,50,75,90,95,99)
$percentiles = (.01,.25,.50,.75,.90,.95,.99)

# find the calculated percentiles and put them in an array
$new_iops = Get-IopsPercentiles $sort_iops $iops_len $percentiles

# if the old iops length is less than the length of the new calculated iops scores, then that new length is the iops_len
$new_iops_len = $new_iops.Length
if($iops_len -le $new_iops_len){
$iops_len = $new_iops_len
}


# loop through all the CSV rows and insert 2 new columns for the percentile rank and scores
for ($i = 0; $i -lt $iops_len; $i++) {
$value = if ($i -lt $percentiles.Count) { $percentiles[$i] } else { $null }
$file[$i] | Add-Member -MemberType NoteProperty -Name "Percentile Rank" -Value $value

$value2 = if ($i -lt $percentiles.Count) { $new_iops[$i] } else { $null }
$file[$i] | Add-Member -MemberType NoteProperty -Name "IOPS %-tile Score" -Value $value2

}

# Show output to terminal
$file | Format-Table -AutoSize

# Export to a new CSV file
$file | Export-Csv -Path "$path\iops_percentiles.csv" -NoTypeInformation -Force

 
