First published on MSDN on Jun 05, 2014
This is the third blog post in a series about Cluster Shared Volumes (CSV). In this post we will go over performance monitoring. We assume that reader is familiar with the previous blog posts. Blog post http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx explains CSV components and different CSV IO modes. Second blog post http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx explains tools that help you to understand why CSV volume uses one or another mode for IO.
Now let’s look at the various performance counters which you can leverage to monitor what is happening with a CSV volume.
These performance counters are not CSV specific. You can find “Physical Disk” performance counters on every node where disk is physically connected.
There are large number of good articles that describe how to use Physical Disk performance counters (for instance here http://blogs.technet.com/b/askcore/archive/2012/03/16/windows-performance-monitor-disk-counters-exp... ) so I am not going to spend much time explaining them. The most important consideration when looking at counters on a CSV is to keep in mind that values of the counters is not aggregated across nodes. For instance if one node tells you that “Avg. Disk Queue Length” is 10, and another tells you 5 then actual queue length on the disk is about 15.
CSV uses SMB to redirect traffic using File System Redirected IO or Block Level Redirected IO to the Coordinating node. Consequently SMB performance counters might be a valuable source of insight.
On the non-coordinating node you would want to use SMB Client Shares performance counters. Following blog post explains how to read these counters http://blogs.technet.com/b/josebda/archive/2012/11/19/windows-server-2012-file-server-tip-new-per-s... .
On the coordinating node you can use SMB Server Shares performance counters that work in the similar way, and would allow you to monitor all the traffic that comes to the Coordinating node on the given share.
To map the CSV volume to the hidden SMB share that CSV uses to redirect traffic you can run following command to find CSV volume ID:
CSV volume ID is also used as the SMB share name. As we’ve discussed in the previous post to get list of CSV hidden shares you can use Get-SmbShare. Starting from Windows Server 2012 R2 you also need to add -SmbInstance CSV parameter to that cmdlet to see CSV internal hidden shares. Here is an example:
All File System Redirected IO will be sent to the share named 6861be1f-bf50-4bdb-941d-0a2dd2a46711 on the Coordinating node. All the Block Level Redirected IO will be sent to the share CSV$ on the Coordinating node.
In case if you are using RDMA you can use SMB Direct Connection performance counters. For instance if you are wondering if RDMA is used you can simply look at these performance counters on client and on the server.
If you are using Scale Out File Server then SMB performance counters also will be helpful to monitor IO that comes from the clients to the SOFS and CSVFS.
CSVFS provides a large number of performance counters. Logically we can split these counters into 4 categories
This diagram demonstrates what is measured by the CSVFS performance counters on non-Coordinating node.
As we've discussed before on non-Coordinating node File System Redirected IO and Block Redirected IO goes to SMB client. On Coordinating node you will see a similar picture, except that File System Redirected IO will be sent directly to NTFS, and we would never use Block Level Redirected IO.
In this section we will step through each counter and go into detail on specific counters. If you find this section too tedious to read then do not worry. You can skip over it and go directly to the scenario. This chapter will work for you as a reference.
Now let’s go through the performance counters in each of these groups starting from the counters with prefix IO. I want to remind you again that this group of counters tells you only about IOs that are sent to the CSV Volume Manager, and does NOT tell you how long IO spent inside CSVFS. It only measures how long it took for CSV Volume Manager and all components below it to complete the IO. For a disk it is unusual to see that some IO go Direct IO while other go Block Redirected IO. CSV Volume Manager always prefers Direct IO, and uses Block Redirected IO only if disk is not connected or if disk completes IO with an error. Normally either all IO are sent using Direct IO or Block Redirected IO. If you see a mix that might mean something wrong with path to the disk from this node.
These two counters tell how many outstanding reads and writes we have on average per second. If we assume that all IOs are dispatched using Direct IO then value of this counter will be approximately equal to PhysicalDisk\Avg.Disk Write Queue Length and PhysicalDisk\Avg.Disk Read Queue Length accordingly. If IO is sent using Block Level Redirected IO then this counter will be reflecting SMB latency, which you can monitor using SMB Client Shares\Avg. Write Queue Length and SMB Client Shares\Avg. Read Queue Length.
These two counters tell how many outstanding reads and writes we have at the moment of sample. A reader might wonder why do we need average queue length as well as current queue length and when does he need to look at one versus the other. The only shortcoming with average counter is that it is updated on IO completion. Let’s assume you are using perfmon that by default samples performance counters every second. If you have an IO that is taking 1 minute then for 1 minute average queue length will be 0, and once IO completes for a second it will become 60, while queue length tells you the length of IO queue at the moment of the sample so it will be telling you 1 for all 60 samples during the minute this IO is in progress. On the other hand if IO are completing very fast (microseconds) then there is a high chance that at the moment of the sample IO queue length will be 0 because we just happened to sample at the time when there was no IOs. In that case average queue length would be much more meaningful. Reads and writes are usually completing in microseconds or milliseconds so in majority of cases you want to look at the average queue length.
Tells you how many read/write operations on average have completed in the past second. When all IO are sent using Direct IO then this value should be very close to PhysicalDisk\Disk Reads/sec and PhysicalDisk\Disk Writes/sec accordingly. If IO is sent using Block Level Redirected IO then counters value should be close to Smb Client Share\Write Requests/sec and Smb Client Share\ Read Requests/sec.
Tells you how many read/write operations have completed since the volume was mounted. Keep in mind that values of these counters will be reset every time file system dismounts and mounts again for instance when you offline/online corresponding cluster disk resource.
Tells you how many seconds read/write operations take on average. If you see a value 0.003 that means IO takes 3 milliseconds. When all IO are sent using Direct IO then this value should be very close to PhysicalDisk\Avg.Disk sec/Read and PhysicalDisk\Avg.Disk sec/Write accordingly. If IO are sent using Block Level Redirected IO then counters value should be close to Smb Client Share\Avg.sec/Write and Smb Client Share\Avg.sec/Read.
These counters are similar to the counters above, except that instead of telling you # of read and write operations (a.k.a IOPS) they are telling throughput measured in bytes.
These counters help you to monitor how often CSVFS needs to split IO to multiple IOs due to the disk fragmentation, when contiguous file offsets map to disjoined blocks on the disk. You can reduce fragmentation by running defrag. Please remember that to run defrag on CSVFS you need to put it to the File System Redirected mode so CSVFS would not disable block moves on NTFS. There is no straight answer to the question if a particular value of the counter is bad. Remember that this counter does not tell you how much fragmentation is on the disk. It only tells you how much fragmentation is being hit by ongoing IOs. For instance you might have all IOs going to couple locations on the disk that happened to be fragmented, and the rest of the volume is not fragmented. Should you worry then? It depends … if you are using SSDs then it might not matter, but if you are using HDDs then running defrag might improve throughput if it will make IO more sequential. Another common reason to run defrag is to consolidate free space so it can be trimmed. This is particularly important with SSDs or thinly provisioned disks. CSVFS IO Split performance counters would not help with monitoring free space fragmentation.
Last set of counters in that group tells you how many IO were dispatched without need to be split. It is a complement of the corresponding “IO Split” counters, and is not that interesting for performance monitoring.
Next we will go through the performance counters with prefix Redirected. I want to remind you that this group of counters tells you only about IOs that are sent to NTFS directly (on Coordinating node) or over SMB (from a non-Coordinating node), and does NOT tell you how long IO spent inside CSVFS, but only measures how long it took for SMB/NTFS and all components below it to complete the IO.
These two counters tell how many outstanding reads and writes do we have on average per second. This counter will be reflecting SMB latency, which you can monitor using SMB Client Shares\Avg. Write Queue Length and SMB Client Shares\Avg. Read Queue Length.
These two counters tell how many outstanding reads and writes we have at the moment of sample. Please read comments for the IO Write Queue Length and IO Read Queue Length counters if you are wondering when you should look at average queue length versus the current queue length.
Tells you how many milliseconds read/write operations take on average. Counters value should be close to Smb Client Share\Avg.sec/Write and Smb Client Share\Avg.sec/Read.
These counters will help you to monitor IOPS and throughput, and do not require much explaining. Note that when CSVFS sends IO using FS Redirected IO it will never split IO on a fragmented files because it forwards IO to NTFS, and NTFS will perform translation of file offsets to the volume offsets and will split IO into multiple if required due to file fragmentation.
Next we will go through the performance counters with prefix Volume. For all the counters in this group please keep in mind that values start fresh from 0 every time CSVFS mounts. For instance offlining and onlining back corresponding cluster physical disk resource will reset the counters.
And here comes the last set performance counters from CSVFS group. Counters in this group do not have a designated prefix. These counters are measuring IO at the time when it arrives to the CSVFS, and include all the time IO spent at any layers inside or below CSVFS.
These two counters tell how many outstanding reads and writes we have at the moment of sample.
These counters tell you how much time on average passes since IO has arrived to CSVFS before CSVFS completes this IO. It includes the time IO spends at any layer below CSVFS. Consequently it includes IO Write Latency, IO Read Latency, Redirected Write Latency and Redirected Read Latency depending on the type of IO and how IO was dispatched by the CSVFS.
These counters will help you monitor IOPS and throughput, and hopefully do not require much explaining.
These counters tell you how many flushes come to CSVFS on all the file objects that are opened
Tells how many files are currently opened on this CSV volume
CSVFS provides fault tolerance and attempts to hide various failures from application, but in some cases it might need to indicate that recovery was not successful. It does that by invalidating application’s file open and by failing all IOs issued on this open. These two counters allow you to see how many files opens were invalidated. Please note that invalidating open does not do anything bad to the file on the disk. It simply means that application will see IO failure and will need to reopen the file and reissue these IOs.
Allows you to monitor how many file opens are happening on the volume.
This is catch all for all other operations that are not covered by any counters above. This counters will be incremented when you query or set file information or issue an FSCTL on a file.
To better understand relationship between different groups of CSVFS performance counters lets go through lifetime of some hypothetical non-cached write operation.
Note the color highlighting in the sample above. It is emphasizing relationship of the performance counter groups. We can also describe this scenario using following diagram where you can see how time ranges are included
CSV volume pause is a very rear event and very few IOs run into it. For majority of IOs timeline would look in one of the following ways
In these cases Read Latency and Write Latency will be the same as IO Read Latency, IO Write Latency, Redirected Read Latency and Redirected Write Latency depending on the IO type and how it was dispatched.
In some cases file system might hold an IO. For instance if IO extends a file then it will be serialized with other extending IO on the same file. In that case timeline might looks like this
The important point here is that Read Latency and Write Latency are expected to be slightly larger than its IO* and Redirected* partner counters.
One thing that we cannot tell using CSVFS performance counters is if IO was sent using Direct IO or Block Level Redirected IO. This is because CSVFS does not have that information as it happens at a lower layer, only CSV Volume Manager knows that. You can get visibility to what is going on in the CSV volume manger using Cluster CSV Volume Manager counter set.
Performance counters in this group can be split into 2 categories. The first category has Redirected in its name and second does not. All the counters that do NOT have Redirected in the name are describing what is going on with Direct IO, and all the counters that do describe Block Level Redirected IO. Most of the counters are self-explanatory. Two require some explaining. If disk is connected then CSV Volume Manager always first attempts to send IO directly to the disk. If disk fails the IO the CSV Volume Manger will increment Direct IO Failure Redirection and Direct IO Failure Redirection/sec and will retry this IO using Block Level Redirected IO path over SMB. So these two counters help you to tell if IO is redirected because disk is not physically connected or because disk is failing IO for some reason.
In this section we will go over the common scenario/question that can be answered using performance counters.
Disclaimer: Please do not read much into the actual values of the counters in the samples below because samples were taken on test machines that are backed by extremely slow storage and on the machines with bunch of debugging features enabled. These samples are here to help you to understand relationship between counters.
Simply check IOPS and throughput using following CSV Volume Manager counters
Values should be approximately equal to the load that you are placing on the volume.
You can also verify that no unexpected redirected IO is happening by checking IOPs and throughput on the CSV File System Redirected IO path using CSVFS performance counters
and CSV Volume Redirected IO path
Above is an example of how Performance Monitor would look like on the Coordinator node when all IO is going using Direct IO. You can see that \Cluster CSV File System(*)\Redirected* counters are all 0 as well as \Cluster CSV Colume Manager(*)\* - Redirected performance counters are 0. This tells us that there is no File System Redirected IO or Block level Redirected IO.
You can check how much over all IO is going through a CSV volume using following CSVFS performance counters
Value of the counters above will be equal to the sum of IO going to the File System Redirected IO path
and IO going to the CSV Volume Manager, which volume manager might dispatch using Direct IO or Block Redirected IO
You can check how much IO is sent by the CSV volume manager directly to the disk connected on the cluster node using following performance counters
Value of these counters will be approximately equal to the values of the following performance counters of the corresponding physical disk
You can check how much IO is sent by the CSVFS directly to the disk connected on the cluster node using following performance counters
The picture above shows you what you would see on the coordinating node if you put Volume1 in the File System Redirected mode. You can see that only \Cluster CSV File System(*)\Redirected* are changing while \Cluster CSV File System(*)\IO* are all 0. Since File system redirected IO does not go through the CSV Volume manager its counters stay 0. File System Redirected IO go to NTFS, and NTFS sends this IO to the disk so you can see Physical Disk counters match to the CSV File System Redirected IO counters.
You can check how much IO CSV Volume Manager dispatches directly to the Coordinating node using SMB using following performance counters.
Please note that since on the Coordinating node disk is always present CSV will always use Direct IO, and will never use Block Redirected IO so on the Coordinating node values of these counters should stay 0.
To find out Direct IO latency you need to look at the counters
To understand where this latency is coming from you need to first look at the following CSV Volume Manager performance counters to see if IO is going Direct IO or Block Redirected IO
If IO goes to Direct IO then next compare CSVFS latency to the latency reported by the disk
If IO goes to Block Redirected Direct IO, and you are on Coordinator node then you still need to look at the Physical Disk performance counters. If you are on non-Coordinator node then compare look at the latency reported by SMB on the CSV$ share using following counters
Then compare SMB client latency to the Physical disk latency on the coordinator node.
Below you can see a sample where all IO goes Direct IO path and latency reported by the physical disk matches to the latency reported by the CSVFS, which means the disk is the only source of the latency and CSVFS does not add on top of it.
To find out File System Redirect IO latency you need to look at the counters
To find out where this latency is coming from on coordinator node compare it to the latency reported by the physical disk.
If you see latency reported by the physical disk is much lower to what reported by the physical disk then one of the components located between CSVFS and the disk is enquing/serializing the IO.
Above you can see an example where you can see physical disk reported latency
Given statistical errors ant that snapshotting values of different counters is not synchronized we can ignore 1 milliseconds that CSVFS adds and we can say that most of the latency comes from the physical disk.
If you are on non-Coordinating node then you need to look at SMB Client Share performance counters for the volume share
After that look at the latency reported by the physical disk on the Coordinator node to see how much latency is coming from SMB itself
In the sample above you can see
In the sample before we’ve seen that disk read latency is 23 milliseconds and write latency is 18 milliseconds so we can conclude that disk is the biggest source of latency.
To answer this question you need to look at the sum/average of the following performance counters across all cluster nodes that perform IO on this disk. Each node’s counters will tell you how much IO is done by this node, and you will need to do the math to find out the aggregate values.
You can play with different queue length changing load of the disk and checking against your target. There is really no right or wrong answer here and it all depends on what is your application expectations are.
In the sample above you can see that total IO queue length on the disk is about (8.951+8.538+10.336+10.5) 38.3, and average latency is about ((0.153+0.146+0.121+0.116)/4) 134 milliseconds.
Please note that physical disk number in this sample happens to be the same – 7 on both cluster nodes. You should not assume it will be the same. On the coordinator node you can find it using Cluster Administrator UI by looking at the Disk Number column.
Unfortunately there are no good tools to find it on the non-Coordinator node, the Disk Management mmc snap-in is the best tool.
To find physical disk number on all cluster nodes you can move the Cluster Disk from node to node writing down Disk Number on each node, but be careful especially when you have actual workload running because while moving volume CSV will pause all IOs, which will impact your workload throughput.
When you are looking at the cluster networks keep in mind that cluster splits networks into several categories and each type of traffic uses only some of the categories. You can read more about that in the following blog post http://blogs.msdn.com/b/clustering/archive/2011/06/17/10176338.aspx .
If you know the network bandwidth you can always do math to verify that it is large enough to be able to handle your load. But you should verify it in empirical way then before you put your cluster into production. I would suggest you use following steps:
If you see that you are getting the same IOPS and MBPS then your bottleneck is the disk, and network is fine.
In case if you are using Scale-out File Server (SOFS) you can use similar way to verify that network is not the bottleneck, but in this case instead of PhysicalDisk counters use Cluster CSV File System performance counters
When using RDMA it is a good idea to verify that RDMA is actually working by looking at the \SMB Direct Connection(*)\* family of counters.
We went over 4 counter sets that are most useful when investigating CSVFS performance. Using PowerShell cmdlets we are able to see if Direct IO is possible. Using performance counters we can verify if IO is indeed is going according to our expectations, and by looking at counters at different layers we can find where the bottleneck is.
Remember that none of the performance counters that we talked about is aggregated across multiple cluster nodes, each of them provides one node view.
If you want to automate collection of performance counters from multiple node consider using this simple script http://blogs.msdn.com/b/clustering/archive/2009/10/30/9915526.aspx that is just a convenience wrapper around logman.exe. It is described in the following blog post http://blogs.msdn.com/b/clustering/archive/2009/11/10/9919999.aspx .
Principal Software Development Engineer
Clustering & High-Availability
Cluster Shared Volume (CSV) Inside Out
Cluster Shared Volume Diagnostics
Cluster Shared Volume Performance Counters
Cluster Shared Volume Failure Handling
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.