Hi, I am Hilton Lange, an SDE on the Virtual Machine Manager team. In this post I want to explain how cluster reserve calculations were completely rewritten in VMM 2012. The rewrite dramatically reduces the overly conservative results produced by the slot-based approach in VMM 2008, R2, and R2 SP1. Here is a broad summary:
Cluster reserve value of 1.
Consider each host for potential failure:
· Call the largest VM on that host “VLargest” and record its memory “LargestM”.
· Add up the memory of the other running HA VMs on the host. Call that amount “OtherM”.
· Now consider each other host in the cluster, keeping a running total “TotalExtraCapacity”:
o Calculate how much extra capacity that host has until “VLargest” can no longer be placed there:
o Extra capacity = Host total memory – Host memory reserve – VM used memory – LargestM
o If this extra capacity is non-negative, add this amount to “TotalExtraCapacity”
· If “OtherM” is greater than “TotalExtraCapacity”, the cluster will be shown as overcommitted.
Repeat this test for each host. If none of the hosts fail the test on the final step, your cluster will be shown as healthy!
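The reserve-1 test above can be sketched in Python. This is a minimal sketch under stated assumptions, not VMM's implementation: the `Host` record and its field names are hypothetical, memory is in MB, only HA VMs are treated as failing over, and “VM used memory” on a surviving host is taken to be the memory of all VMs placed there.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host record; all memory values are in MB."""
    name: str
    total_memory: int
    memory_reserve: int
    ha_vm_memory: list[int] = field(default_factory=list)  # one entry per HA VM on this host
    non_ha_vm_memory: int = 0  # assumption: non-HA VMs use memory here but never fail over

    @property
    def vm_used_memory(self) -> int:
        return sum(self.ha_vm_memory) + self.non_ha_vm_memory


def failure_is_covered(failed: Host, hosts: list[Host]) -> bool:
    """Reserve-1 test: can the HA VMs of `failed` be absorbed by the other hosts?"""
    if not failed.ha_vm_memory:
        return True  # nothing to fail over
    largest_m = max(failed.ha_vm_memory)            # LargestM
    other_m = sum(failed.ha_vm_memory) - largest_m  # OtherM
    total_extra_capacity = 0                        # TotalExtraCapacity
    for host in hosts:
        if host is failed:
            continue
        extra = (host.total_memory - host.memory_reserve
                 - host.vm_used_memory - largest_m)
        if extra >= 0:  # VLargest still fits on this host
            total_extra_capacity += extra
    return other_m <= total_extra_capacity


def is_overcommitted(hosts: list[Host]) -> bool:
    """The cluster is overcommitted if any single host failure is not covered."""
    return not all(failure_is_covered(h, hosts) for h in hosts)


roomy = [
    Host("A", 16384, 2048, [4096, 2048, 2048]),
    Host("B", 16384, 2048, [4096]),
    Host("C", 16384, 2048, [2048]),
]
tight = [
    Host("A", 8192, 1024, [3072, 3072]),
    Host("B", 8192, 1024, [2048]),
]
print(is_overcommitted(roomy))  # False
print(is_overcommitted(tight))  # True
```

With the `tight` cluster, losing host A leaves 3072 MB of “OtherM” but only 2048 MB of extra capacity on host B, so the cluster is flagged.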
What about reserve value (R) > 1?
The same algorithm applies, except that you need to consider each set of R hosts that could fail simultaneously. For each set, take the largest VM on the set, add up all the other VMs on the set as “OtherM”, and compute extra capacity only on the hosts outside the set. The algorithm quickly becomes too cumbersome to check by hand once R exceeds 1 on a reasonably large cluster. Our implementation handles all reasonable cluster reserve values, and falls back to a suitable approximation in the unlikely scenario that you have a higher reserve value.
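For R > 1, the same sketch can enumerate every set of R hosts with `itertools.combinations`. It repeats the hypothetical `Host` record from the reserve-1 case so the block stands alone, and it checks all C(n, R) sets exhaustively; the approximation VMM falls back to for higher reserve values is not described in this post, so it is not reproduced here.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Host:
    """Hypothetical host record; all memory values are in MB."""
    name: str
    total_memory: int
    memory_reserve: int
    ha_vm_memory: list[int] = field(default_factory=list)
    non_ha_vm_memory: int = 0

    @property
    def vm_used_memory(self) -> int:
        return sum(self.ha_vm_memory) + self.non_ha_vm_memory


def failure_set_is_covered(failed: tuple[Host, ...], hosts: list[Host]) -> bool:
    """Test one set of simultaneously failed hosts against the surviving hosts."""
    ha_vms = [m for h in failed for m in h.ha_vm_memory]
    if not ha_vms:
        return True
    largest_m = max(ha_vms)            # largest VM anywhere in the failed set
    other_m = sum(ha_vms) - largest_m  # all the other VMs in the set
    survivors = [h for h in hosts if not any(h is f for f in failed)]
    total_extra_capacity = 0
    for host in survivors:
        extra = (host.total_memory - host.memory_reserve
                 - host.vm_used_memory - largest_m)
        if extra >= 0:
            total_extra_capacity += extra
    return other_m <= total_extra_capacity


def is_overcommitted(hosts: list[Host], reserve: int) -> bool:
    """Exhaustively check every set of `reserve` hosts (C(n, R) sets)."""
    if not 1 <= reserve < len(hosts):
        raise ValueError("reserve must be at least 1 and smaller than the host count")
    return not all(failure_set_is_covered(s, hosts)
                   for s in combinations(hosts, reserve))


cluster = [Host(name, 16384, 2048, [4096, 2048]) for name in "ABCD"]
for r in (1, 2, 3):
    print(r, is_overcommitted(cluster, r))
# 1 False
# 2 False
# 3 True
```

In this four-host example, a reserve of 2 just passes (8192 MB of “OtherM” against exactly 8192 MB of extra capacity), while a reserve of 3 leaves a single survivor with only 4096 MB of extra capacity.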
How do we deal with stopped VMs?
Stopped VMs are treated as running for this algorithm. The rationale is that self-service users (SSUs) may start their VMs, and we don’t want them to unintentionally overcommit the cluster through that action, bypassing the normal placement checks.
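A small illustration of the stopped-VM rule, assuming a hypothetical `VM` record with a `state` string. The `include_stopped` flag is also hypothetical and exists only to contrast the rule with a running-only count; treating non-HA VMs as never failing over is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical VM record; memory in MB."""
    name: str
    memory: int
    is_ha: bool
    state: str  # e.g. "Running" or "Stopped"


def failover_memory(vms: list[VM], include_stopped: bool = True) -> list[int]:
    """Memory of the HA VMs that would need to fail over from a host.

    With include_stopped=True (the rule described above), a stopped HA VM
    is counted exactly as if it were running.
    """
    return [vm.memory for vm in vms
            if vm.is_ha and (include_stopped or vm.state == "Running")]


host_vms = [
    VM("web1", 4096, True, "Running"),
    VM("web2", 4096, True, "Stopped"),
    VM("build", 2048, False, "Running"),
]
print(failover_memory(host_vms))                         # [4096, 4096]
print(failover_memory(host_vms, include_stopped=False))  # [4096]
```

For this host, counting the stopped VM gives “LargestM” = 4096 MB and “OtherM” = 4096 MB; counting only running VMs would have left “OtherM” at 0.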
How are VMs with dynamic memory handled?
This is a thorny issue. Consider the following worst case: because of some external load, all hosted VMs suddenly expand to their maximum configured memory, and then a node fails. Because the overcommit check is designed to guarantee that HA VMs will be able to fail over and start, the only way to give that guarantee is to treat each dynamic memory VM as consuming its maximum size.
However, this entirely negates the additional consolidation and flexibility that dynamic memory provides: a hoster or fabric owner might as well configure every dynamic memory VM as static with its maximum memory.
Because of this, dynamic memory VMs are counted at their current memory size. This means that cluster overcommit gives you information, and a guarantee, about what would happen if you experienced node failure, but the overcommit status may change if a group of dynamic memory VMs grows in size.
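The two policies discussed above can be contrasted in a few lines. This sketch assumes a hypothetical `VM` record that stores current and maximum memory (static VMs store the same value in both) and assumes the size counted for a dynamic memory VM is its current memory; `worst_case=True` models the maximum-size policy that was not adopted.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical VM record; memory values are in MB."""
    name: str
    current_memory: int  # memory assigned right now (the fixed size for static VMs)
    maximum_memory: int  # configured maximum (same as current for static VMs)
    dynamic: bool = False


def counted_memory(vm: VM, worst_case: bool = False) -> int:
    """Memory the overcommit check charges for one VM."""
    if vm.dynamic and worst_case:
        return vm.maximum_memory  # guarantee even if every VM grows to its maximum
    return vm.current_memory      # current size, as described above


vms = [
    VM("sql", 8192, 8192),
    VM("web1", 2048, 8192, dynamic=True),
    VM("web2", 3072, 8192, dynamic=True),
]
print(sum(counted_memory(v) for v in vms))                   # 13312
print(sum(counted_memory(v, worst_case=True) for v in vms))  # 24576
```

Here the maximum-size policy would hold back an extra 11264 MB for failover, memory that dynamic memory is meant to let you consolidate.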
How much does this improve over the slot-based method?
One million random cluster configurations close to being overcommitted were generated, and 130,184 of them were actually overcommitted. The old (slot-based) and new methods were compared on these configurations by their false positive rate:
[Table: false positive rate, old method vs. new method]
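Assuming the usual definition of false positive rate (configurations flagged as overcommitted that are not, divided by all configurations that are not overcommitted), the denominator for this experiment works out as follows, where $\mathrm{FP}$ is the number of false positives a given method produces:

```latex
\begin{aligned}
N_{\text{not overcommitted}} &= 1{,}000{,}000 - 130{,}184 = 869{,}816 \\
\text{False positive rate} &= \frac{\mathrm{FP}}{869{,}816}
\end{aligned}
```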
Real-world data might differ from the random cluster configurations, but we’re expecting to see a decrease in clusters marked as overcommitted when they’re actually not.