Virtual machines are arguably still one of the most fundamental infrastructure components in cloud computing. Whether you are hosting databases, custom applications, or runner jobs, or using them as nodes for your container hosts, VMs are core to your arsenal of options. And because they are designed to host full operating systems and come in all shapes and sizes, they are one of the areas of Azure compute you should aim to optimize as early as possible in your FinOps: Cloud Efficiency initiatives.
Per the FinOps Foundation, “In context to FinOps, resource utilization is about ensuring there is sufficient business value for the cloud costs associated with each class or type of resource being consumed.”
As you consider FinOps initiatives, you will very quickly realize that cloud cost optimization is a core workflow that, in effect, never ends: it requires continuous review by application and operations teams to ensure they are being good stewards of the environment.
Let’s consider this in the context of virtual machines, and some of the ways you can reduce cost, and in turn have a positive environmental impact at the same time.
Clean up your VMs and Disks
No surprises here: VM sprawl was a problem on-premises, and it's still a problem in the cloud. One of the biggest challenges I hear from customers is that they simply can't be sure who owns VM "XYZ." The best way to save money on Azure compute is to not need it in the first place.
Define a tagging standard, e.g., a Business Unit tag key with a set of "Allowed" values, so you can be sure future VMs can be mapped back to departments and, ultimately, to people who can give you a yes/no on whether each VM is needed.
Ensure Azure Policy is applied to enforce this tagging standard during deployment.
For anything without an owner, take a full VM backup and shut it down (within a change window, of course).
If it can be safely removed, delete the VM and corresponding disks.
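The cleanup workflow above can be sketched as a simple triage over an exported VM inventory. Everything here is illustrative: the inventory shape, the `BusinessUnit` tag key, and the allowed values are assumptions standing in for whatever your own tagging standard and export format look like.

```python
# Sketch: triage an exported VM inventory into owned, mis-tagged, and
# ownerless buckets. The "BusinessUnit" key and allowed values below are
# hypothetical -- substitute your own tagging standard.

ALLOWED_BUSINESS_UNITS = {"finance", "retail", "engineering"}  # example values

def triage(inventory):
    """Split VMs by tag status so each bucket gets the right follow-up."""
    owned, mistagged, ownerless = [], [], []
    for vm in inventory:
        bu = (vm.get("tags") or {}).get("BusinessUnit")
        if bu is None:
            ownerless.append(vm["name"])   # candidates: back up, then shut down
        elif bu.lower() not in ALLOWED_BUSINESS_UNITS:
            mistagged.append(vm["name"])   # tag present, but value not allowed
        else:
            owned.append(vm["name"])       # mapped to a department and owner
    return owned, mistagged, ownerless

# Hypothetical export data for illustration:
inventory = [
    {"name": "vm-app-01", "tags": {"BusinessUnit": "Finance"}},
    {"name": "vm-xyz",    "tags": {}},
    {"name": "vm-old-db", "tags": {"BusinessUnit": "Legacy"}},
]
owned, mistagged, ownerless = triage(inventory)
```

In practice you would feed this from an Azure Resource Graph export or similar, and route the `ownerless` bucket into the backup-and-shutdown step described above.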
The remaining VMs have to stay. Now what?
Great, we know that the virtual machines we have are now core to the business. The next step is to look at ways to optimize the overall VM footprint and review each workload individually to explore alternative architectures.
Reserved Instances (RIs): If you know you need this exact instance size and it needs to run continuously, choose this option.
Savings Plans: After RIs, Savings Plans provide more flexibility across the remainder of Azure compute services. Add these to cover the next tier of predictable usage. Between RIs and Savings Plans, you will want to be at around 90% coverage.
PAYG: For any workloads that can be shut down periodically or take advantage of horizontal scaling, leave them on a PAYG SKU. The Savings Plan still provides some benefit for these dynamic workloads. This remainder of spend is slightly less predictable, but it ensures you aren't over-committing to spend you don't need by setting the Savings Plan too high.
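The arithmetic behind the "around 90% coverage" target can be sketched as follows. All of the hourly figures are made up for illustration and are not real Azure prices; the point is just how RI coverage, a Savings Plan commitment, and the PAYG remainder add up.

```python
# Sketch: commitment coverage across RIs, a Savings Plan, and PAYG.
# The dollar figures below are illustrative, not real Azure pricing.

steady_hourly_spend = 100.0    # always-on workloads: good RI candidates
variable_hourly_spend = 40.0   # average spend of workloads that scale up/down

ri_covered = 100.0             # steady workloads fully covered by RIs
savings_plan_commit = 26.0     # hourly Savings Plan commitment on top of RIs

total = steady_hourly_spend + variable_hourly_spend
committed = ri_covered + savings_plan_commit

coverage = committed / total          # 126 / 140 = 0.90 -> the ~90% target
payg_remainder = total - committed    # 14/hour stays flexible on PAYG
```

Keeping a PAYG remainder means the least predictable slice of spend never counts against a commitment you might fail to consume.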
We know the business needs the workload, but does it need a "VM"? Many alternatives exist, and the business may not realize how easily some existing applications can be converted to run on PaaS at a better price with lower operational overhead.
Optimize the VM Instance Type
Last but not least, we need to tune individual VMs at the micro level: inspect each VM and make sure it is sized and tuned appropriately for the workload. As you are likely aware, there are many Azure compute sizes to choose from (see "VM sizes - Azure Virtual Machines" on Microsoft Learn). How do you go about navigating these options? Here is a simple cheat sheet to help you work through it.
People often default to the general-purpose options, but it helps to treat these as the fallback: where you go when there isn't a more specialized option that fits your purpose.
D Series – Your general-purpose series; these map most closely to how you would think about on-premises VMs, where you choose your CPU, memory, etc.
A Series – Similar to the D series, but more entry-level and often used for dev/test.
B Series – Ideal only for workloads that don't always need the full performance of the CPU. When not consuming heavy amounts of CPU, they bank credits that are spent when they burst. Useful for web servers, but make sure you understand their consumption patterns before moving to production: running out of CPU credits can significantly degrade performance.
E Series – It is very common to find that VMs deployed in a D-series configuration could be converted to the E series, because many workloads and COTS applications benefit from more memory relative to CPU. Switching to a memory-optimized E-series instance can save significant money while better matching the VM to the workload.
Ls Series – Offers high disk throughput and IO, ideal for large transactional databases and data warehousing. If you are running a MongoDB instance, for example, an Ls-series VM would be appropriate.
HB Series – Optimized for applications driven by memory bandwidth, such as weather modeling.
HC Series – Optimized for dense-computation applications such as computational chemistry.
HX Series – Optimized for applications requiring larger memory capacity than the HB series.
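The cheat sheet above can be encoded as a simple lookup from workload profile to series. The profile names here are my own shorthand, and the mapping just mirrors the list in the text; real sizing decisions need measured CPU, memory, and IO data, not labels.

```python
# Sketch: the VM-series cheat sheet as a lookup table. Profile names are
# hypothetical shorthand; the recommendations mirror the list in the text.

def suggest_series(profile):
    """Map a rough workload profile to an Azure VM series from the cheat sheet."""
    rules = {
        "general":           "D",   # balanced CPU/memory, no specialized need
        "dev-test":          "A",   # entry level
        "bursty":            "B",   # credit-based; verify consumption patterns first
        "memory-heavy":      "E",   # more memory per vCPU than the D series
        "storage-heavy":     "Ls",  # high disk throughput/IO (e.g., large databases)
        "hpc-mem-bandwidth": "HB",
        "hpc-dense-compute": "HC",
        "hpc-large-memory":  "HX",
    }
    # General purpose is the fallback when nothing more specialized fits.
    return rules.get(profile, "D")
```

For example, `suggest_series("memory-heavy")` returns `"E"`, capturing the common D-to-E conversion described above.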
By understanding your workloads and their business owners, you can optimize your costs in the cloud. Shut down idle virtual machines, and use tags to establish who owns each workload for accountability. Check pricing for Reserved Instances for your consistent workloads and Azure Savings Plans for the more dynamic ones, or use PAYG. Lastly, choose the correct compute size for your workload; one size does not fit all. For more information on optimizing your costs in Azure, read more here.
I would love to hear any feedback you have; please comment with your thoughts.