In Azure, we believe that Kubernetes is becoming the new foundation for compute and that, over time, most applications will run on it in some form. As a result, a core goal of the Azure Kubernetes Service (AKS) has always been to minimize friction in adopting the platform. When we launched the first preview of AKS in October 2017, we became the first managed Kubernetes service to offer unlimited Kubernetes control planes free of charge. That ensured there was no impediment to choosing Kubernetes over traditional IaaS deployments, and it let customers tailor their environment to their organization’s needs, whether that meant a few large clusters or many smaller ones.
One challenge with this approach has been an inability to offer customers a formal service level agreement (SLA). In a formal SLA, the service provider agrees to refund some portion of the cost of the service in the event that they do not meet published availability targets. In this case, the service being provided (the Kubernetes control plane) is free and thus could not be backed by a traditional SLA. While we operate AKS in the same way as other Azure services, with sophisticated monitoring and alerting, 24x7x365 on-call engineering, and multiple forms of redundancy, many customers still seek the peace of mind that comes from a formal SLA. Indeed, in some cases, it is required by their internal procurement standards.
We spent a lot of time considering how to reconcile these conflicting goals. In the end, we decided that the simplest approach was to continue offering the existing, free version of AKS, with all of its capabilities, while also introducing an optional “Uptime SLA” add-on, available on a per-cluster basis. The Uptime SLA we're announcing today guarantees Kubernetes API server availability of 99.9% for regional clusters and 99.95% for clusters that use availability zones (AZs), all backed by a formal Azure SLA, at a cost of $0.10 per hour per cluster. The existing free version of AKS will continue to offer a service level objective (SLO) of 99.5% for both regional and AZ clusters.
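API server availability:
Without Uptime SLA (free): 99.5% SLO for both regional and AZ clusters
With Uptime SLA ($0.10/hour per cluster): 99.9% SLA for regional clusters; 99.95% SLA for clusters using availability zones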
You can find more information about the options and what they entail in our documentation.
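To make these availability targets more concrete, here is a rough Python sketch that converts each percentage into a monthly downtime budget and estimates what the add-on costs per month. It is an illustration only: the 30-day month and the helper names (allowed_downtime_minutes, monthly_cost_usd) are our assumptions, and the official SLA document, not this arithmetic, defines how availability is actually measured and credited.

```python
# Rough arithmetic only. Assumption: a 30-day month (720 hours).
HOURS_PER_MONTH = 30 * 24


def allowed_downtime_minutes(availability_pct, hours=HOURS_PER_MONTH):
    """Minutes per month the API server could be unavailable at this target."""
    return (1 - availability_pct / 100) * hours * 60


def monthly_cost_usd(clusters, rate_per_hour=0.10, hours=HOURS_PER_MONTH):
    """Estimated monthly Uptime SLA charge at $0.10 per hour per cluster."""
    return clusters * rate_per_hour * hours


if __name__ == "__main__":
    targets = [
        ("Free tier SLO", 99.5),
        ("Uptime SLA, regional", 99.9),
        ("Uptime SLA, availability zones", 99.95),
    ]
    for label, pct in targets:
        minutes = allowed_downtime_minutes(pct)
        print(f"{label}: {pct}% allows about {minutes:.1f} minutes/month")
    print(f"Uptime SLA on 3 clusters: about ${monthly_cost_usd(3):.2f}/month")
```

Under those assumptions, the targets work out to roughly 216 minutes of allowable downtime per month at 99.5%, 43.2 minutes at 99.9%, and 21.6 minutes at 99.95%, with the add-on costing about $72 per cluster per month.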
We believe this approach allows for maximum flexibility. For mission-critical production workloads, you can opt into higher availability and have it backed by a formal SLA, while still being able to run unlimited free clusters when you don’t need that level of guarantee. Note that as of today, the Uptime SLA is available only on new clusters, but we will be introducing an option to convert existing clusters to it in the coming months.
Stay tuned for lots more AKS announcements to come in the weeks and months ahead, all of which will be available either with or without an SLA!