How to deploy a production-ready AKS cluster with Terraform verified module
Published May 02 2024 07:32 AM
Microsoft

Do you want to use Terraform to deploy an Azure Kubernetes Service (AKS) cluster that meets production standards? We have a solution for you!

 

We recently created a Terraform verified module for AKS that lets customers deploy a production-standard AKS cluster along with a Virtual Network and an Azure Container Registry. It provisions an environment sufficient for most production AKS deployments.

 

The module is available on the Terraform registry and can be found here. 

 

You don't have to deal with the complexity of setting up an AKS cluster from the ground up. The module offers opinionated choices and reasonable default settings to deploy an AKS cluster ready for production.

 

What are Azure Verified Modules?

 

Azure Verified Modules enable and accelerate consistent solution development and delivery of cloud-native or migrated applications and their supporting infrastructure by codifying Microsoft guidance (the Well-Architected Framework, WAF) with best-practice configurations. For more information, please visit Azure Verified Modules.

 

What does the module do?

 

The module provisions the following resources:

 

  • Azure Kubernetes Service (AKS) cluster for production workloads 
  • Virtual Network 
  • Azure Container Registry 

To view the full list of resources and their configurations, please visit the module page.

 

How to use the module

 

To use the module, you need to have Terraform installed on your machine. If you don't have Terraform installed, you can download it from the Terraform website here.

 

Once you have Terraform installed, you can create a new Terraform configuration file and add the following code:

 

 

 

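module "avm-ptn-aks-production" {
  source  = "Azure/avm-ptn-aks-production/azurerm"
  version = "0.1.0"

  # Replace the placeholder values below with your own.
  location            = "<region>"
  name                = "<cluster-name>"
  resource_group_name = "<rg-name>"

  # Object IDs of the Microsoft Entra groups that get cluster admin access.
  rbac_aad_admin_group_object_ids = ["11111111-2222-3333-4444-555555555555"]
}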

 

 

 

To understand more about the variables and options available, have a look at the GitHub README.

 

Running the module will provision the resources in your Azure subscription. You can view the resources in the Azure portal.
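
For readers starting from an empty directory, the following is a minimal sketch of a complete root configuration around the module call. It is an illustration, not an official example: the resource group resource, its name, the cluster name, and the region are placeholder assumptions, and the provider block is left unpinned here even though pinning a version is generally advisable. Only the four module inputs shown earlier in this post are used.

```hcl
# Illustrative sketch: a root configuration that wraps the module call.
# Everything except the module inputs is an assumption for the example.
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

# Hypothetical resource group; name and region are placeholders.
resource "azurerm_resource_group" "example" {
  name     = "rg-aks-production-example"
  location = "eastus2"
}

module "avm-ptn-aks-production" {
  source  = "Azure/avm-ptn-aks-production/azurerm"
  version = "0.1.0"

  name                = "aks-production-example"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name

  # Replace with the object ID of your own Microsoft Entra admin group.
  rbac_aad_admin_group_object_ids = ["11111111-2222-3333-4444-555555555555"]
}
```

With a file like this in place, the usual Terraform workflow applies: `terraform init` to download the module and providers, `terraform plan` to review the changes, and `terraform apply` to create the resources.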

 

How we built the module

 

[Image: Terraform AKS module.png]

 

This module is very opinionated and steers the user into a design that is ready for production. Drawing on our experience supporting users who deploy AKS with Terraform using the "Azure/aks/azurerm" module, we proposed a much simpler module to help customers deploy scalable and reliable clusters.

 

Here are some of the important opinionated choices we made.

 

Create user zonal node pools in all Availability Zones

 

When implementing availability zones with the cluster autoscaler, we recommend using a single node pool for each zone. The "balance_similar_node_groups" parameter enables a balanced distribution of nodes across zones for your workloads during scale-up operations. When this approach isn't implemented, scale-down operations can disrupt the balance of nodes across zones.
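
To make the zonal layout concrete, here is a minimal sketch of one user node pool per availability zone. It is not the module's actual code. It assumes the azurerm provider 3.x (where the autoscaling flag is named `enable_auto_scaling`; 4.x renames it to `auto_scaling_enabled`), a hypothetical `kubernetes_cluster_id` variable, and placeholder pool names, VM size, and node counts.

```hcl
# Hypothetical input: the ID of an existing AKS cluster.
variable "kubernetes_cluster_id" {
  type = string
}

locals {
  # Assumes a region that offers three availability zones.
  zones = ["1", "2", "3"]
}

# One autoscaled user node pool pinned to each zone.
resource "azurerm_kubernetes_cluster_node_pool" "user" {
  for_each = toset(local.zones)

  name                  = "userpool${each.key}"
  kubernetes_cluster_id = var.kubernetes_cluster_id
  vm_size               = "Standard_D4s_v5" # placeholder size
  zones                 = [each.key]

  enable_auto_scaling = true
  min_count           = 1
  max_count           = 3
}
```

The `balance_similar_node_groups` setting itself is configured on the cluster resource, inside its `auto_scaler_profile` block, so that the autoscaler treats the per-zone pools as similar groups.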

 

Leverage AKS automatic upgrades to keep the cluster secure and supported

 

AKS has a fast release calendar. It is important to keep the cluster on a supported version and to get security patches quickly. We enforce the "patch" automatic upgrade channel and the "NodeImage" value for "node_os_channel_upgrade" to keep the cluster up to date. It is the user's responsibility to plan Kubernetes minor version upgrades.
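
Outside the module, the same upgrade settings could be expressed roughly as follows. This is a sketch under assumptions, not the module's source: it uses the azurerm 3.x argument names (4.x renames them to `automatic_upgrade_channel` and `node_os_upgrade_channel`), and the cluster name, resource group, region, and node pool values are placeholders.

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-upgrade-example" # placeholder
  location            = "eastus2"             # placeholder
  resource_group_name = "rg-aks-example"      # placeholder, must already exist
  dns_prefix          = "aksupgradeexample"

  # Apply Kubernetes patch releases automatically within the current minor version.
  automatic_channel_upgrade = "patch"

  # Roll out new node OS images automatically.
  node_os_channel_upgrade = "NodeImage"

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D4s_v5" # placeholder size
    node_count = 3
  }

  identity {
    type = "SystemAssigned"
  }
}
```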

 

Use Azure CNI Overlay for optimal and simple IP address space management

 

There are many options when it comes to AKS networking. In most customer scenarios, Azure CNI Overlay is the ideal solution. It is easy to plan IP address usage and it provides plenty of options to grow the cluster. 
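
As an illustration of what Azure CNI Overlay looks like in plain Terraform, the sketch below sets the relevant `network_profile` arguments on a cluster. It is an assumption-laden example rather than the module's implementation: the pod CIDR, names, region, and node pool values are placeholders, and the pod CIDR must not overlap with the address space of your virtual network.

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-overlay-example" # placeholder
  location            = "eastus2"             # placeholder
  resource_group_name = "rg-aks-example"      # placeholder, must already exist
  dns_prefix          = "aksoverlayexample"

  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"

    # Pods get addresses from this private range instead of the VNet.
    pod_cidr = "192.168.0.0/16" # placeholder range
  }

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D4s_v5" # placeholder size
    node_count = 3
  }

  identity {
    type = "SystemAssigned"
  }
}
```

Because pod addresses come from the overlay range, only the nodes consume IP addresses from the virtual network, which is what keeps address planning simple.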

 

Use Private Kubernetes API endpoint and Microsoft Entra authentication for enhanced security

 

We use a layered security approach to protect your Kubernetes API from unauthorized access. We keep the Kubernetes API safe by putting it in a private network, and we allow Microsoft Entra identities to authenticate. Optionally, local accounts can be turned off.
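
A rough sketch of these security settings on a plain cluster resource is shown below. It assumes the azurerm provider 3.x, where the Entra integration block takes `managed = true` (the argument is removed in 4.x), and it uses placeholder names and the sample group object ID from earlier in this post. It is not taken from the module's source.

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-private-example" # placeholder
  location            = "eastus2"             # placeholder
  resource_group_name = "rg-aks-example"      # placeholder, must already exist
  dns_prefix          = "aksprivateexample"

  # Expose the Kubernetes API only on a private endpoint.
  private_cluster_enabled = true

  # Authenticate users with Microsoft Entra ID.
  azure_active_directory_role_based_access_control {
    managed                = true # azurerm 3.x only
    admin_group_object_ids = ["11111111-2222-3333-4444-555555555555"]
  }

  # Optional hardening: disable the local admin account.
  local_account_disabled = true

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_D4s_v5" # placeholder size
    node_count = 3
  }

  identity {
    type = "SystemAssigned"
  }
}
```

With a private API endpoint, `kubectl` and Terraform operations against the cluster API need network access to the private network, for example from a peered network or a jump host.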

 

Bring your own network and force a User Assigned identity

 

Customer scenarios often involve more than one AKS cluster. The Azure VNet where these clusters exist should be part of a resource group controlled by the customer. Reusing the same User Assigned identity across a fleet of clusters simplifies the role assignment operations. We wrote this module considering the integration in a real-world customer subscription, rather than considering the AKS cluster as a single isolated entity.
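
The pattern described above can be sketched as follows: a User Assigned identity receives a role on a customer-owned subnet, and the cluster reuses both. This is a hedged illustration, not the module's code. The three variables are hypothetical inputs, the names and VM size are placeholders, and the "Network Contributor" role on the subnet is one common choice rather than a statement of what the module assigns.

```hcl
# Hypothetical inputs describing the customer-owned environment.
variable "resource_group_name" {
  type = string
}

variable "location" {
  type = string
}

variable "node_subnet_id" {
  type = string
}

# Identity that can be shared across a fleet of clusters.
resource "azurerm_user_assigned_identity" "aks" {
  name                = "id-aks-fleet-example" # placeholder
  location            = var.location
  resource_group_name = var.resource_group_name
}

# Let the identity manage network resources in the existing subnet.
resource "azurerm_role_assignment" "aks_network" {
  scope                = var.node_subnet_id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-byo-network-example" # placeholder
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = "aksbyonetworkexample"

  default_node_pool {
    name           = "system"
    vm_size        = "Standard_D4s_v5" # placeholder size
    node_count     = 3
    vnet_subnet_id = var.node_subnet_id
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }

  # Make sure the role exists before the cluster tries to use the subnet.
  depends_on = [azurerm_role_assignment.aks_network]
}
```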

 

Don't use any preview features

 

To prevent breaking changes during production, we avoided the use of any preview features. 

 

Development of the module from a Terraform perspective

 

The Azure Verified Modules team worked to create effective pipelines for module development. For initial development, you fork a prepared template and use it to develop your module. The template is available on GitHub here. This ensures that all module developers follow the same standards and best practices. It also makes it easier to review and approve modules for publication and to roll out updates to the templates.

 

The pipeline has built-in checks to ensure that the module follows the best practices and standards. It provides a Docker container with all the necessary tools, so the checks can run both locally and in GitHub Actions. The pipeline runs the following checks:

 

  • Lints the code against the standards and best practices set by the AVM community.
  • Validates that the Terraform code is valid using "terraform validate".
  • Checks whether the README needs updating and regenerates it when changes are detected, so that you don't have to update it manually.
  • Runs end-to-end (e2e) tests. You only need to provide examples of the module's functionality and set up a test environment on GitHub to start them. You can see the steps on how to do this here.

For an end-to-end review of the contribution flow and how to set up your module for development using AVM scripts, have a look at the Terraform Contribution Guide.

 

Lessons learned

 

  • The AVM team provides the initial module template and GitHub Actions pipeline to develop a new module. Using those resources and attending their office hours enabled us to move faster. When building a new Terraform module for Azure, following the procedure to implement an AVM module saves you a lot of time, ensuring quality and avoiding common mistakes.
  • Joining the AVM team community calls and watching the AVM GitHub repo adds a lot of value: you get updates on the latest changes and can ask any questions you may have.
  • When writing the design document, before starting development, make sure you address all edge cases. For example, not all Azure regions have availability zones, and the module must work in all Azure regions. Dealing with the details before starting the implementation helps to find good solutions without having to make bigger changes in the implementation phase.

 

How can you contribute back 

 

 

Conclusion 

 

If you face any challenges, please raise an issue on the repo: https://github.com/Azure/terraform-azurerm-avm-ptn-aks-production

 

We would also like to thank Zijie He and Jingwei Wang for their huge contributions and collaboration whilst building this module.

Last update: May 02 2024 03:19 AM