Azure Kubernetes Service (AKS) is a managed service, meaning that Microsoft manages the control plane components (API server, controller manager, etcd, and scheduler) for you, offloading significant operational overhead from you as an administrator.
There are also agent nodes that live inside the cluster, which run your application workloads and are customer managed. These nodes are placed into your own secure Virtual Network (VNET) in Azure and are exposed with private IP addresses.
While AKS is a managed service, it is worth noting that it is not a Platform as a Service (PaaS) offering like, for example, Azure App Service. PaaS services are generally fully managed by Microsoft, and you never have to log in or RDP into a Virtual Machine (VM) to perform any kind of operation.
Because AKS is a managed service, the control plane of the cluster, on the left-hand side of the above image, is managed by Microsoft. On the right-hand side, the agent nodes where you deploy your workloads are managed by you as the customer. You are responsible for things like patching and rebooting the VMs. You are also responsible for upgrading Kubernetes whenever a new version becomes available. Finally, you are responsible for deploying your applications and for making sure they are highly available and scalable. In short, responsibility is shared between Microsoft and you as a customer.
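As an illustration of one of the customer responsibilities mentioned above, a Kubernetes upgrade can be driven from the Azure CLI. The resource group, cluster name, and version below are placeholders for your own values:

```
# List the Kubernetes versions the cluster can be upgraded to
az aks get-upgrades \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --output table

# Upgrade control plane and node pools to a chosen version (example version)
az aks upgrade \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --kubernetes-version 1.27.3
```

During the upgrade, nodes are cordoned and drained one by one, so plan for enough spare capacity to reschedule the evicted pods.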
Many of the best practices for AKS presented in this post series are included in what is called the baseline architecture for AKS. This is a great way to get started, because it is a reference implementation of a cluster with all those best practices already built in.
One of the things that's easy to do in Kubernetes is to deploy a pod, create a “LoadBalancer” type service, use a label selector to attach it to the right pod(s), and expose that out to the internet. With that, you are telling Azure to put a public-facing load balancer in front of your service, and you get a public IP address to access it. This is not recommended: you don't want to use public-facing load balancers at all.
To address that from a governance perspective, you can apply one of the most powerful Azure Policies, which forces any cluster created within your Azure subscription or management group to only allow internal load balancers. With this policy in place, you can no longer create an externally facing load balancer service and attach it to a pod. Instead, you have to use an ingress controller, which is the proper way to get Layer 7 ingress isolation in front of your service and to control what comes into and goes out of your cluster.
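For services that do still need a load balancer (for example, the ingress controller itself), AKS supports provisioning an internal (private) Azure load balancer via a well-known annotation. A minimal sketch, with hypothetical service and app names:

```
# The annotation below tells AKS to provision an internal load balancer
# with a private IP from the VNET, instead of a public-facing one.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-internal-app        # hypothetical name
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-app                # hypothetical label selector
  ports:
  - port: 80
    targetPort: 8080
EOF
```

A Service created this way receives a private IP from the AKS subnet, so it is reachable only from within the VNET (or peered networks), which is exactly what the internal-load-balancers-only policy enforces.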
You might also want to have a Web Application Firewall (WAF) like Microsoft provides with Azure Front Door or with Azure Application Gateway, to control the ingress traffic going into your cluster.
Generally, in an AKS cluster you can run two types of workloads:
- Workloads exposed to public (external) users over the internet.
- Workloads exposed only to internal (corporate) users.
Let’s focus on the first scenario and see what a standard, recommended ingress path would look like. To start, you would have a VNET in Azure hosting the AKS network where the cluster is deployed (usually a Spoke VNET, if you follow a Hub and Spoke network topology).
For public users, ideally, you would have deployed inside the AKS VNET an Application Gateway resource inside its dedicated subnet. This will expose a public IP and you would have your public users connecting to that exposed public IP over the internet. You would also integrate Web Application Firewall (WAF) capabilities in your Application Gateway resource.
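The Application Gateway piece of this setup can be sketched with the Azure CLI. All resource names are placeholders, and the WAF_v2 SKU is used so that Web Application Firewall capabilities can be attached:

```
# Application Gateway (WAF_v2 SKU) in its own dedicated subnet of the AKS VNET,
# fronted by a public IP that external users connect to.
az network application-gateway create \
  --resource-group myResourceGroup \
  --name myAppGateway \
  --sku WAF_v2 \
  --capacity 2 \
  --vnet-name myAksVnet \
  --subnet appGatewaySubnet \
  --public-ip-address myAppGwPublicIp \
  --priority 100
```

In a production setup you would additionally associate a WAF policy with the gateway to define the managed rule set and any custom rules for inspecting inbound traffic.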
Then, also inside the VNET you would have the regular networking infrastructure for AKS, which consists of:
This use case describes a regional deployment of the AKS cluster, because every Azure service used is deployed within a particular Azure region. This means that a user connecting to the AKS cluster from the same region will have a much better user experience (UX) than a user reaching it from an entirely different region, especially from a faraway location.
Regarding the public IP that the Application Gateway exposes: in terms of networking, in Azure you get “Basic Distributed Denial of Service (DDoS)” protection by default. This is the case for every public IP that you expose in Azure.
If you want, you can get another SKU of DDoS protection in Azure, called “Standard DDoS”. The difference between the two SKUs is that the Basic one is not tailored to the specific requirements of your workload, so Microsoft is not actually monitoring your workloads. Instead, at some point, based on the usage patterns of public IPs at the data center / regional level, it may decide that a DDoS attack is going on and apply some countermeasures to mitigate it.
On the other hand, if that does not fully suit your needs and you want the countermeasures for a potential DDoS attack to be triggered based on your specific workload requirements, then you should consider the “Standard DDoS” SKU. With this SKU, Microsoft measures the usage patterns of your public IPs and learns over time what to expect from them. If the usage pattern of a particular public IP deviates from what Microsoft expects, it may decide to run countermeasures against this potential DDoS attack.
With “Standard DDoS”, Microsoft also covers any costs caused by autoscaling in the event of a DDoS attack. For example, if you are experiencing a DDoS attack and start receiving a lot of requests to the public IP exposed by the Application Gateway, and you have configured autoscaling for the Application Gateway component, then the cost of the extra replicas deployed because of the attack will be covered by Microsoft.
Keep in mind that the Standard DDoS SKU is usually applied at the organizational level and not just to an individual workload, because it is an expensive SKU. With it, you can protect over 100 of your public IPs.
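Enabling the Standard SKU is a two-step process: create a DDoS protection plan (typically one per organization) and then associate it with each VNET whose public IPs you want covered. A sketch with placeholder names:

```
# Create the organization-wide DDoS protection plan
az network ddos-protection create \
  --resource-group myResourceGroup \
  --name myDdosPlan

# Associate the plan with the VNET hosting the AKS cluster and Application Gateway
az network vnet update \
  --resource-group myResourceGroup \
  --name myAksVnet \
  --ddos-protection-plan myDdosPlan \
  --ddos-protection true
```

Because the plan is billed once and shared across the organization, it is usually created in a central (platform/connectivity) subscription rather than alongside a single workload.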
A typical ingress path for a request coming from external users into your Kubernetes cluster, deployed as described above, would be:
Note: This approach involves many network hops and a possible overlap of functionality between the Application Gateway and the ingress controller (chained Layer 7 load balancers). If that is a problem for you, consider using the Application Gateway Ingress Controller (AGIC) instead (more details on this below). WAF capabilities can also be applied at the ingress controller.
The second interesting flow is when you have internal corporate users that need to connect to the AKS cluster.
In this case, you could have something like the following scenario:
To summarize, typical ingress paths in AKS look like one of the following:
Note that the cluster's infrastructure resources (such as the load balancer) live in the MC_{resourceGroupName}_{aksClusterName}_{aksClusterLocation} resource group, which is managed by Azure.
Regarding egress traffic, which is the traffic that originates from within the AKS cluster and goes outside it, it is recommended to have some sort of egress traffic control in place for your cluster as well.
By default, when you create an AKS cluster, you get a public IP associated with the Standard Load Balancer (SLB) component. This IP is used for outbound traffic. If a pod wants to send a request from inside the cluster out to the internet, that connectivity must be SNAT-ted, because outbound traffic needs a public, not a private, IP address to reach destinations outside the cluster. By default, this would be the public IP associated with the SLB component. This is not recommended, as you have no way to apply rules that control which IPs you are allowed to egress to.
Normally, if you follow the hub and spoke network topology model, you should have an Azure Firewall, or any other NVA device of your choice, deployed inside the hub network. You should then configure a custom User Defined Route (UDR) table on the AKS subnet, which instructs all outgoing network traffic originating from the AKS subnet to flow first through the firewall (NVA) IP before going out to the internet. This UDR would contain the following rule:
0.0.0.0/0, with the firewall's private IP as the next hop. This prefix essentially matches all egress traffic.
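The route table described above can be sketched with the Azure CLI. The resource names and the firewall's private IP below are hypothetical placeholders:

```
# Route table that forces all egress traffic through the firewall
az network route-table create \
  --resource-group myResourceGroup \
  --name aks-egress-udr

# Default route: send 0.0.0.0/0 to the firewall's private IP (placeholder)
az network route-table route create \
  --resource-group myResourceGroup \
  --route-table-name aks-egress-udr \
  --name default-to-firewall \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.0.0.4

# Associate the route table with the AKS subnet
az network vnet subnet update \
  --resource-group myResourceGroup \
  --vnet-name myAksVnet \
  --name aksSubnet \
  --route-table aks-egress-udr
```

The next-hop type VirtualAppliance is what makes Azure forward the traffic to the firewall's private IP instead of routing it directly to the internet.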
If you forward all outbound traffic through the firewall device in the hub network, essentially SNAT-ting all connectivity to the firewall's public IP, then you do not need the public IP associated with the SLB, so it is recommended to disable that public IP when you create the AKS cluster.
To do this with the az aks create command, pass the value userDefinedRouting to the --outbound-type parameter. The default is loadBalancer, which creates the public IP, associates it with the SLB component, and routes all egress traffic to the internet through that IP.
If you choose the recommended userDefinedRouting value, cluster creation also validates that a UDR is created and attached to the subnet your worker nodes will run in, and it does not create a public IP for the SLB. That subnet must carry a UDR that pushes traffic over to the firewall device; if no outbound route is set on the subnet, cluster creation fails with a clear warning, because --outbound-type userDefinedRouting states that a UDR must exist there for the cluster to be created. With this in place, you can fully and properly control egress out of the cluster through your firewall device.
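Putting this together, a cluster with the SLB public IP disabled can be sketched as follows. The resource names are placeholders, and AKS_SUBNET_ID is assumed to be the resource ID of a subnet that already has the UDR attached:

```
# AKS_SUBNET_ID: resource ID of the AKS subnet with the UDR already associated
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --network-plugin azure \
  --vnet-subnet-id "$AKS_SUBNET_ID" \
  --outbound-type userDefinedRouting
```

Note that --outbound-type can only be set at cluster creation time in this sketch; plan the egress design before provisioning rather than retrofitting it.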
An important thing to note here: when you use the firewall for all egress traffic from your cluster, you also need to make sure the firewall has the right number of IPs associated with it, so that there are enough ephemeral ports for SNAT-ting outbound connections. Microsoft currently recommends attaching 20 public IPs to the firewall to limit SNAT port exhaustion issues. This depends heavily on the scenario and on how many workloads are using the firewall.
For this architecture to work properly, certain traffic rules need to be enabled on the firewall device. Microsoft publishes a list of all the required FQDNs that must be allowed through the firewall; all of those sites are owned by Microsoft. Some are common mandatory ones that must be enabled for all clusters. Others depend on which AKS features you enable in your cluster (e.g., Azure Monitor for Containers for monitoring purposes, Microsoft Defender for Containers, etc.). Always make sure you review all the required FQDNs that need to be allowed for the components you decide to enable in your AKS cluster.
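For Azure Firewall specifically, the common mandatory FQDNs are bundled in a built-in FQDN tag, which saves you from listing them one by one. A hedged sketch (placeholder names, and the source range is assumed to be the AKS subnet's address space; the command comes from the azure-firewall CLI extension):

```
# Allow the core AKS dependencies using the built-in AzureKubernetesService FQDN tag
az network firewall application-rule create \
  --resource-group myResourceGroup \
  --firewall-name myFirewall \
  --collection-name aks-required \
  --name allow-aks-fqdns \
  --priority 100 \
  --action Allow \
  --source-addresses 10.240.0.0/16 \
  --protocols Https=443 \
  --fqdn-tags AzureKubernetesService
```

On top of this application rule, AKS also needs a few network rules (e.g., UDP 1194 and TCP 9000 for tunneled node-to-control-plane traffic, and UDP 123 for NTP), plus any extra FQDNs for the optional features you enable.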
To summarize, typical egress paths in AKS look like one of the following: