AKS-HCI , short for Azure Kubernetes Service on Azure Stack HCI is an on-premises Microsoft supported Kubernetes offering. AKS-HCI is built of and consistent with open-source Kubernetes offering. AKS-HCI simplifies on-premises Kubernetes deployment by offering a automated and standardized approach to deploying and managing Kubernetes clusters.
AKS-HCI provides consistency with Azure Kubernetes Service as much as possible in feature and operational details. This presents choices for deploying on-premises workloads and simplifies instrumentation of workload mobility between cloud and the edge.
AKS-HCI is designed from the get-go with security as one of its principal value propositions. An earlier blog written by my colleague, provides an overview of the security story for AKS-HCI.
As shown in the diagram above, the AKS-HCI security model follows least privilege principal. The all-powerful management cluster, the cluster used to create the workload clusters (also called resource clusters) is managed by handful of administrators and access to it is limited. Direct access to container host (think ability to SSH) is not allowed.
Each resource (workload) cluster deploys one or more virtual machine serving as container host for the workload cluster. The container host runs the control plane and the worker pods. Virtual machines provide strong kernel level isolation and contain the blast radius by preventing malware from escaping out to the host and affecting other workload clusters. Administrators also have the option to create separate physical clusters.
Finally, the containers themselves running within the virtual machine are process isolated with their respective resources and namespaces.
AKS-HCI Built with Strong Identity & Access Management Foundation
Starting February as part of the public preview, AKS-HCI will be supporting authentication and Single Sign On via AD (Active Directory) identity using kubectl
AD (Active Directory) Authentication provides several advantages over using certificate-based client authentication.
Kubernetes (a.k.a. K8S) uses configuration (referred as “kubeconfig”) stored on the client machine to connect to the api-server. This configuration contains authentication information to connect to the api-server. Any interaction with the K8 cluster happens via the api-server, one can think of having access to the api-server as having keys to the K8 kingdom. Hence access to the api-server that is built on strong identity and access management foundation is critical to securing your K8 deployment.
Kubernetes offers various options to connect to the api-server, of those the configuration used to connect to the api-server using AD identity is the most secure, this is primarily because AD kubeconfig (think of AD kubeconfig as a type of kubeconfig) does not hold any secret that can potentially be used to compromise access to the api-server.
By default, AKS-HCI uses “certificate” based kubeconfig to connect clients to the api-server. The certificate based kubeconfig contains authentication information such as private keys. If malware or attacker gets access to this configuration file, they will be able to get access to the api-server and that would be like getting keys to the kingdom. As mentioned, earlier, by contrast the AD kubeconfig does not hold any secret and merely having possession of AD kubeconfig does not grant access to the cluster. Eliminating the need to safely distribute kubeconfig improves security and efficiency, directly attributable to significant cost savings.
AD kubeconfig complements the “certificate” based kubeconfig, while certificate based kubeconfig is available to a select group of admins and used to connect to the cluster for initial provisioning (including setting up AD integration), the AD kubeconfig can be freely distributed without any security concerns to a wider group of users. An important distinction to note, unlike static configuration e.g. certificate based kubeconfig where users with the same configuration will always resolve to the same privileges, AD kubeconfig dynamically resolves privileges based on the user context it is applied.
Another benefit is the representation of identities in SID format, the human friendly group names in the role binding definition are stored in the SID format as K8 CRD. This provides protection against any human error in representing group names and naming conflicts or collisions as the group names need to resolve to corresponding SIDs in the domain server before access is granted. A related extension to this is the ability to represent AD groups in the RBAC role bindings, more on that in later part.
The windows server or the container host does not need to be domain joined for AD Authentication to work as long as the domain server and container host are time synchronized.
Let's now dive a bit into the trenches on AD integration works under the hood.
How it Works Under the Hood
The underlying implementation uses Kerberos protocol and requires Active Directory domain joined windows client. The client authenticates to the server (in our case K8 api-server) using Kerberos protocol. A few things need to be set up before the cluster can accept AD credentials for Authentication.
As shown in the diagram, an AD account for the api-server and corresponding SPN (service principal name) should be created on the AD domain server, the AD domain server also acts as the key distribution center. Next, a "keytab" corresponding to the the SPN needs to be generated.
Keytab contains symmetric encryption keys used to decrypt service tickets, the service ticket is presented to the api-server from the client machine.
These service tickets represent AD groups in SID format that is provided to the client upon successful authentication to the domain server. More details to follow on this flow.
A tool like ktpass (for windows machines) or ktutil (for linux machines) can be used to generate keytab. A client-side plugin is part of the installation to broker communication between kubectl and the api-server.
Three Fundamental Loops of Authentication Flow
At it's essence, the flow consists of three fundamental loops. . The “first loop” is the user acquiring the “service ticket” from the domain server contingent on successful authentication (we will get into this in a minute). This service ticket has user’s group membership in SID format.
The service ticket is generated for specific SPN (api-server in our case) and is provided to the user based on the user presenting what is known as TGT (ticket granting ticket). The user is able to get TGT based on successfully logging into the windows domain joined machine using their SSO credentials.
The “second loop” is the user presenting the “service ticket” to the api-server when she attempts to connect to the api-server via kubectl. This serves two purposes, to authenticate and authorize.
The “third” and the final loop is the api-server then taking the “service ticket”, unwrapping and unpacking the service ticket using keytab secret stored as K8 secret.
The api-server unpacks the ticket, extracts the group information, and validates against the RBAC (role-based access control) configuration (a.k.a role bindings in K8). In order for the user to execute command via kubectl both authentication and authorization steps need to complete successfully.
For the api-server to be able to unwrap the service ticket using the key stored in the keytab file, the user account and the api-server account doesn't need to share the same domain server, most organizations would keep those separate. If the account are on different domain servers they need to share the same forest and if they are on different forests they need to have trust relationship. For more details on how trust relationships work, refer to this link
Using AD Groups for Authorization
RBAC in K8 is defined in configuration known as “role bindings”. It is a two-step process where a role is defined and then the role is bound to user or group using role bindings. With AD integration users now have the ability to bind roles to AD groups.
When the service ticket is unpacked the group names are compared against the AD groups defined in role binding and access is granted based on the role binding definition.
Details on Few Anticipated Questions about this feature
Q: Do I need continuous connectivity of the container host to my domain server for AD Authentication to work
The container host does not need to have connectivity to the domain server, however, ensure the keytab is updated when the AD password of the api-server is updated.
Q: What is the expected behavior if the password on the AD account of the api-server expires
The service ticket granted to the api-server is cached for about 8-10 hours after which the keytab file (based on prior api-server password) would not be able to decrypt the service ticket and the authentication will fail.
Q: What are the next steps to enable AD Authentication if the api-server password expires
AD admin creates a new password and new keytab is generated. Un-install and re-install AD with the new keytab.
Q: Will AKS-HCI alert me, if api-server AD password is about to expire
AKS-HCI does not have direct line of sight to AD and cannot alert on expiring password.
Q: Can I renew my password before it expires
Yes, you can update the password, refer to the AD SSO set up and installation document for more details.
Ready to deploy refer the deployment guide for the setting up Active Directory enabled SSO
Stay Tuned for more
We are releasing AD integration for the resource / workload clusters, we will follow up integrating the management cluster in later releases including extending AD Authentication to Windows Admin Center (WAC). Stay tuned as we continue to bring new security features to AKS-HCI.