AKS on Azure Stack HCI and Windows Server 2023-03-15 Update
Published Mar 15 2023 05:36 PM
Microsoft

Hello everyone,

 

This is a release that everyone running AKS on Azure Stack HCI or Windows Server will appreciate: it is full of quality improvements for long-running deployments. We are also delighted to share Windows Server 2022 container support!

 

Before getting into the update details, we have a few announcements:

  • We are deprecating K8s 1.22 in conjunction with the upstream Kubernetes community (k8s supported versions).
  • This release has two changes that could impact automation:
    • The way we assign an IP address and FQDN to the cloud agent service during first installation.
    • The default Kubernetes version for a target cluster is now the latest available version for a given release.

On to product improvements. 

 

Windows Server 2022 worker nodes are now Generally Available

We are pleased to announce that Windows Server 2022 node pools are now available on AKS-HCI. They are supported with K8s v1.24 and above, using containerd as the runtime. Major improvements include:

To get started, create your first Windows Server 2022 node pool by following the instructions linked here.

 

We have also updated the Windows nodes base image to include the latest versions of several container images to reduce the number of images downloaded during install and upgrade.

 

Automatic rotation of AKS hybrid internal tokens

Identity tokens in AKS hybrid have a relatively short lifespan to keep AKS hybrid secure against bad actors. However, balancing frequent token rotation with effortless maintenance can be a struggle. Until now, we've managed this by rotating our internal tokens during upgrade and offering manual flows for repairing internal certificates. Over time, we've discovered that isn't enough; I know many of you have hit issues related to expired internal certificates.

 

Starting with this release, we will automatically rotate all internal identity tokens so we can keep rotating tokens frequently while also providing peace of mind to customers who were previously impacted by the manual process of certificate rotation. 
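
To make the idea of rotating ahead of expiry concrete, here is a minimal Python sketch of one common policy: renew a token once a fixed fraction of its lifetime has elapsed. This is an illustration only and not the AKS hybrid implementation; the function name, the 24-hour lifetime, and the 50% threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(issued_at, lifetime, now, renew_fraction=0.5):
    """Return True once at least renew_fraction of the token lifetime has elapsed.

    Hypothetical policy for illustration; the threshold is an assumption.
    """
    return now - issued_at >= lifetime * renew_fraction


if __name__ == "__main__":
    issued = datetime(2023, 3, 15, tzinfo=timezone.utc)
    lifetime = timedelta(hours=24)  # assumed lifetime, not a documented value
    for hours in (6, 12, 18):
        now = issued + timedelta(hours=hours)
        print(hours, "hours after issue, rotate:", needs_rotation(issued, lifetime, now))
```

Rotating well before expiry leaves room for retries if a rotation attempt fails, which is the kind of margin a purely manual process tends to lose.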

 

GPU support for T4 GPUs has moved from public preview to general availability

GPU support for GPUs attached through Discrete Device Assignment (DDA) is now generally available. The initial version of the feature supports NVIDIA T4 Tensor Core GPUs. To deploy a node pool with GPU support, simply specify one of the new sizes (Standard_NK6 or Standard_NK12) and AKS hybrid does the rest, provided the hardware is present, of course.

 

The GA release also comes with a host of platform improvements for the following known issues from public preview:

  • Virtual machine leak (the VM appearing in the off state) when the number of worker nodes exceeds the number of available GPUs
  • Set-AksHciNodePool/New-AksHciCluster commands incorrectly report that the PCI device is in use
  • Set-AksHciNodePool command incorrectly reports that the cluster does not have enough resources

 

Restrict SSH Access to VMs under AKS hybrid service to a limited set of IP addresses

We have added a feature that restricts Secure Shell Protocol (SSH) access to underlying VMs to certain IP addresses. By default, anyone with administrator access to AKS hybrid can access AKS hybrid service VMs through SSH from any machine. Given that access is already limited to administrators, limiting access by IP address doesn't change our security posture, but it can make compliance much easier for customers who need to meet strict access control requirements.
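
Conceptually, restricting access by IP address means checking the source address of a connection against a list of allowed addresses or CIDR ranges. The Python sketch below shows that check using only the standard library. It is not how AKS hybrid implements the feature, and the ranges and function name are hypothetical values chosen for the example.

```python
import ipaddress

# Hypothetical allowlist: one admin subnet and one single jump host.
ALLOWED_RANGES = ["10.0.0.0/24", "192.168.1.10/32"]


def is_ssh_allowed(source_ip, allowed_ranges):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


if __name__ == "__main__":
    for candidate in ("10.0.0.7", "10.0.1.7", "192.168.1.10"):
        print(candidate, "allowed:", is_ssh_allowed(candidate, ALLOWED_RANGES))
```

A check like this can also be useful when planning the allowlist, to confirm that every management workstation you expect to use is actually covered before the restriction is applied.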

 

Version updates and bug fixes

This release has an amazing number of important, experience-improving bug fixes. So many, in fact, that I broke them into upgrade improvements, load balancer improvements, and other product improvements.

 

Upgrade improvements:

  • CSI Node Driver is ignoring the check to see if the volume is already mounted, causing duplicate volumes to be created during upgrade. (issue #304)
  • Improved stability of upgrade during cluster restart scenarios (issue #295)
  • Certificate mismatch when registry decrypt fails

Load balancer bugs fixed in this release:

  • LoadBalancerIP did not work with servicetype=LoadBalancer. We have fixed this, so you can now specify LoadBalancerIP in the service configuration and it will be honored. (issue #283)
  • When deploying multiple load balancers (HAProxy) they lose track of their peers, reducing performance. (issue #305)
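
For readers who have not set LoadBalancerIP before, the sketch below builds a Kubernetes Service manifest of type LoadBalancer with the standard spec.loadBalancerIP field and prints it as JSON, which kubectl apply -f accepts alongside YAML. The service name, label, ports, and IP address are placeholders assumed for the example; choose an address that is valid for your own load balancer configuration.

```python
import json


def load_balancer_service(name, app_label, ip, port=80, target_port=8080):
    """Return a Service manifest of type LoadBalancer that requests a specific IP."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "loadBalancerIP": ip,  # the requested external IP (placeholder value)
            "selector": {"app": app_label},
            "ports": [
                {"protocol": "TCP", "port": port, "targetPort": target_port}
            ],
        },
    }


if __name__ == "__main__":
    # All values here are hypothetical placeholders.
    manifest = load_balancer_service("demo-web", "demo-web", "10.0.0.50")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl is one way to try the fix; the same fields can equally be written by hand in YAML.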

Other improvements

  • kube-apiserver doesn't reload the kubelet client certificate/key on each logs query, resulting in the error “error: You must be logged in to the server (the server has asked for the client to provide credentials (pods/log kube-prometheus-stack-1664-operator-xxxxxxj)” (issue #285). Pausing here for a moment: this is an issue we found in upstream Kubernetes and were able to work with sig/auth to fix for all Kubernetes users <3.
  • Issue with time synchronization between Mariner 2.0 hosts and Hyper-V (issue #307)
  • Installing a workload cluster on a DHCP network sometimes fails with “dns servers is null or empty”

Software updates:

  • We updated to CBL-Mariner 2.0 February 2023 Update - microsoft/CBL-Mariner (github.com)
  • Updated several components and dependencies to the latest versions to fix CVEs (see release notes for details).

Documentation updates

New content:

Troubleshooting guide updates:

 

As always, you can try AKS on Azure Stack HCI or Windows Server any time, even if you do not have the hardware handy, by using our eval guide to set up AKS on a Windows Server Azure VM.

 

Once you have downloaded and installed the AKS on Azure Stack HCI or Windows Server update, you can report any issues you encounter, follow our plans, and check out recently released updates through the AKS hybrid roadmap in GitHub.

 

We look forward to hearing from you all!

 

Cheers,

Sarah
