Azure Stack HCI (hyperconverged infrastructure) is a solution that hosts virtualized Windows and Linux workloads and their storage in a hybrid environment that combines on-premises infrastructure with Azure cloud services. AKS Hybrid is an on-premises implementation of the Azure Kubernetes Service (AKS) orchestrator, which automates running containerised applications at scale; it runs on Windows Server with Hyper-V and on the Stack HCI platform. Together they provide a solution for hosting highly available workloads on-premises. Azure Arc is a cloud-based control plane that can be used to manage on-premises AKS Hybrid instances. Flux is an open-source set of continuous delivery solutions for Kubernetes. AKS and AKS Hybrid natively support Flux through their GitOps capabilities. GitOps is a modern approach to managing and automating the deployment and operation of software applications and infrastructure using Git as the source of truth. Git is a distributed version control system that tracks changes in source code and facilitates collaboration among developers.
This is the second part of a two-part story; you can find part one here. This article and its associated links walk you through the process of creating an AKS Hybrid PoC using a real-world use case.
Introduction
You may remember from part one that we worked with a customer to create a proposed solution that addressed their requirement for a resilient, flexible, next-generation architecture to support their call centre triage service.
Once you have a proposed architecture, the next logical step for most customers is to test the hypothesis with a proof of concept (PoC). A PoC not only gives a customer the opportunity to test the validity of all or part of their idea before committing to a full implementation, it also provides an in-depth learning exercise for the implementation and support teams in what may be an unfamiliar technology. Customers often ask the Microsoft team to work alongside them during the PoC, which can accelerate the process, condensing weeks or months of work into as little as 3 days.
Successful PoCs are usually tightly scoped, with success criteria defined up-front. Proving an idea valid and proving it invalid are both successful PoC outcomes.
Challenges and Requirements
Although we had worked with the customer’s team to create a hypothetical solution, there were still unknowns that would add risk to any decision to commit to the design. This is PoC territory.
We worked closely with the customer’s architects to define the areas of uncertainty: which parts of the architecture they were unsure of, which needed greater understanding, which would answer important questions, and which needed to be demonstrated in order to get buy-in from other stakeholders.
For the PoC we arrived at the following list of MUST-HAVEs:
And the following list of STRETCH capabilities:
Sounds like a lot for 3 days, right?
Proposed Solution
You can find the resources to follow along with the PoC implementation here.
What we Learned
The architecture worked as designed, supporting centrally located BAU operations that could manage both cloud-based and on-premises workloads.
As a reminder, the stated high-level requirements for the solution were to support:
The PoC showed that standard Git repositories and CI/CD pipelines, with AKS GitOps, could be used to manage a gated API lifecycle effectively, and at the scale required (50+ clusters). This approach dramatically reduced the burden on suppliers and gave the customer the ability to manage and maintain their solution with a high degree of control, insight, and rigor.
It was important for the customer to understand what requirements the solution would place on the suppliers who would host the on-premises AKS Hybrid deployment. By walking through the deployment process, the PoC clarified this and produced a bulleted list of the hardware, software, and configuration required on the supplier side.
We were able to discuss security aspects of the solution, including the use of Defender to secure the containers, and to create a list of topics to look at in further depth at subsequent sessions, such as logging and troubleshooting.
The PoC demonstrated that the solution would continue to function within expected parameters when cloud connectivity was dropped. Although core messaging would be interrupted, the use of resiliency and reliability patterns meant that it would resume once connectivity was restored, with no loss of data.
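The article does not say which resiliency patterns were used. As an illustration only, one common way to get "interrupted, then resumed with no loss of data" behaviour is a store-and-forward outbox: messages are queued locally and removed only after a send succeeds. The sketch below is a minimal in-memory version; the `Outbox` class and the `send` callable (assumed to raise `ConnectionError` when the cloud is unreachable) are hypothetical names, not part of the PoC.

```python
from collections import deque
from typing import Callable, Deque


class Outbox:
    """Hypothetical store-and-forward buffer (illustrative, not from the PoC)."""

    def __init__(self, send: Callable[[str], None]) -> None:
        # Assumed transport: raises ConnectionError while connectivity is down.
        self._send = send
        self._pending: Deque[str] = deque()

    def publish(self, message: str) -> None:
        """Queue the message locally, then try to deliver everything queued."""
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Send queued messages in order; return how many remain queued."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still offline: keep the message and retry later
            # Remove only after a successful send, so nothing is lost.
            self._pending.popleft()
        return len(self._pending)
```

Calling `flush()` again once connectivity returns delivers the backlog in its original order. A production version would persist the queue to disk so that it also survives a process restart, and the receiver would need to tolerate an occasional duplicate.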
AKS Hybrid can be deployed to Stack HCI and Windows Server, with VMware currently in Private Preview, providing an excellent choice of deployment options.
While AKS Hybrid is an on-premises instance of AKS, some dependencies on the cloud remain, including for log shipment, monitoring, API management, and billing purposes. This was already known, and the normal mode of operation will be cloud connected. However, it was important to understand the limits of the product. These vary with the deployment platform: when deployed on Stack HCI, the platform must connect to Azure at least once every 30 days to remain operational (see here). The limits are less distinct when running on other platforms such as Windows Server, but extended disconnection can eventually result in undesired drift in AKS. Fortunately, support for 30 days offline was adequate in this use case.
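To make the 30-day window concrete, here is a small sketch of how an operations team might track how much offline time a Stack HCI deployment has left. It only does the date arithmetic; the function name is hypothetical, and how the last successful sync time is obtained is assumed to come from the team's own monitoring.

```python
from datetime import datetime, timedelta, timezone

# Limit quoted in this article for Stack HCI deployments.
MAX_OFFLINE = timedelta(days=30)


def offline_time_remaining(last_sync: datetime, now: datetime) -> timedelta:
    """Time left before the 30-day window closes (negative once exceeded)."""
    return (last_sync + MAX_OFFLINE) - now


if __name__ == "__main__":
    # Example values only: last sync on 1 January, checked on 21 January.
    last = datetime(2023, 1, 1, tzinfo=timezone.utc)
    check = datetime(2023, 1, 21, tzinfo=timezone.utc)
    print(offline_time_remaining(last, check).days, "days remaining")
```

A negative result means the window has already been exceeded, which is a natural trigger for an alert.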
In Conclusion
The PoC succeeded in creating a demonstrable implementation of the specific customer use case that they could take forward and use both as a basis for ongoing experimentation and an MVP, and as a demonstration environment. It allowed the implementation and support teams to upskill quickly in a safe environment, and encouraged enthusiasm and curiosity around the platform. Better still, it surfaced questions that had not been considered before, which we were able to answer together, de-risking any future full-scale implementation.