Over the last couple of years, Azure customers have leaned towards Kubernetes for their on-premises needs. Kubernetes allows them to leverage cloud native technologies to innovate faster and take advantage of portability across the cloud and at the edge. We listened and launched Azure Arc enabled Kubernetes to integrate customers Kubernetes assets in Azure and centrally govern and manage Kubernetes clusters including Azure Kubernetes Service (AKS). We have now taken it one step further to leverage Kubernetes and enable training ML (Machine Learning) models using Azure Machine learning.
Run machine learning seamlessly across on-premises, multi-cloud and at the edge
Customers cannow run their ML training on any Kubernetes target cluster in the Azure cloud, GCP, AWS, edge devices and on prem through Azure Arc enabled Kubernetes. This allows customers to use excess capacity either in the cloud or on prem increasing operational efficiency. With a few clicks, they can enablethe Azure Machine Learningagent to run on any OSS Kubernetes cluster that Azure Arc supports. This,along with other key design patterns, ensures a seamless set up of the agent on any OSS Kubernetes cluster such as AKS, RedHat OpenShift, managed Kubernetes services from other cloud providers, etc. There are multiple benefits of this design including using core Kubernetes concepts to set up/ configure a cluster, running cloud native tools, such as, GitOps etc. Once the agent is successfully deployed, IT operators can either grant Data Scientists access to the entire cluster or a slice of the cluster, using native concepts such as namespaces, node selectors, taints / tolerations, etc. The configuration and lifecycle management of the cluster (setting up autoscaling, upgrading to newer Kubernetes versions) is transparent, flexible and the responsibility of the customers’ IT operations team.
Built using familiar Kubernetes and cloud native concepts
The core of the offering is an agent that extends the Kubernetes API. Once set up with a single command, the IT operator can view these Kubernetes objects (operators for TensorFlow, PyTorch, MPI, etc.) using familiar tools such as, kubectl.
Data Scientists can continue to use familiar tools to run training jobs
One of the core principles we adhered to was splitting the IT operator persona and the Data Scientist one with separate roles and responsibilities. Data scientists do not need to know anything about or learn Kubernetes. To them, it is yet another compute target that they can submit their training jobs to. They use familiar tools, such as, the Azure Machine Learning studio, Azure Machine Learning Python SDK (Software Development Kits)or OSS tools( Jupyter notebooks, TensorFlow, PyTorch, etc.) spending their time solving machine learning problems rather than worrying about infrastructure that they are running on.
Ensure consistency across workloads with unified operations, management, and security.
Kubernetescomes with its own sets of challenges around security, management and governance. The Azure Machine Learning team and the Azure Arc enabled Kubernetes team have worked together to ensure that not only is an IT operator able to centrally monitor and apply policies on your workloads on Arc infrastructure butalso ensure that the interaction with Azure Machine Learning service is secure and compliant. This along with the consistent experience across the cloud and on prem clusters no longer require you to lift and shift machine learning workloads but seamlessly operate them across both. You can choose to just run in the cloud to take advantage of the scale or just run-on excess on- premises capacity while leveraging the single pane of glass Azure Arc provides to manage all your on-premises infrastructure.
We welcome you to take advantage of the Arc enabled machine learning. Please sign up to access the preview here. We look forward to getting feedback from you so that we can continue to build a solution that meets your organizational needs.