Well-Architected Branches for Assessing Workload Types
Published Mar 25 2022 02:36 PM

Microsoft offers prescriptive guidance called the Well-Architected Framework for optimizing workloads implemented and deployed on Azure. This guidance is generalized to apply to most workloads and creates a basis for reliable, secure, and cost-optimized applications.


We have begun to build on this base content set to include more precise guidance for specific workload types, such as machine learning, data services and analytics, IoT, SAP, mission-critical apps, and web apps. Machine Learning was the first branch from the base content, which came to fruition in the Fall of 2021.


Azure-oriented prescriptive guidance needs to consider multiple dimensions of the workload type. Thus, these branches are developed by teams across Microsoft, including customer-facing, partner-facing, product, and content teams.


Branches must meet several critical release criteria to become generally available, including:

  • Curated documentation based on all Well-Architected Framework pillars: Security, Reliability, Cost Optimization, Performance Efficiency, and Operational Excellence.
  • An assessment based on the core Well-Architected Review, so that questions, answers, and recommendations are specific to the workload type but remain in line with the objectives of the Well-Architected Review (to convey guidance, not a sales pitch).
  • Early customers who have satisfactory results after implementing recommendations from the assessment.

As an example, teams across Microsoft led by the Customer Success Unit developed the Machine Learning branch to meet the specific needs of MLOps teams. This new space has all the same considerations as other workloads, but the technologies and processes used to create workloads leveraging machine learning capabilities differ dramatically.




The following statements, which are intentionally general in the Well-Architected Review, become prescriptive in the AI/ML branch:


| Pillar | General Statement | Specific AI/ML Guidance |
| --- | --- | --- |
| Cost Optimization | Managing costs to maximize the value delivered. | Provisioning of CPU/GPU for classical and deep learning models; use of compute clusters for training; termination policies to stop poorly performing runs and save on compute costs; etc. |
| Operational Excellence | Operational processes that keep a system running in production. | Building, designing, and orchestrating Azure Machine Learning (AML) with MLOps principles; monitoring performance of deployed models; segregation of environments; development using the SDK, Automated ML, and AML Designer; etc. |
| Performance Efficiency | The ability of a system to adapt to changes in load. | Running experiments in parallel; data partitioning strategy; autoscaling for scalability; AML ParallelRunConfig for processing large amounts of data; etc. |
| Reliability | The ability of a system to recover from failures and continue to function. | Scalability; network capacity; managing quotas; dataset versioning; workspace capacity limits; logging ML runs; private links; etc. |
| Security | Protecting applications and data from threats. | RBAC for AML; authentication; data encryption best practices; identity management; use of VNets; responsible ML; differential privacy; model interpretability; homomorphic encryption; etc. |
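As a concrete illustration of the autoscaling and cost guidance above, a training compute cluster in Azure Machine Learning can be declared so that it scales down to zero nodes when idle. The following is a minimal sketch using the Azure ML CLI (v2) YAML schema; the cluster name and VM size are illustrative choices, not values prescribed by the assessment.

```yaml
# Hypothetical compute cluster definition (Azure ML CLI v2 schema).
# min_instances: 0 lets the cluster scale to zero when no jobs are running,
# so idle nodes stop accruing compute charges.
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-train-cluster        # illustrative name
type: amlcompute
size: Standard_DS3_v2          # illustrative VM size
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 1800   # seconds before idle nodes are released
```

Applied with `az ml compute create --file <file>.yaml`, a cluster like this adds nodes only while training jobs are queued, which is one of the cost-optimization behaviors the AML assessment checks for.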


By providing more precise guidance, MLOps teams have been much more effective in implementing recommendations that generate meaningful impact across their workloads. As a result, we have seen a three-fold increase in recommendations implemented by customers in the AML pilot compared with those who used the general assessment. From everyone's standpoint, that's a huge success!


As a result, we are moving the AML branch from Public Preview to General Availability in April 2022.


To learn more about the AML branch, see the links and the video below.

Additional assessment branches for SAP, IoT, Mission-Critical, and Data Services are coming soon. Be on the lookout for subsequent blog posts on each!


See the Enablement Show video below!


Version history
Last update: Apr 27 2022 03:17 PM