Increasing Transparency into Azure Active Directory's Resilience Model
Published Jun 06 2023 11:38 AM
Microsoft

Over the last two years we’ve been sharing the progress on Azure AD’s resilience investments, such as our 99.99% Service Level Agreement (SLA), our core resilience principles and architecture, and our differentiated resilience features like our automatic backup authentication service, regionally isolated authentication, and continuous access evaluation. 

 

We appreciate your confidence in Azure AD as a mission-critical dependency for your applications and services. We have also heard how important transparency is to you, both in understanding how the service is built for resilience and in understanding the resilience your own applications and services actually achieve.
 

Today, we’re excited to announce two new ways that we’re enhancing our transparency into these resilience capabilities and furthering our resilience journey: 

 

You can now see the actual SLA performance for your own tenant, in addition to the global SLA attainment for all tenants. 
 
We’ve already been publishing the global SLA for Azure AD and details of how this is computed here. This data set goes back to February 2021 and shows we’ve exceeded the four-nines (99.99%) SLA for over 16 months running (as of June 2023).
 
Today, we’re pleased to extend this with a per-tenant SLA report that gives insight into the actual availability attained by your tenant. This is available as a public preview in both the Entra and Azure portals to all tenants with at least 5,000 monthly active users signing in, allowing administrators to see how their tenant is performing against the Azure AD SLA promise of 99.99% availability in authenticating users and issuing tokens.  
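To put the 99.99% figure in perspective, here is a small sketch of the arithmetic behind an availability target. It is illustrative only: the function names are hypothetical, and measuring availability as the share of successful requests is a simplifying assumption rather than the official SLA formula, which is described in the published SLA details.

```python
def downtime_budget_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of unavailability a month can absorb while still meeting the target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


def attained_availability(total_requests: int, failed_requests: int) -> float:
    """Availability as the percentage of requests that succeeded.

    Assumption: a request-based measure, used here only for illustration.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * (total_requests - failed_requests) / total_requests


if __name__ == "__main__":
    # A 30-day month has 43,200 minutes; 0.01% of that is about 4.3 minutes.
    print(f"Budget at 99.99%: {downtime_budget_minutes(99.99):.2f} minutes")

    # Hypothetical tenant: 1,000,000 authentications, 50 failures.
    availability = attained_availability(1_000_000, 50)
    print(f"Attained: {availability:.3f}% (meets 99.99%: {availability >= 99.99})")
```

Under these assumptions, a four-nines target leaves only a few minutes of slack per month, which is why a per-tenant view of attained availability is useful.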
 
We’ll continue to extend this transparency and would love to hear your feedback on this new capability, as well as on what would be most useful to you.
 
Here’s a screenshot of the per-tenant SLA view you can see in the Azure portal:

 

[Screenshot: per-tenant SLA view in the Azure portal]

 

 

Improvements to the backup auth system
We first introduced the Backup Auth System in 2021. This system is designed to provide automatic and seamless protection for authentications to applications that integrate with Azure AD. These protections are part of a multi-layered resilience strategy and are designed to kick in if all the other layers of resilience built into Azure AD fail. This is similar to how a UPS or generator may be connected to a critical facility to provide a backup to the power grid.
 

Transparency into supported applications and authentication patterns 

Backup coverage is provided seamlessly to tens of thousands of applications based on protocol and scenario patterns.   

 

Beyond Microsoft applications, many applications in the Azure AD app gallery, as well as the largest iOS and Android email clients, and ~30 thousand business applications receive incremental resilience from the backup authentication system. The availability of support is determined by the authentication protocols and patterns used by these applications. Some of the largest applications include: 

 

  • ADP
  • Atlassian Cloud
  • AWS Single-Sign-on
  • GoToMeeting
  • Kronos
  • Marketo
  • Palo Alto Networks
  • SAP Cloud Identity
  • Trello
  • Workday
  • Zscaler

Coverage applies when these applications are configured using supported protocol patterns such as “SAML-IdP initiated.”

 

We’re pleased to announce that today we’re launching a detailed documentation page, which we will update regularly, that outlines the specific level of coverage the backup auth system provides for your applications. For more information about our specific levels of coverage, see “Azure AD’s Backup authentication system”.

We’ll continue to update this as we regularly raise coverage levels and would love your feedback on places you’d like to see us prioritize. 

 

Supported user scenarios 

During an outage, users will remain productive on their supported applications when:

  • the user has authenticated with the same app and device within the last three days;
  • the user is authenticating as a member of their home tenant and not a B2B user;
  • resilience defaults for that user authentication are enabled; and
  • the user’s authentication has not been recently revoked or restricted.
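The four conditions above can be read as a single eligibility check. The sketch below is only a way to reason about them, not how the Backup Auth System is implemented. The `AuthContext` type, its field names, and `is_covered_by_backup_auth` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# "Within the last three days", per the conditions listed above.
REAUTH_WINDOW = timedelta(days=3)


@dataclass
class AuthContext:
    """Hypothetical summary of one user's sign-in situation during an outage."""
    last_auth_same_app_and_device: Optional[datetime]
    is_home_tenant_member: bool          # False for B2B (guest) users
    resilience_defaults_enabled: bool
    recently_revoked_or_restricted: bool


def is_covered_by_backup_auth(ctx: AuthContext, now: datetime) -> bool:
    """Return True only when all four listed conditions hold."""
    if ctx.last_auth_same_app_and_device is None:
        return False
    age = now - ctx.last_auth_same_app_and_device
    return (
        timedelta(0) <= age <= REAUTH_WINDOW
        and ctx.is_home_tenant_member
        and ctx.resilience_defaults_enabled
        and not ctx.recently_revoked_or_restricted
    )


if __name__ == "__main__":
    now = datetime(2023, 6, 6, 12, 0, tzinfo=timezone.utc)
    ctx = AuthContext(
        last_auth_same_app_and_device=now - timedelta(days=2),
        is_home_tenant_member=True,
        resilience_defaults_enabled=True,
        recently_revoked_or_restricted=False,
    )
    print(is_covered_by_backup_auth(ctx, now))
```

Because every condition must hold, a user who last signed in four days ago, or who is a B2B guest, would not be covered in this model even if everything else is in place.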

 

Our experience to date with the Backup Auth System 

The system has successfully mitigated instances where the primary service’s performance degraded, and it continually serves authentication requests. A steady, randomly selected, small slice of requests from all organizations is redirected to the Backup Authentication System, resulting in millions of authentications successfully completed without impact to production services. Administrators can see examples of this by analyzing their Azure AD sign-in logs, looking for records where the token issuer type is set to “Azure AD Backup Auth.”
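If you export sign-in logs as JSON, a filter along these lines can surface the records served by the backup system. This is a minimal sketch under stated assumptions: it assumes each record carries a `tokenIssuerType` field whose value for the portal’s “Azure AD Backup Auth” is `AzureADBackupAuth`, so please check the field name and value against your own export. The sample records and the `backup_auth_sign_ins` helper are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed export value corresponding to "Azure AD Backup Auth" in the portal.
BACKUP_ISSUER = "AzureADBackupAuth"


def backup_auth_sign_ins(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only sign-in records whose token issuer type is the backup system."""
    return [r for r in records if r.get("tokenIssuerType") == BACKUP_ISSUER]


# Hypothetical sample data standing in for an exported sign-in log.
SAMPLE_RECORDS = json.loads("""
[
  {"id": "a1", "appDisplayName": "Example App", "tokenIssuerType": "AzureAD"},
  {"id": "b2", "appDisplayName": "Example App", "tokenIssuerType": "AzureADBackupAuth"},
  {"id": "c3", "appDisplayName": "Other App"}
]
""")

if __name__ == "__main__":
    for record in backup_auth_sign_ins(SAMPLE_RECORDS):
        print(record["id"], record["appDisplayName"])
```

Using `dict.get` keeps the filter tolerant of records that lack the field, which can happen when exports come from different sources or schema versions.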

 

What’s next 

We’ll continually raise the bar on resilience, with enhancements across the authentication stack, and continue this series with regular updates and more information to help our customers better understand and take advantage of our investments.

 

Focusing on the Backup Authentication System, we are targeting the following improvements for the next 12 to 18 months:

 

  • Coverage improvement for certain Microsoft applications using the Android OS.
  • Protection for web applications that use the SAML protocol configured as SP-initiated SSO, which will protect many more app gallery and line-of-business applications.
  • Backup system protection for non-Microsoft applications requesting OpenID Connect access tokens. 

 

Thank you. 

Nadim Abdo 

CVP, Engineering 

 

 


 

Last update: Jun 07 2023 04:31 AM