Microsoft Entra resilience update: Workload identity authentication
Published Mar 29 2024 09:00 AM
Microsoft

Microsoft Entra is not only the identity system for users; it’s also the identity and access management (IAM) system for Azure-based services, all internal infrastructure services at Microsoft, and our customers’ workload identities. This is why our 99.99% service-level promise extends to workload identity authentication, and why we continue to improve our service’s resilience through a multilayered approach that includes the backup authentication system. 

 

In 2021, we introduced the backup authentication system, an industry-first innovation that automatically and transparently handles authentications for supported workloads when the primary Microsoft Entra ID service is degraded or unavailable. Through 2022 and 2023, we continued to expand the coverage of the backup service across clouds and application types.

 

Today, we’ll build on our resilience blog post series by sharing in more depth how workload identities gain resilience from the regionally isolated authentication endpoints as well as from the backup authentication system. We’ll explore two complementary methods that best fit our regional-global infrastructure. One example of workload identity authentication is when an Azure virtual machine (VM) authenticates its identity to Azure Storage. Another example is when one of our customers’ workloads authenticates to application programming interfaces (APIs).

 

Regionally isolated authentication endpoints 

 

Regionally isolated authentication endpoints provide region-isolated authentication services to an Azure region. All frequently used identities will authenticate successfully without dependencies on other Azure regions. Essentially, they are the primary endpoints for Azure infrastructure services as well as the primary endpoints for managed identities in Azure (see "Managed identities for Azure resources" on Microsoft Learn). Managed identities help prevent out-of-region failures by consolidating service dependencies, and they improve resilience by handling certificate expiry, rotation, and trust.

 

This layer of protection and isolation does not need any configuration changes from Azure customers. Key Azure infrastructure services have already adopted it, and it’s integrated with the managed identities service to protect the customer workloads that depend on it. 

 

How regionally isolated authentication endpoints work 

 

Each Azure region is assigned a unique endpoint for workload identity authentication. The region is served by a regionally collocated, special instance of Microsoft Entra ID. The regional instance relies on caching metadata (for example, directory data that is needed to issue tokens locally) to respond efficiently and resiliently to the workload identity’s authentication requests. This lightweight design reduces dependencies on other services and improves resilience by allowing the entire authentication to be completed within a single region. Data in the local cache is proactively refreshed. 
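To make the idea of a proactively refreshed local cache more concrete, here is a minimal sketch in Python. It is an illustration only and not how Microsoft Entra ID is implemented: the class name `RefreshAheadCache`, the `loader` callback, and the `ttl` and `refresh_after` values are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CacheEntry:
    value: str
    fetched_at: float


class RefreshAheadCache:
    """Toy metadata cache that reloads entries before they expire (hypothetical)."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl: float = 300.0,
        refresh_after: float = 240.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader                # fetches metadata from an upstream source
        self._ttl = ttl                      # age at which an entry may no longer be served
        self._refresh_after = refresh_after  # age at which an entry is reloaded early
        self._clock = clock
        self._entries: Dict[str, CacheEntry] = {}

    def get(self, key: str) -> str:
        """Return cached metadata, loading it only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry.fetched_at >= self._ttl:
            entry = CacheEntry(self._loader(key), now)
            self._entries[key] = entry
        return entry.value

    def refresh_due(self) -> List[str]:
        """Reload entries older than refresh_after and return the refreshed keys."""
        now = self._clock()
        refreshed = []
        for key, entry in list(self._entries.items()):
            if now - entry.fetched_at >= self._refresh_after:
                self._entries[key] = CacheEntry(self._loader(key), now)
                refreshed.append(key)
        return refreshed
```

A background task could call `refresh_due()` on a timer. Frequently used entries then stay warm, and a request rarely has to wait on the upstream source.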

 

The regional service depends on Microsoft Entra ID's global service to update and refill caches when it lacks the data it needs (a cache miss) or when it detects a change in the security posture for a supported service. If the regional service experiences an outage, requests are served seamlessly by Microsoft Entra ID’s global service, making the regional service interruption invisible to the customers.  
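The two fallback paths described above (filling the regional cache from the global service on a cache miss, and routing to the global service when the regional service is down) can be sketched as a small simulation. All class and function names here are hypothetical, and the token strings are placeholders rather than real tokens.

```python
from typing import Dict


class RegionalUnavailable(Exception):
    """Raised when the simulated regional service is experiencing an outage."""


class GlobalService:
    """Stand-in for the global service, which holds the full directory data."""

    def __init__(self, directory: Dict[str, str]) -> None:
        self.directory = dict(directory)
        self.calls = 0  # counts how often the global service is consulted

    def lookup(self, identity: str) -> str:
        self.calls += 1
        return self.directory[identity]  # KeyError for an unknown identity

    def issue_token(self, identity: str) -> str:
        return f"global-token:{self.lookup(identity)}"


class RegionalService:
    """Stand-in for a regional instance that answers from a local cache."""

    def __init__(self, global_service: GlobalService) -> None:
        self.global_service = global_service
        self.cache: Dict[str, str] = {}
        self.healthy = True

    def issue_token(self, identity: str) -> str:
        if not self.healthy:
            raise RegionalUnavailable(identity)
        if identity not in self.cache:
            # Cache miss: refill from the global service, then serve locally.
            self.cache[identity] = self.global_service.lookup(identity)
        return f"regional-token:{self.cache[identity]}"


def authenticate(identity: str, regional: RegionalService, global_service: GlobalService) -> str:
    """Prefer the regional service; fall back to the global one during an outage."""
    try:
        return regional.issue_token(identity)
    except RegionalUnavailable:
        return global_service.issue_token(identity)
```

In this sketch the caller only ever sees a token. Which service issued it is an internal detail, which mirrors the goal of making a regional interruption invisible to customers.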

 

Performant, resilient, and widely available 

 

The service has proven itself since 2020 and now serves six billion requests per day across the globe. The regional endpoints, working with global services, exceed the 99.99% SLA. The resilience of Azure infrastructure is further protected by workload-side caches kept by Azure client SDKs. Together, the regional and global services have managed to make most service degradations undetectable by dependent infrastructure services. Post-incident recovery is handled automatically. Regional isolation is supported in the public cloud and in all sovereign clouds.
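As an illustration of what a workload-side cache contributes, the following sketch reuses a token until shortly before it expires, and keeps serving a still-valid token if a refresh attempt fails. It is a simplified model and not the actual behavior of any Azure client SDK. `TokenCache`, the `fetch` callback, and the `refresh_margin` value are assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CachedToken:
    value: str
    expires_at: float  # absolute expiry time, in seconds


class TokenCache:
    """Toy client-side token cache (hypothetical, for illustration only)."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, float]],
        refresh_margin: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch                    # returns (token, expires_at) for a resource
        self._refresh_margin = refresh_margin  # refresh this many seconds before expiry
        self._clock = clock
        self._tokens: Dict[str, CachedToken] = {}

    def get_token(self, resource: str) -> str:
        now = self._clock()
        cached = self._tokens.get(resource)
        if cached is not None and cached.expires_at - self._refresh_margin > now:
            return cached.value  # fresh enough: no network call needed
        try:
            value, expires_at = self._fetch(resource)
        except Exception:
            # The refresh failed. A token that has not yet expired is still
            # usable, so a brief service degradation stays invisible.
            if cached is not None and cached.expires_at > now:
                return cached.value
            raise
        self._tokens[resource] = CachedToken(value, expires_at)
        return value
```

Refreshing ahead of expiry is the design choice that matters here: it leaves a window in which a failed refresh can be retried while the old token is still valid.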

 

Infrastructure authentication requests are processed by the same Azure datacenter that hosts the workloads along with their co-located dependencies. This means that endpoints that are isolated to a region also benefit from performance advantages. 

 


 

Backup authentication system to cover workload identities for infrastructure authentication 

 

For workload identity authentication that does not depend on managed identities, we’ll rely on the backup authentication system to add fault-tolerant resilience. In our blog post from November 2021, we explained the approach for user authentication, which has been generally available for some time. The system operates in the Microsoft cloud but on separate and decorrelated systems and network paths from the primary Microsoft Entra ID system. This means that it can continue to operate in case of service, network, or capacity issues across many Microsoft Entra ID and dependent Azure services. We are now applying that successful approach to workload identities.
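Conceptually, the failover to a backup system for supported workloads looks something like the sketch below. In the real platform this routing happens transparently inside the service, so customers do not write code like this. The function, the exception type, and the `backup_supported` set are hypothetical names used only to illustrate the behavior.

```python
from typing import Callable, Set


class ServiceUnavailable(Exception):
    """Raised by a simulated token service that is degraded or unreachable."""


def authenticate_with_backup(
    workload: str,
    primary: Callable[[str], str],
    backup: Callable[[str], str],
    backup_supported: Set[str],
) -> str:
    """Use the primary system; fail over to the backup for supported workloads."""
    try:
        return primary(workload)
    except ServiceUnavailable:
        if workload not in backup_supported:
            # The backup system does not cover this workload, so the
            # failure has to surface to the caller.
            raise
        # The backup runs on separate systems and network paths, so it can
        # answer even though the primary is unavailable.
        return backup(workload)
```

Expanding backup coverage corresponds, in this toy model, to adding more workloads to `backup_supported`.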

 

Backup coverage of workload identities is currently rolling out systematically across Microsoft, starting with Microsoft 365’s largest internal infrastructure services in the first half of 2024. Microsoft Entra ID customer workload identities’ coverage will follow in the second half of 2025. 

 


 

Protecting your own workloads 

 

The benefits of both regionally isolated endpoints and the backup authentication system are natively built into our platform. To further optimize the benefits of current and future investments in resilience and security, we encourage developers to use the Microsoft Authentication Library (MSAL) and leverage managed identities whenever possible. 
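For readers who have not used managed identities before, here is a rough sketch of what a token request from an Azure VM looks like when it goes through the Azure Instance Metadata Service (IMDS), using only the Python standard library. The helper names are ours, and the resource URI is just an example. In practice, MSAL or the Azure client libraries wrap this kind of call and add caching and retries, so prefer them over hand-written requests. How such a request maps onto the regional endpoints is internal to the platform and is not shown here.

```python
import json
import urllib.parse
import urllib.request

# Documented IMDS token endpoint; it is reachable only from inside an Azure VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
IMDS_API_VERSION = "2018-02-01"


def build_imds_token_request(resource: str) -> urllib.request.Request:
    """Build (but do not send) a managed identity token request for a resource."""
    query = urllib.parse.urlencode(
        {"api-version": IMDS_API_VERSION, "resource": resource}
    )
    request = urllib.request.Request(f"{IMDS_TOKEN_URL}?{query}")
    request.add_header("Metadata", "true")  # IMDS requires this header
    return request


def fetch_managed_identity_token(resource: str, timeout: float = 5.0) -> str:
    """Send the request and return the access token (works only on Azure)."""
    with urllib.request.urlopen(build_imds_token_request(resource), timeout=timeout) as response:
        payload = json.load(response)
    return payload["access_token"]
```

On a VM with a managed identity assigned, `fetch_managed_identity_token("https://storage.azure.com/")` would return a token for Azure Storage. No secret or certificate appears anywhere in the code, which is the main point of managed identities.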

 

What’s next? 

 

We want to assure our customers that our 99.99% uptime guarantee remains in place. We are also continuing to expand the backup authentication system, and in the next year we plan to increase our automatic backup coverage to include all infrastructure authentication, even for third-party developers. We’ll keep you updated on our progress, including planned improvements to our system capacity, performance, and coverage across all clouds.

 

Thank you, 

Nadim Abdo  

CVP, Microsoft Identity Engineering  

 

 

Last update: Apr 03 2024 12:35 PM