<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Reliability and Resiliency in Azure articles</title>
    <link>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/bg-p/reliability-and-resiliency-in-azure</link>
    <description>Reliability and Resiliency in Azure articles</description>
    <pubDate>Sat, 13 Jun 2026 08:33:25 GMT</pubDate>
    <dc:creator>reliability-and-resiliency-in-azure</dc:creator>
    <dc:date>2026-06-13T08:33:25Z</dc:date>
    <item>
      <title>Protect Azure Cosmos DB with vaulted backups using Azure Backup (public preview)</title>
      <link>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/protect-azure-cosmos-db-with-vaulted-backups-using-azure-backup/ba-p/4522714</link>
      <description>&lt;P&gt;As organizations increasingly rely on &lt;STRONG&gt;Azure Cosmos DB&lt;/STRONG&gt; to power mission‑critical, globally distributed applications, protecting this data from &lt;STRONG&gt;accidental deletion, malicious activity, and ransomware&lt;/STRONG&gt; has become more important than ever.&lt;/P&gt;
&lt;P&gt;At Microsoft Build 2026, we’re excited to announce the &lt;STRONG&gt;preview of Azure Backup for Cosmos DB&lt;/STRONG&gt;, which introduces &lt;STRONG&gt;vaulted backups&lt;/STRONG&gt;: a secure, isolated, and fully managed backup solution designed to strengthen cyber‑resilience and support compliance requirements.&lt;/P&gt;
&lt;H2&gt;Why vaulted backups for Azure Cosmos DB?&lt;/H2&gt;
&lt;P&gt;Azure Cosmos DB already provides built‑in data protection capabilities such as replication and availability features to help ensure application uptime. However, these capabilities alone may not be sufficient to protect against scenarios such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Accidental or malicious deletion of data or accounts&lt;/LI&gt;
&lt;LI&gt;Compromised credentials or insider threats&lt;/LI&gt;
&lt;LI&gt;Ransomware attacks targeting production environments&lt;/LI&gt;
&lt;LI&gt;Compliance requirements that mandate off‑site, immutable backups&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Vaulted backups add an independent protection layer&lt;/STRONG&gt; by storing backup copies in an &lt;STRONG&gt;Azure Backup vault&lt;/STRONG&gt;, isolated from the source Cosmos DB account and managed through Azure Backup.&lt;/P&gt;
&lt;H2&gt;How vaulted backups protect your Cosmos DB data&lt;/H2&gt;
&lt;P&gt;With this preview, Azure Backup enables you to protect Azure Cosmos DB using a&amp;nbsp;&lt;STRONG&gt;policy‑driven, automated backup experience&lt;/STRONG&gt;. Once configured, Azure Backup manages backup scheduling, retention, and lifecycle without manual intervention.&lt;/P&gt;
&lt;P&gt;Key protection capabilities include:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Isolation from production data:&lt;/STRONG&gt; Vaulted backups are stored in a separate, Microsoft‑managed backup vault, ensuring that backup data remains protected even if the source Cosmos DB account is deleted or compromised.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Resilience against ransomware and malicious attacks:&lt;/STRONG&gt; Because backups are isolated and protected by Azure Backup security controls, attackers cannot directly access or tamper with recovery points, helping ensure reliable recovery when it matters most.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Policy‑based backups with long‑term retention: &lt;/STRONG&gt;Define backup schedules and retention periods using Azure Backup policies to support long‑term compliance and audit requirements.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Security‑first design:&lt;/STRONG&gt; Azure Backup safeguards vaulted backups using encryption, soft delete, immutability, and role‑based access control, helping protect backup data against unauthorized deletion or modification.&lt;/P&gt;
&lt;H2&gt;Designed for compliance and enterprise resilience&lt;/H2&gt;
&lt;P&gt;Vaulted backups for Azure Cosmos DB help organizations align with &lt;STRONG&gt;industry and regulatory expectations&lt;/STRONG&gt; that require:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Off‑site and isolated backup copies&lt;/LI&gt;
&lt;LI&gt;Strong access controls and separation of duties&lt;/LI&gt;
&lt;LI&gt;Protection against premature deletion&lt;/LI&gt;
&lt;LI&gt;Long‑term retention of critical data&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;By integrating Cosmos DB protection into Azure Backup, customers can manage backups centrally alongside other Azure workloads using a consistent governance and monitoring experience.&lt;/P&gt;
&lt;H2&gt;Getting started with the preview&lt;/H2&gt;
&lt;P&gt;Please refer to the product documentation for details on &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/backup/backup-azure-cosmos-db-support-matrix" target="_blank" rel="noopener"&gt;supported scenarios, limitations,&lt;/A&gt; and &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/backup/backup-azure-cosmos-db" target="_blank" rel="noopener"&gt;onboarding steps&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;For Cosmos DB vaulted backup (preview), you incur charges starting 1 July 2026. Refer to the Azure Backup&amp;nbsp;&lt;A class="lia-external-url" href="https://azure.microsoft.com/pricing/details/backup/" target="_blank" rel="noopener"&gt;pricing page&lt;/A&gt; and &lt;A class="lia-external-url" href="https://azure.microsoft.com/pricing/calculator/" target="_blank" rel="noopener"&gt;pricing calculator&lt;/A&gt; for more details.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jun 2026 16:14:03 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/protect-azure-cosmos-db-with-vaulted-backups-using-azure-backup/ba-p/4522714</guid>
      <dc:creator>shobhitgarg</dc:creator>
      <dc:date>2026-06-05T16:14:03Z</dc:date>
    </item>
    <item>
      <title>Announcing Azure Infrastructure Resiliency Manager Public Preview</title>
      <link>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/announcing-azure-infrastructure-resiliency-manager-public/ba-p/4523710</link>
      <description>&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;At Microsoft Build 2026, we are thrilled to announce that Azure Infrastructure Resiliency Manager is now available in public preview, open to all Azure customers.&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Azure Infrastructure Resiliency Manager is not a replacement for individual Azure resiliency features; it is the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;unifying layer&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;that connects them into a coherent, goal-driven workflow.&amp;nbsp;It&amp;nbsp;leverages&amp;nbsp;and complements&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;Availability Zones, Azure Advisor,&amp;nbsp;Azure&amp;nbsp;Chaos Studio, Azure Monitor, and Azure Copilot&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;, adding purposeful orchestration that turns isolated capabilities into a complete resiliency strategy. The preview already covers a broad range of Azure resource types and zone-redundant configurations, from virtual machines and databases to AKS clusters and networking, with continued expansion planned.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;The new&amp;nbsp;platform&amp;nbsp;is built on a foundational belief: achieving application resilience is a&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;continuous journey&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;, not a one-time configuration task. That journey is organized into three actionable phases:&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Start Resilient, Get Resilient, and Stay Resilient&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;. Each phase delivers measurable customer value such as reduced downtime risk, faster recovery, and greater operational confidence.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Start resilient:&amp;nbsp;Embedding&amp;nbsp;resiliency from&amp;nbsp;day&amp;nbsp;one&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Starting resilient means treating resiliency as a fundamental architectural requirement, not an afterthought. Azure Infrastructure Resiliency Manager makes it straightforward to design zone-resilient applications from the outset,&amp;nbsp;eliminating&amp;nbsp;costly retrofits and reducing risk before your first deployment.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Resiliency Agent: Your AI-powered architecture advisor&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;The standout capability in this preview is the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;STRONG&gt;Resiliency Agent&lt;/STRONG&gt;,&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;a conversational, AI-powered assistant embedded directly in the Azure Portal. Designed for architects and developers, the Resiliency Agent allows teams to&amp;nbsp;validate&amp;nbsp;and refine resiliency strategies using plain language.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;For example,&amp;nbsp;you might enter a&amp;nbsp;prompt such as&amp;nbsp;&lt;/SPAN&gt;&lt;EM&gt;&lt;SPAN data-contrast="none"&gt;"I'm designing a three-tier web app with VMs, a&amp;nbsp;Flexible&amp;nbsp;PostgreSQL database, and a Standard Load Balancer"&lt;/SPAN&gt;&lt;/EM&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;and ask the agent what zone-resiliency requirements apply. The Resiliency Agent analyzes your plan,&amp;nbsp;identifies&amp;nbsp;single points of failure, and recommends specific changes: enabling zone redundancy for the database, deploying VMs across zones, or upgrading to zone-redundant load balancers. It delivers a structured, per-resource summary that makes the path to resiliency explicit and actionable.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Infrastructure-as-Code generation and validation&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Beyond design guidance, Infrastructure Resiliency&amp;nbsp;Manager accelerates&amp;nbsp;implementation. You can ask the Resiliency Agent to&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;generate Infrastructure-as-Code (IaC) templates (ARM, Bicep, or Terraform)&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;with all resiliency configurations pre-built and ready to deploy. A generated Bicep template, for example, automatically includes zone-redundant settings for databases, VMs, and load balancers aligned to your stated goals.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;The agent also validates&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;existing&amp;nbsp;IaC&amp;nbsp;templates&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;: upload a template and receive a natural-language assessment of resiliency gaps, complete with targeted suggestions and code snippets to close them. This&amp;nbsp;eliminates&amp;nbsp;manual review overhead and ensures every new deployment starts with a resilient foundation.&amp;nbsp;By&amp;nbsp;embedding resiliency into the design and deployment lifecycle from day one, organizations avoid expensive redesigns, accelerate time-to-market, and bring new services to production already meeting high-availability standards.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Get&amp;nbsp;resilient: Closing&amp;nbsp;gaps in&amp;nbsp;existing&amp;nbsp;applications&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Most Azure customers have workloads built over months or years that may not fully meet today's resiliency requirements.&amp;nbsp;&amp;nbsp;Infrastructure&amp;nbsp;Resiliency&amp;nbsp;Manager&amp;nbsp;delivers a centralized, goal-driven view of your current environment's resilience posture, along with prioritized, actionable recommendations to close every gap.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Goal-driven resiliency posture&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Define what constitutes your application by grouping resources across regions, subscriptions, or resource groups, including tag-based grouping, using&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Service Groups&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;. Once your application boundary is&amp;nbsp;established, &lt;STRONG&gt;assign a&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;resiliency goal&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;: for example, zone-failure tolerance for all components, or specific data replication requirements for critical services.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;The platform assesses every resource against that goal and presents a&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;clear, single-pane-of-glass resiliency posture&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;showing which resources meet the goal, which are non-resilient, and which remain unevaluated. This goal-driven model ensures that all&amp;nbsp;subsequent&amp;nbsp;guidance is precisely calibrated to your target state, not&amp;nbsp;generic&amp;nbsp;best practices.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Actionable, prioritized recommendations&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;For every resource that falls short of the defined goal,&amp;nbsp;Infrastructure&amp;nbsp;Resiliency&amp;nbsp;Manager&amp;nbsp;generates&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;targeted remediation recommendations&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;STRONG&gt;&amp;nbsp;powered by&amp;nbsp;Azure Advisor&lt;/STRONG&gt;.&amp;nbsp;If a virtual machine lacks zone&amp;nbsp;redundancy, the&amp;nbsp;platform recommends&amp;nbsp;converting it to an availability&amp;nbsp;zone&amp;nbsp;deployment. If a database is not zone-redundant, the recommendation specifies exactly how to enable it.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Critically, every recommendation includes &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;contextual decision-making information&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;: impacted resources, implementation steps, and &lt;STRONG&gt;qualitative cost indicators (High, Medium, Low) &lt;/STRONG&gt;that flag whether a fix requires&amp;nbsp;additional&amp;nbsp;service spend, downtime, or redeployment. This allows engineering teams to plan remediation in a business-informed, prioritized manner.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Looking ahead, the platform will also integrate application health with infrastructure health, correlating&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;Azure Monitor SLIs and Azure Health Model&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;insights to surface resiliency gaps with even greater precision.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Guided remediation with the Resiliency Agent&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Azure Advisor&amp;nbsp;identifies&amp;nbsp;resiliency gaps and surfaces prioritized recommendations. Infrastructure Resiliency&amp;nbsp;Manager builds&amp;nbsp;on this by making those recommendations actionable.&lt;/SPAN&gt; &lt;SPAN data-contrast="none"&gt;Instead of&amp;nbsp;stopping at&amp;nbsp;insights,&amp;nbsp;the platform&amp;nbsp;provides guided&amp;nbsp;execution.&amp;nbsp;Each&amp;nbsp;recommendation includes step-by-step portal flows, dependencies, and readiness checks&amp;nbsp;required&amp;nbsp;for remediation.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;The Resiliency Agent acts as the interactive layer on top, helping&amp;nbsp;you&amp;nbsp;interpret and act on these recommendations in context.&amp;nbsp;For example,&amp;nbsp;you can&amp;nbsp;ask&amp;nbsp;whether&amp;nbsp;an App Service can be moved to zone-redundant storage, what downtime to expect, or what prerequisites are&amp;nbsp;required,&amp;nbsp;and receive clear, workload-aware answers tailored to your environment.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;On request, the agent can generate &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;remediation scripts or&amp;nbsp;IaC&amp;nbsp;snippets&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;to implement specific changes, or validate an existing Terraform template against Azure resiliency best practices. Importantly, the agent never makes changes autonomously: it provides information and code, while you&amp;nbsp;retain&amp;nbsp;full control over execution. This human-in-the-loop model accelerates remediation without sacrificing governance.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;The result: a curated, goal-oriented to-do list that replaces generic advice with targeted action, weighted by cost and feasibility, giving engineering leaders clear visibility into which investments will yield the greatest resilience gains.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Stay&amp;nbsp;resilient: Continuous&amp;nbsp;validation and&amp;nbsp;recovery readiness&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Resilience is not just a configuration milestone; it is an ongoing operational discipline. The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;"Stay Resilient"&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;phase ensures the resilience&amp;nbsp;you've&amp;nbsp;built performs under pressure and that your teams are prepared to respond when real incidents occur. Azure Infrastructure Resiliency Manager delivers&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;resiliency drills and recovery orchestration&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;to support continuous readiness.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Resiliency drills enabled by Azure Chaos Studio&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;A highlight of this public preview is the introduction of&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;availability zone&amp;nbsp;failure&amp;nbsp;drills,&amp;nbsp;enabled&amp;nbsp;by Azure Chaos Studio&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;. These drills simulate zone outages for your application in a controlled, safe environment: shutting down VMs in a target availability zone, forcing failover for zone-redundant databases, or stopping AKS node pools. Every fault action is based on Azure-recommended patterns for each supported resource type, providing a realistic approximation of an actual&amp;nbsp;zone&amp;nbsp;failure.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Because Infrastructure Resiliency Manager understands which resources are intended to be zone-resilient, it &lt;STRONG&gt;automatically determines which fault actions&lt;/STRONG&gt; to apply, eliminating manual configuration. For scenarios not covered out of the box,&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;custom fault logic via Azure Automation runbooks&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;is supported, providing the flexibility required for complex environments.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Recovery orchestration&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Resiliency drills&amp;nbsp;in&amp;nbsp;the&amp;nbsp;platform&amp;nbsp;go&amp;nbsp;beyond fault injection.&amp;nbsp;They integrate&amp;nbsp;with&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;recovery&amp;nbsp;plans&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;STRONG&gt;to orchestrate the complete recovery sequence&lt;/STRONG&gt; automatically after injecting faults:&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;fault injection → failover → reprotection → failback&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;. This full-cycle simulation measures the maximum potential downtime your application could experience during a zone outage and surfaces any recovery steps that did not execute as expected.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Real-time health monitoring and drill insights&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Throughout each drill,&amp;nbsp;Infrastructure&amp;nbsp;Resiliency&amp;nbsp;Manager&amp;nbsp;provides&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;live health monitoring&amp;nbsp;powered by Azure Monitor&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;.&amp;nbsp;A &lt;STRONG&gt;built-in metrics dashboard&lt;/STRONG&gt; tracks each resource's health in real time, revealing whether your application&amp;nbsp;remains&amp;nbsp;available and how performance holds up under simulated stress. This immediate feedback surfaces resilience gaps that may not have been visible through static analysis.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;After each drill, the platform logs the results along with team notes and attestations, building a&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;historical record of all resilience tests&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;. Over time, this record&amp;nbsp;demonstrates&amp;nbsp;measurable improvement and supports compliance with organizational and regulatory resiliency requirements.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;"Stay Resilient" converts assumptions into evidence. When an actual zone outage occurs, your teams will not be executing a failover for the first time; they would have rehearsed it. The result is a culture of proactive resilience, and the organizational confidence that your systems will deliver on their availability commitments.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;Get&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;s&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;tarted with the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;p&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;ublic&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;p&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;review&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:320,&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;Starting today, the&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;public preview of Azure Infrastructure Resiliency Manager is open to all Azure customers&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;. Access&amp;nbsp;the new&amp;nbsp;platform through&amp;nbsp;the Azure Portal by searching for&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;"Resiliency"&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;. We encourage you to evaluate&amp;nbsp;it against&amp;nbsp;a test application or a production workload to gain immediate visibility into your current resiliency posture.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;To get the most from Infrastructure Resiliency&amp;nbsp;Manager,&amp;nbsp;we recommend these three starting actions:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Define a resiliency goal&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;for a critical application and review the posture&amp;nbsp;insights&amp;nbsp;the platform&amp;nbsp;surfaces; you may uncover gaps that were previously invisible.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Engage the Resiliency Agent&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;to tackle a few recommendations and experience firsthand how AI-guided remediation accelerates your team's workflow.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;Run a zone-down drill&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;in a non-production environment to&amp;nbsp;validate&amp;nbsp;your failover and recovery processes under realistic conditions.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;We believe this holistic approach will help organizations achieve a new level of operational excellence, making resiliency actionable, measurable, and deeply embedded in cloud practices. As Infrastructure Resiliency Manager moves toward general availability, we will continue incorporating your feedback and expanding capabilities to meet the demands of real-world cloud architectures.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-contrast="none"&gt;Azure Infrastructure Resiliency Manager gives you the tools to reduce downtime risk, gain clarity over your resiliency posture, and build genuine readiness for the unexpected.&amp;nbsp;&lt;/SPAN&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/resiliency/" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;Join the public preview today&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;and take the next step toward applications that&amp;nbsp;don't&amp;nbsp;just survive&amp;nbsp;disruptions;&amp;nbsp;they thrive through them.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:6,&amp;quot;335551620&amp;quot;:6,&amp;quot;335559738&amp;quot;:80,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 10"&gt;Resources&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:320,&amp;quot;335559739&amp;quot;:120}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="4" data-aria-level="1"&gt;&lt;A href="https://aka.ms/Azure-Infrastructure-Resiliency-Manager" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Azure Infrastructure Resiliency Manager — Overview&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="5" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/governance/service-groups/overview" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Get Started with Service Groups — Microsoft Learn&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="6" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/advisor/advisor-overview" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Introduction to Azure Advisor — Microsoft Learn&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="7" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/chaos-studio/chaos-studio-overview" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;What is Azure Chaos Studio? — Microsoft Learn&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="8" data-aria-level="1"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/fundamentals/whats-new" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;What's New in Azure Monitor — Microsoft Learn&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-setsize="-1" data-leveltext="•" data-font="Symbol" data-listid="4" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559683&amp;quot;:0,&amp;quot;335559684&amp;quot;:-2,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;•&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="9" data-aria-level="1"&gt;&lt;A href="https://techcommunity.microsoft.com/blog/reliability-and-resiliency-in-azure/modern-azure-resilience-with-mark-russinovich/4508967" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Modern Azure Resilience with Mark Russinovich — Tech Community&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{&amp;quot;335559738&amp;quot;:60,&amp;quot;335559739&amp;quot;:60}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2026 19:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/announcing-azure-infrastructure-resiliency-manager-public/ba-p/4523710</guid>
      <dc:creator>rochakm</dc:creator>
      <dc:date>2026-06-02T19:00:00Z</dc:date>
    </item>
    <item>
      <title>Proactive Reliability Series — Article 1: Fault Types in Azure</title>
      <link>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/proactive-reliability-series-article-1-fault-types-in-azure/ba-p/4515521</link>
      <description>&lt;P data-line="4"&gt;Welcome to the&amp;nbsp;&lt;STRONG&gt;Proactive Reliability Series&lt;/STRONG&gt;&amp;nbsp;— a collection of articles dedicated to raising awareness about the importance of&amp;nbsp;&lt;STRONG&gt;designing&lt;/STRONG&gt;,&amp;nbsp;&lt;STRONG&gt;implementing&lt;/STRONG&gt;, and&amp;nbsp;&lt;STRONG&gt;operating&lt;/STRONG&gt;&amp;nbsp;reliable solutions in Azure. Each article will focus on a specific area of reliability engineering: from identifying critical flows and setting reliability targets, to designing for redundancy, testing strategies, and disaster recovery.&lt;/P&gt;
&lt;P data-line="6"&gt;This series draws its foundation from the&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/"&gt;Reliability pillar of the Azure Well-Architected Framework&lt;/A&gt;, Microsoft's authoritative guidance for building workloads that are resilient to malfunction and capable of returning to a fully functioning state after a failure occurs.&lt;/P&gt;
&lt;P data-line="8"&gt;In the cloud, failures are not a matter of&amp;nbsp;&lt;EM&gt;if&lt;/EM&gt;&amp;nbsp;but&amp;nbsp;&lt;EM&gt;when&lt;/EM&gt;. Whether it is a regional outage, an availability zone going dark, a misconfigured resource, or a downstream service experiencing degradation — your workload will eventually face adverse conditions. The difference between a minor blip and a major incident often comes down to how deliberately you have planned for failure.&lt;/P&gt;
&lt;P data-line="10"&gt;In this first article, we start with one of the most foundational practices:&amp;nbsp;&lt;STRONG&gt;Fault Mode Analysis (FMA)&lt;/STRONG&gt;&amp;nbsp;— and the question that underpins it:&amp;nbsp;&lt;EM&gt;what kinds of faults can actually happen in Azure?&lt;/EM&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="12"&gt;&lt;EM&gt;&lt;STRONG&gt;Disclaimer&lt;/STRONG&gt;&lt;/EM&gt;: The views expressed in this article are my own and do not represent the views or positions of Microsoft. This article is written in a personal capacity and has not been reviewed, endorsed, or approved by Microsoft.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2 data-line="16"&gt;Why Fault Mode Analysis Matters&lt;/H2&gt;
&lt;P data-line="18"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/failure-mode-analysis" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/failure-mode-analysis"&gt;Fault Mode Analysis&lt;/A&gt;&amp;nbsp;is the practice of systematically identifying potential points of failure within your workload and its associated flows, and then planning mitigation actions accordingly. A key tenet of FMA is that&amp;nbsp;&lt;STRONG&gt;in any distributed system, failures can occur regardless of how many layers of resiliency are applied&lt;/STRONG&gt;. More complex environments are simply exposed to more types of failures. Given this reality, FMA allows you to design your workload to withstand most types of failures and recover gracefully within defined recovery objectives.&lt;/P&gt;
&lt;P data-line="20"&gt;If you skip FMA altogether, or perform an incomplete analysis, your workload is at risk of unpredicted behavior and potential outages caused by suboptimal design.&lt;/P&gt;
&lt;P data-line="22"&gt;But to perform FMA effectively, you first need to understand&amp;nbsp;&lt;STRONG&gt;what kinds of faults can actually occur&lt;/STRONG&gt;&amp;nbsp;in Azure infrastructure — and that is where most teams hit a gap.&lt;/P&gt;
&lt;H2 data-line="26"&gt;Sample "Azure Fault Type" Taxonomy&lt;/H2&gt;
&lt;P data-line="28"&gt;Azure infrastructure is complex and distributed, and while Microsoft invests heavily in reliability, faults can and do occur. These faults can range from large-scale global service outages to localized issues affecting a single VM.&lt;/P&gt;
&lt;P data-line="30"&gt;The following is a&amp;nbsp;&lt;STRONG&gt;sample&lt;/STRONG&gt;&amp;nbsp;taxonomy of common Azure infrastructure fault types, categorized by their characteristics, likelihood, and mitigation strategies. The taxonomy is organized from a&amp;nbsp;&lt;STRONG&gt;customer impact perspective&lt;/STRONG&gt;&amp;nbsp;— focusing on how fault types affect customer workloads and what mitigation options are available — rather than from an internal Azure engineering perspective.&lt;/P&gt;
&lt;P data-line="32"&gt;Some of these "faults" may not even be caused by an actual failure in Azure infrastructure. They can be caused by a lack of understanding of Azure service designed behaviors (e.g., underestimating the impact of Azure planned maintenance) or by Azure platform design decisions (e.g., capacity constraints). However, from a customer perspective, they all represent potential failure modes that need to be considered and mitigated when designing for reliability.&lt;/P&gt;
&lt;P data-line="34"&gt;The following table presents infrastructure fault types from a customer impact perspective:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P data-line="36"&gt;&lt;EM&gt;&lt;STRONG&gt;Disclaimer&lt;/STRONG&gt;&lt;/EM&gt;: This is an unofficial taxonomy sample of Azure infrastructure fault types. It is not an official Microsoft publication and is not officially supported, endorsed, or maintained by Microsoft. The fault type definitions, likelihood assessments, and mitigation recommendations are based on publicly available Azure documentation and general cloud architecture best practices, but may not reflect the most current Azure platform behavior. Always refer to official&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/"&gt;Azure documentation&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://azure.status.microsoft/" target="_blank" rel="noopener" data-href="https://azure.status.microsoft/"&gt;Azure Service Health&lt;/A&gt;&amp;nbsp;for authoritative guidance.&lt;/P&gt;
&lt;P data-line="38"&gt;The "&lt;EM&gt;&lt;STRONG&gt;Likelihood&lt;/STRONG&gt;&lt;/EM&gt;" values below are&amp;nbsp;&lt;EM&gt;relative planning heuristics&lt;/EM&gt;&amp;nbsp;intended to help prioritize resilience investments. They are not statistical probabilities, do not represent Azure SLA commitments, and are not derived from official Azure reliability data.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-background-color-16" border="1" style="width: 99.4444%; height: 447px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr class="lia-background-color-17 lia-border-color-21" style="height: 35px;"&gt;&lt;th style="height: 35px;"&gt;&lt;STRONG&gt;Fault Type&lt;/STRONG&gt;&lt;/th&gt;&lt;th style="height: 35px;"&gt;&lt;STRONG&gt;Blast Radius&lt;/STRONG&gt;&lt;/th&gt;&lt;th style="height: 35px;"&gt;&lt;STRONG&gt;Likelihood&lt;/STRONG&gt;&lt;/th&gt;&lt;th style="height: 35px;"&gt;&lt;STRONG&gt;Mitigation Redundancy Level Requirements&lt;/STRONG&gt;&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Service Fault (Global)&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Worldwide or Multiple Regions&lt;/td&gt;&lt;td style="height: 35px;"&gt;Very Low&lt;/td&gt;&lt;td style="height: 35px;"&gt;High&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Service Fault (Region)&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Single service in region&lt;/td&gt;&lt;td style="height: 35px;"&gt;Medium&lt;/td&gt;&lt;td style="height: 35px;"&gt;Region Redundancy&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Region Fault&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Single region&lt;/td&gt;&lt;td style="height: 35px;"&gt;Very Low&lt;/td&gt;&lt;td style="height: 35px;"&gt;Region Redundancy&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;&lt;A class="lia-internal-link" href="#community--1-partial-region-fault" target="_blank" rel="noopener" data-href="#deep-dive-partial-region-fault" data-lia-auto-title="Partial Region Fault" data-lia-auto-title-active="0"&gt;Partial Region Fault&lt;/A&gt;&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 
35px;"&gt;&lt;STRONG&gt;Multiple services in a single Region&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Low&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Region Redundancy&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Availability Zone Fault&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Single AZ within region&lt;/td&gt;&lt;td style="height: 35px;"&gt;Low&lt;/td&gt;&lt;td style="height: 35px;"&gt;Availability Zone Redundancy&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Single Resource Fault&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Single VM/instance&lt;/td&gt;&lt;td style="height: 35px;"&gt;High&lt;/td&gt;&lt;td style="height: 35px;"&gt;Resource Redundancy&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Platform Maintenance Fault&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Variable (resource to region)&lt;/td&gt;&lt;td style="height: 35px;"&gt;High&lt;/td&gt;&lt;td style="height: 35px;"&gt;Resource Redundancy, Maintenance Schedules&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Region Capacity Constraint Fault&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Single region&lt;/td&gt;&lt;td style="height: 35px;"&gt;Low&lt;/td&gt;&lt;td style="height: 35px;"&gt;Region Redundancy, Capacity Reservations&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 35px;"&gt;&lt;td style="height: 35px;"&gt;&lt;STRONG&gt;Network POP Location Fault&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 35px;"&gt;Network hardware Colocation site&lt;/td&gt;&lt;td style="height: 35px;"&gt;Low&lt;/td&gt;&lt;td style="height: 35px;"&gt;Site Redundancy&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 24.977%" /&gt;&lt;col style="width: 24.977%" /&gt;&lt;col 
style="width: 24.977%" /&gt;&lt;col style="width: 24.977%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
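&lt;P&gt;As a rough illustration of how this table can feed a Fault Mode Analysis, the following Python sketch encodes the mitigation column as data and lists the fault types that a workload's current measures do not cover. It is a minimal sketch under stated assumptions: the &lt;CODE&gt;TAXONOMY&lt;/CODE&gt; mapping and the &lt;CODE&gt;uncovered_faults&lt;/CODE&gt; helper are hypothetical names, and treating a fault type as covered only when every listed measure is in place is a conservative reading of the table, not something the table itself states.&lt;/P&gt;

```python
# Mitigation column from the sample taxonomy above, as sets of measures.
# The "Service Fault (Global)" row is omitted because its mitigation is
# not expressed as a named redundancy level.
TAXONOMY = {
    "Service Fault (Region)": {"Region Redundancy"},
    "Region Fault": {"Region Redundancy"},
    "Partial Region Fault": {"Region Redundancy"},
    "Availability Zone Fault": {"Availability Zone Redundancy"},
    "Single Resource Fault": {"Resource Redundancy"},
    "Platform Maintenance Fault": {"Resource Redundancy", "Maintenance Schedules"},
    "Region Capacity Constraint Fault": {"Region Redundancy", "Capacity Reservations"},
    "Network POP Location Fault": {"Site Redundancy"},
}


def uncovered_faults(implemented):
    """Return fault types whose listed measures are not all implemented.

    Assumption: every measure listed for a fault type is required.
    """
    return sorted(
        fault
        for fault, required in TAXONOMY.items()
        if not required.issubset(implemented)
    )


# Hypothetical workload that is zone-redundant but deployed to one region.
workload = {"Resource Redundancy", "Availability Zone Redundancy"}
for fault in uncovered_faults(workload):
    print(fault)
```

&lt;P&gt;For this hypothetical single-region workload, every fault type that calls for Region Redundancy shows up as uncovered, including the Partial Region Fault.&lt;/P&gt;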
&lt;P data-line="52"&gt;In future articles we will examine each of these fault types in detail. For this first article, let's take a closer look at one that is often underestimated: the&amp;nbsp;&lt;STRONG&gt;Partial Region Fault&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P data-line="52"&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 class="lia-linked-item" data-line="56"&gt;&lt;a id="community--1-partial-region-fault" class="lia-anchor"&gt;&lt;/a&gt;Deep Dive: "Partial Region Fault"&lt;/H2&gt;
&lt;P data-line="60"&gt;A &lt;STRONG&gt;Partial Region Fault&lt;/STRONG&gt;&amp;nbsp;is a fault affecting multiple Azure services within a single region simultaneously, typically due to shared regional infrastructure dependencies, regional network issues, or regional platform incidents. Sometimes, the number of affected services may be significant enough to resemble a full region outage — but the key distinction is that it is not a complete loss of the region. Some services may continue to operate normally, while others experience degradation or unavailability. Unlike Natural Disaster caused Region outage, in the documented cases referenced later in this article, such "Partial Region Faults" have historically been resolved within hours.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-background-color-16" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr class="lia-background-color-17"&gt;&lt;th&gt;Attribute&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Blast Radius&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Multiple services within a single region&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Likelihood&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Low&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Typical Duration&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Minutes to hours&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Fault Tolerance Options&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Multi-region architecture; cross-region failover&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Fault Tolerance Cost&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;High&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Impact&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Severe&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Typical Cause&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Regional networking infrastructure failure affecting multiple services, regional storage subsystem degradation impacting dependent services, regional control plane issues affecting service management&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="73"&gt;These faults are rare, but they can happen — and when they do, they can have a severe impact on customer solutions that are not architected for multi-region resilience.&lt;/P&gt;
&lt;P data-line="75"&gt;What makes Partial Region Faults particularly dangerous is that they fall into a blind spot in most teams' resilience planning. When organizations think about regional failures, they tend to think in binary terms: either a region is up or it is down. Disaster recovery runbooks are written around the idea of a full region outage — triggered by a natural disaster or a catastrophic infrastructure event — where the response is to fail over everything to a secondary region.&lt;/P&gt;
&lt;P data-line="77"&gt;But a Partial Region Fault is not a full region outage. It is something more insidious. A subset of services in the region degrades or becomes unavailable while others continue to function normally. Your VMs might still be running, but the networking layer that connects them is broken. Your compute is fine, but Azure Resource Manager — the control plane through which you manage everything — is unreachable.&lt;/P&gt;
&lt;P data-line="79"&gt;This partial nature creates several problems that teams rarely plan for:&lt;/P&gt;
&lt;UL data-line="81"&gt;
&lt;LI data-line="81"&gt;&lt;STRONG&gt;Failover logic may not trigger.&lt;/STRONG&gt;&amp;nbsp;Most automated failover mechanisms are designed to detect a complete loss of connectivity to a region. When only some services are affected, health probes may still pass, traffic managers may still route requests to the degraded region, and your failover automation may sit idle — while your users are already experiencing errors.&lt;/LI&gt;
&lt;LI data-line="83"&gt;&lt;STRONG&gt;Recovery is more complex.&lt;/STRONG&gt;&amp;nbsp;With a full region outage, the playbook is straightforward: fail over to the secondary region. With a partial fault, you may need to selectively fail over some services while others remain in the primary region — a scenario that few teams have tested and most architectures do not support gracefully.&lt;/LI&gt;
&lt;/UL&gt;
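&lt;P&gt;One way to narrow the first gap is to make the failover signal dependency-aware, so that a region counts as degraded when any critical dependency keeps failing, even if the compute endpoint still answers. The following Python sketch illustrates the idea only. The &lt;CODE&gt;ProbeResult&lt;/CODE&gt; type, the dependency names, and the three-failure threshold are hypothetical and are not part of any Azure SDK or service.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one probe against one dependency (hypothetical type)."""
    dependency: str
    healthy: bool


def region_is_degraded(history, critical, threshold=3):
    """Return True when a critical dependency failed its last `threshold` probes.

    `history` is a list of probe rounds, oldest first. Each round is a list
    holding one ProbeResult per dependency.
    """
    if threshold > len(history):
        return False  # not enough probe rounds yet to decide
    recent = history[-threshold:]
    for name in critical:
        failures = sum(
            1
            for probe_round in recent
            for result in probe_round
            if result.dependency == name and not result.healthy
        )
        if failures >= threshold:
            return True
    return False


# Compute answers every probe, but the network dependency keeps failing.
rounds = [
    [ProbeResult("compute", True), ProbeResult("network", False)]
    for _ in range(3)
]
print(region_is_degraded(rounds, critical=["compute"]))
print(region_is_degraded(rounds, critical=["compute", "network"]))
```

&lt;P&gt;A probe that watches only compute reports a healthy region in this scenario, while the dependency-aware check flags it. Requiring several consecutive failures is a simple guard against failing over on a single transient error.&lt;/P&gt;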
&lt;P data-line="85"&gt;The&amp;nbsp;&lt;STRONG&gt;real-world examples&lt;/STRONG&gt;&amp;nbsp;below illustrate this clearly. In each case, a shared infrastructure dependency — regional networking, Managed Identities, or Azure Resource Manager — experienced an issue that cascaded into a multi-service fault lasting hours. None of these were full region outages, yet the scope and duration of affected services was significant in each case:&lt;/P&gt;
&lt;H3 data-line="88"&gt;&lt;STRONG&gt;Switzerland North — Network Connectivity Impact (BT6W-FX0)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P data-line="90"&gt;A platform issue resulted in an impact to customers in Switzerland North who may have experienced service availability issues for resources hosted in the region.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-background-color-16" border="1" style="width: 74.1667%; height: 207px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr class="lia-background-color-17"&gt;&lt;th&gt;Attribute&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Date&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;September 26–27, 2025&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Region&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Switzerland North&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Time Window&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;23:54 UTC on 26 Sep – 21:59 UTC on 27 Sep 2025&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Total Duration&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;~22 hours&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Services Impacted&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Multiple (network-dependent services in the region)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="100"&gt;According to the official Post Incident Review (PIR) published by Microsoft on Azure Status History, a platform issue caused network connectivity degradation affecting multiple network-dependent services across the Switzerland North region, with impact lasting approximately 22 hours. The full root cause analysis, timeline, and remediation steps are documented in the linked PIR below.&lt;/P&gt;
&lt;P data-line="103"&gt;🔗&amp;nbsp;&lt;A href="https://azure.status.microsoft/en-us/status/history/?trackingid=BT6W-FX0" target="_blank" rel="noopener" data-href="https://azure.status.microsoft/en-us/status/history/?trackingid=BT6W-FX0"&gt;View PIR on Azure Status History&lt;/A&gt;&lt;/P&gt;
&lt;H3 data-line="105"&gt;&lt;STRONG&gt;East US and West US — Managed Identities and Dependent Services (_M5B-9RZ)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P data-line="107"&gt;A platform issue with the Managed Identities for Azure resources service impacted customers trying to create, update, or delete Azure resources, or to acquire Managed Identity tokens, in the East US and West US regions.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-background-color-16" border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr class="lia-background-color-17"&gt;&lt;th&gt;Attribute&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Date&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;February 3, 2026&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Regions&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;East US, West US&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Time Window&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;00:10 UTC – 06:05 UTC on 03 February 2026&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Total Duration&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;~6 hours&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Services Impacted&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Managed Identities + dependent services (resource create/update/delete, token acquisition)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="118"&gt;🔗&amp;nbsp;&lt;A href="https://azure.status.microsoft/en-us/status/history/?trackingid=_M5B-9RZ" target="_blank" rel="noopener" data-href="https://azure.status.microsoft/en-us/status/history/?trackingid=_M5B-9RZ"&gt;View PIR on Azure Status History&lt;/A&gt;&lt;/P&gt;
&lt;H3 data-line="120"&gt;&lt;STRONG&gt;Azure Government — Azure Resource Manager Failures (ML7_-DWG)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P data-line="122"&gt;Customers using any Azure Government region experienced failures when attempting to perform service management operations through Azure Resource Manager (ARM). This included operations through the Azure Portal, Azure REST APIs, Azure PowerShell, and Azure CLI.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-background-color-16" border="1" style="width: 75%; height: 199px; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr class="lia-background-color-17"&gt;&lt;th&gt;Attribute&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Date&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;December 8, 2025&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Regions&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;Azure Government (all regions)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Time Window&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;11:04 EST (16:04 UTC) – 14:13 EST (19:13 UTC)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Total Duration&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;~3 hours&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Services Impacted&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;20+ services (ARM and all ARM-dependent services)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="133"&gt;🔗&amp;nbsp;&lt;A href="https://azure.status.microsoft/en-us/status/history/?trackingid=ML7_-DWG" target="_blank" rel="noopener" data-href="https://azure.status.microsoft/en-us/status/history/?trackingid=ML7_-DWG"&gt;View PIR on Azure Status History&lt;/A&gt;&lt;/P&gt;
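&lt;P&gt;The Total Duration rows above can be cross-checked against the stated time windows. The Python sketch below is illustrative only: the dictionary layout and the helper names (&lt;CODE&gt;parse_utc&lt;/CODE&gt;, &lt;CODE&gt;duration_minutes&lt;/CODE&gt;, &lt;CODE&gt;format_duration&lt;/CODE&gt;) are assumptions made for this example. The timestamps are copied from the three tables, using the UTC values for the Azure Government incident.&lt;/P&gt;
&lt;PRE&gt;
```python
from datetime import datetime, timezone

# Incident windows copied from the three tables above (all times in UTC).
# Keys are the tracking IDs from the section headings.
INCIDENTS = {
    "BT6W-FX0": ("2025-09-26 23:54", "2025-09-27 21:59"),
    "_M5B-9RZ": ("2026-02-03 00:10", "2026-02-03 06:05"),
    "ML7_-DWG": ("2025-12-08 16:04", "2025-12-08 19:13"),
}


def parse_utc(text):
    """Parse a 'YYYY-MM-DD HH:MM' string as a timezone-aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def duration_minutes(start, end):
    """Return the length of an incident window in whole minutes."""
    delta = parse_utc(end) - parse_utc(start)
    return int(delta.total_seconds() // 60)


def format_duration(minutes):
    """Render a minute count as hours and minutes, for example '3h 09m'."""
    hours, rest = divmod(minutes, 60)
    return f"{hours}h {rest:02d}m"


for tracking_id, (start, end) in INCIDENTS.items():
    print(tracking_id, format_duration(duration_minutes(start, end)))
```
&lt;/PRE&gt;
&lt;P&gt;Run as a script, this prints 22h 05m, 5h 55m, and 3h 09m, which agrees with the rounded values of ~22, ~6, and ~3 hours in the tables.&lt;/P&gt;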
&lt;H2 data-line="137"&gt;Wrapping Up&lt;/H2&gt;
&lt;P data-line="139"&gt;Designing resilient Azure solutions requires understanding the full spectrum of potential infrastructure faults. The Partial Region Fault is just one of many fault types you should account for during your Failure Mode Analysis — but it is a powerful reminder that even within a single region, shared infrastructure dependencies can amplify a single failure into a multi-service outage.&lt;/P&gt;
&lt;P data-line="141"&gt;Use this taxonomy as a starting point for FMA when designing your Azure architecture. The area evolves continuously along with the Azure platform and the wider industry, so watch this space and revisit your fault type analysis periodically.&lt;/P&gt;
&lt;P data-line="143"&gt;In the next article, we will continue exploring additional fault types from the taxonomy. Stay tuned.&lt;/P&gt;
&lt;H2 data-line="147"&gt;Authors &amp;amp; Reviewers&lt;/H2&gt;
&lt;P data-line="149"&gt;&lt;STRONG&gt;Authored by&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://www.linkedin.com/in/zoranjovanovic/" target="_blank" rel="noopener" data-href="https://www.linkedin.com/in/zoranjovanovic/"&gt;Zoran Jovanovic&lt;/A&gt;, Cloud Solutions Architect at Microsoft.&lt;BR /&gt;&lt;STRONG&gt;Peer Review by&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://www.linkedin.com/in/catalina-alupoaie/" target="_blank" rel="noopener" data-href="https://www.linkedin.com/in/catalina-alupoaie/"&gt;Catalina Alupoaie&lt;/A&gt;, Cloud Solutions Architect at Microsoft.&lt;BR /&gt;&lt;STRONG&gt;Peer Review by&lt;/STRONG&gt;&amp;nbsp;&lt;A href="https://www.linkedin.com/in/stefanjohner/" target="_blank" rel="noopener" data-href="https://www.linkedin.com/in/stefanjohner/"&gt;Stefan Johner&lt;/A&gt;, Cloud Solutions Architect at Microsoft.&lt;/P&gt;
&lt;H2 data-line="154"&gt;References&lt;/H2&gt;
&lt;UL data-line="156"&gt;
&lt;LI data-line="156"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/"&gt;Azure Well-Architected Framework — Reliability Pillar&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="157"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/failure-mode-analysis" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/failure-mode-analysis"&gt;Failure Mode Analysis&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="158"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/reliability/concept-shared-responsibility" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/reliability/concept-shared-responsibility"&gt;Shared Responsibility for Reliability&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="159"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/reliability/availability-zones-overview" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/reliability/availability-zones-overview"&gt;Azure Availability Zones&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="160"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/reliability/concept-business-continuity-high-availability-disaster-recovery" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/reliability/concept-business-continuity-high-availability-disaster-recovery"&gt;Business Continuity and Disaster Recovery&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="161"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/architecture/best-practices/transient-faults" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/architecture/best-practices/transient-faults"&gt;Transient Fault Handling&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="162"&gt;&lt;A href="https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services" target="_blank" rel="noopener" data-href="https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services"&gt;Azure Service Level Agreements&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="163"&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/reliability/overview-reliability-guidance" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/reliability/overview-reliability-guidance"&gt;Azure Reliability Guidance by Service&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-line="164"&gt;&lt;A href="https://azure.status.microsoft/status/history/" target="_blank" rel="noopener" data-href="https://azure.status.microsoft/status/history/"&gt;Azure Status History&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 29 Apr 2026 18:09:31 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/proactive-reliability-series-article-1-fault-types-in-azure/ba-p/4515521</guid>
      <dc:creator>Zoran Jovanovic</dc:creator>
      <dc:date>2026-04-29T18:09:31Z</dc:date>
    </item>
    <item>
      <title>Modern Azure Resilience with Mark Russinovich</title>
      <link>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/modern-azure-resilience-with-mark-russinovich/ba-p/4508967</link>
      <description>&lt;P class="lia-align-justify"&gt;&lt;A href="https://azure.microsoft.com/en-us/blog/azure-reliability-resiliency-and-recoverability-build-continuity-by-design/" target="_blank" rel="noopener"&gt;Resiliency&lt;/A&gt; in the cloud reflects different priorities from consistent performance, to withstanding failures, to predictable recovery. These map to reliability, resiliency, and recoverability, which together guide how workloads should be designed on Azure. This post extends foundational guidance with practical multi‑region design decisions, including when to use availability zones, paired regions, and non‑paired regions to meet business continuity goals.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Reliability in Azure isn’t defined by a single recommendation, but by a set of architectural patterns designed to balance cost, complexity, recovery speed, and operational effort—because no single approach fits every workload. While disaster recovery is a common driver for multi‑region designs, long‑term scale planning also matters. Azure regions operate within defined physical and latency boundaries, and large-scale workloads may eventually approach the practical capacity limits of a single region.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;This post introduces four resilience patterns, outlining when and why to use each so you can assess options based on your non‑functional requirements. It also explains how &lt;A href="https://learn.microsoft.com/en-us/azure/reliability/availability-zones-overview" target="_blank" rel="noopener"&gt;availability zone–based designs&lt;/A&gt; can often provide an alternative to &lt;A href="https://learn.microsoft.com/en-us/azure/reliability/regions-paired" target="_blank" rel="noopener"&gt;paired regions&lt;/A&gt; as a default choice.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;The four common reliability and availability architecture patterns are:&lt;/P&gt;
&lt;OL class="lia-align-justify"&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG&gt;In-region High Availability (HA) with Availability Zones (AZ)&lt;/STRONG&gt;: Maximize availability within a single Azure region by deploying across multiple &lt;A href="https://learn.microsoft.com/en-us/azure/reliability/availability-zones-overview" target="_blank" rel="noopener"&gt;availability zones&lt;/A&gt;.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG&gt;Regional Business Continuity and Disaster Recovery (BCDR)&lt;/STRONG&gt;: A primary/secondary region strategy implemented across separate Azure regions, selected based on geographic risk boundaries, regulatory requirements, and service availability. Recovery sequencing and failover behaviors are defined by workload dependencies and organizational requirements.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG&gt;Non-paired region BCDR&lt;/STRONG&gt;: A primary/secondary region strategy where the secondary region is chosen based on requirements such as capacity, service availability, data residency, and network latency. This approach also supports long‑term scale planning, since Azure regions operate within physical datacenter footprints and latency boundaries and can reach practical capacity limits as workloads grow. See &lt;A href="https://learn.microsoft.com/en-us/azure/reliability/regions-multi-region-nonpaired" target="_blank" rel="noopener"&gt;multi‑region solutions in non‑paired regions&lt;/A&gt;.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="lia-align-justify"&gt;&lt;STRONG&gt;Multi-region active/active&lt;/STRONG&gt;: Deploy workloads across multiple regions simultaneously so that each region can serve production traffic. This approach can provide both high availability and disaster resilience while improving global performance, but it introduces additional architectural complexity and operational overhead.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="lia-align-justify"&gt;The rest of this post helps you understand the tradeoffs across these patterns, enabling you to select the right approach per workload while avoiding unnecessary cost and operational complexity.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;First post in this series: &lt;A href="https://azure.microsoft.com/en-us/blog/achieve-agility-and-scale-in-a-dynamic-cloud-world/" target="_blank" rel="noopener"&gt;Achieve agility and scale in a dynamic cloud world&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;Why did Azure launch with paired regions?&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;Azure launched in 2010 as Windows Azure and was rebranded to Microsoft Azure in 2014. Its early regions were introduced in pairs (West US &amp;amp; East US, West Europe &amp;amp; North Europe, Southeast Asia &amp;amp; East Asia) to align with common enterprise business continuity practices at the time. Many organizations operated multiple datacenters within the same geographic boundary, separated by sufficient distance to reduce shared risk while maintaining regulatory and operational alignment. The paired-region design mirrored those familiar BCDR practices and offered:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;A familiar primary/secondary failover pattern consistent with enterprise BCDR strategies&lt;/LI&gt;
&lt;LI&gt;Support for regulatory or data residency requirements that required disaster recovery within a defined geographic boundary&lt;/LI&gt;
&lt;LI&gt;Turnkey replication capabilities for services such as &lt;A href="https://learn.microsoft.com/azure/storage/common/storage-redundancy#geo-redundant-storage" target="_blank" rel="noopener"&gt;Geo-Redundant Storage (GRS)&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Platform-level sequencing of updates to reduce the likelihood of simultaneous regional impact&lt;/LI&gt;
&lt;LI&gt;A defined regional recovery prioritization model for rare geography-wide incidents&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This model provided assurance that Azure could meet or exceed the resilience of legacy enterprise environments while simplifying early cloud adoption through predefined recovery patterns.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;However, Azure’s engineering strategy has evolved. Many services now support replication to a region of choice rather than being limited to predefined pairs. This provides architects with greater flexibility to select regions based on workload requirements, risk boundaries, compliance constraints, capacity considerations, and cost models.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;It’s important to recognize that regional parity is never guaranteed even between paired regions. Differences in service availability, &lt;A href="https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/table" target="_blank" rel="noopener"&gt;supported SKUs, scale limits, capacity, cost and operational maturity must be explicitly accounted for in the workload design.&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;How has cloud resilience evolved since launch?&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;The introduction of &lt;A href="https://learn.microsoft.com/en-us/azure/reliability/availability-zones-overview?tabs=azure-cli" target="_blank" rel="noopener"&gt;Availability Zones&lt;/A&gt; in 2018 marked a significant advancement in Azure resilience. Availability Zones are physically isolated groups of datacenters within a region; each zone has independent power, cooling, and networking. Many Azure services (such as App Service, Storage, and Azure SQL) use zones to provide platform-managed resilience. In addition, customers can deploy zonal resources, such as virtual machines, into specific zones or distribute them across zones to design for higher availability.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Azure regions were previously launched in pairs, but since 2020 new regions have typically been designed with multiple availability zones and without a paired region. This design enables:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;High availability within a single region&lt;/LI&gt;
&lt;LI&gt;Platform-managed resilience for most failure scenarios&lt;/LI&gt;
&lt;LI&gt;Reduced need for multi-region deployments for standard high-availability requirements&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;How should customers design for resilience when using both paired and non-paired regions?&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;To decide which resiliency model makes sense, customers should start by defining clear expectations including uptime targets, recovery time objectives (RTO), recovery point objectives (RPO), latency tolerance, and data residency. These non-functional requirements should directly influence architectural decisions.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;In practice, High Availability (HA) and Disaster Recovery (DR) are differentiated by recovery objectives rather than geography. HA architectures target near-zero downtime and minimal data loss, while DR solutions allow for defined recovery time and acceptable data loss.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;While HA is commonly established within a region using availability zones, it can also be achieved across regions through active-active designs. Similarly, DR is typically implemented across regions using replication and failover strategies.&lt;/P&gt;
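&lt;P class="lia-align-justify"&gt;The idea that HA and DR are separated by recovery objectives rather than geography can be sketched in a few lines of Python. This is a minimal illustration, not Azure guidance: the thresholds for "near-zero" objectives, the &lt;CODE&gt;RecoveryObjectives&lt;/CODE&gt; type, and both function names are hypothetical, and the pattern labels simply repeat the four patterns listed earlier in this post.&lt;/P&gt;
&lt;PRE&gt;
```python
from dataclasses import dataclass

# Hypothetical cut-offs for "near-zero" objectives. The article gives no
# numbers, so replace these with values from your own business requirements.
NEAR_ZERO_RTO_SECONDS = 60
NEAR_ZERO_RPO_SECONDS = 5


@dataclass(frozen=True)
class RecoveryObjectives:
    rto_seconds: int  # maximum tolerable downtime
    rpo_seconds: int  # maximum tolerable data loss, expressed as time


def classify(objectives):
    """Label a requirement as 'HA' or 'DR' from its objectives, not its geography."""
    tight_rto = NEAR_ZERO_RTO_SECONDS >= objectives.rto_seconds
    tight_rpo = NEAR_ZERO_RPO_SECONDS >= objectives.rpo_seconds
    return "HA" if tight_rto and tight_rpo else "DR"


def candidate_patterns(objectives, needs_region_failure_protection):
    """Map the classification onto the patterns listed earlier in this post."""
    if classify(objectives) == "HA":
        if needs_region_failure_protection:
            return ["Multi-region active/active"]
        return ["In-region HA with availability zones"]
    # DR-class requirements accept a defined recovery time and data loss,
    # so a primary/secondary region strategy is a candidate.
    return ["Regional BCDR", "Non-paired region BCDR"]


# Example workloads (hypothetical values).
payments = RecoveryObjectives(rto_seconds=30, rpo_seconds=0)
reporting = RecoveryObjectives(rto_seconds=4 * 3600, rpo_seconds=15 * 60)
print(classify(payments), candidate_patterns(payments, True))
print(classify(reporting), candidate_patterns(reporting, False))
```
&lt;/PRE&gt;
&lt;P class="lia-align-justify"&gt;A real assessment would also weigh latency tolerance, data residency, and cost, which this sketch deliberately leaves out.&lt;/P&gt;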
&lt;H4&gt;HA: Availability Zones&lt;/H4&gt;
&lt;P class="lia-align-justify"&gt;For high availability within a region, Azure offers two deployment models built on availability zones:&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Zone-redundant resources&lt;/STRONG&gt; are replicated across multiple availability zones to ensure data remains accessible even if one zone fails. Some services provide built-in zone redundancy, while others require manual configuration. Typically, Microsoft chooses the zones used for your resources, though some services allow you to select them.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Zonal resources&lt;/STRONG&gt; are deployed in a single availability zone and do not provide automatic resiliency against zone outages. While faults in other zones do not affect them, ensuring resiliency requires deploying separate resources across multiple zones. Microsoft does not handle this process; you are responsible for managing failover if an outage occurs.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Whether to design a zone-resilient architecture is a critical decision that balances availability requirements against cost and regional capacity constraints. Designing workloads to be resilient across availability zones is generally the preferred approach: when the Azure service in use supports it, deploying across zones improves fault tolerance and reduces downtime during zone-level failures. However, architects should still consider workload characteristics, cost implications, and potential latency impacts, which vary with the services and architecture patterns involved.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Ultimately, zone resiliency is an architectural decision that should be strategically aligned with business priorities and risk tolerance, not simply treated as a checkbox to be ticked during deployment.&lt;/P&gt;
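&lt;P class="lia-align-justify"&gt;The difference between a single zonal deployment and a deployment spread across zones can be shown with a toy model. The sketch below is an assumption-laden illustration rather than a description of any Azure API: it assumes a region with three zones labelled "1" to "3", and the function names are hypothetical. It only checks whether at least one instance would remain if any one zone failed.&lt;/P&gt;
&lt;PRE&gt;
```python
# Assumed three-zone region; zone labels are illustrative.
REGION_ZONES = frozenset({"1", "2", "3"})


def surviving_instances(instance_zones, failed_zone):
    """Return the zones that still host a running instance after one zone fails."""
    return sorted(zone for zone in set(instance_zones) if zone != failed_zone)


def tolerates_any_single_zone_failure(instance_zones):
    """True if at least one instance remains no matter which zone fails."""
    return all(surviving_instances(instance_zones, zone) for zone in REGION_ZONES)


# A single zonal resource: an outage in zone 1 takes the workload down.
print(tolerates_any_single_zone_failure(["1"]))

# Separate zonal resources in two zones: one always survives, but failing
# over between them remains the customer's responsibility.
print(tolerates_any_single_zone_failure(["1", "2"]))
```
&lt;/PRE&gt;
&lt;P class="lia-align-justify"&gt;The model says nothing about capacity, data replication, or failover time; it only captures the placement rule that a zonal design needs instances in more than one zone.&lt;/P&gt;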
&lt;H4&gt;DR: Paired and Non-Paired Regions&lt;/H4&gt;
&lt;P class="lia-align-justify"&gt;Region pairs should be viewed as an architectural choice rather than a rule. Historically, paired regions played a key role in minimizing correlated failures and streamlining platform updates and recovery processes. However, as the Azure &lt;A href="https://azure.microsoft.com/en-us/blog/advancing-safe-deployment-practices/?msockid=2f9e0a1921a66bf21a0b1ee8201c6a6d" target="_blank" rel="noopener"&gt;Safe Deployment Practices (SDP)&lt;/A&gt; have matured, the advantages of region pairs have become more nuanced. Over time, SDP has evolved to support safer and more flexible change management through longer and more adaptable bake times, richer operational signal integration, and an expanded understanding of regional deployment boundaries. These improvements enable Azure to release changes more safely across a growing and increasingly diverse regional footprint, while still balancing reliability with time‑to‑market. As a result, regional pairs are no longer the sole mechanism for managing correlated change risk, but one of several architectural tools customers can apply based on their resiliency and compliance needs.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Using non-paired regions or a mix of paired and non-paired regions allows customers to design high availability and disaster recovery architectures that are driven by business, compliance, and application requirements rather than fixed regional relationships. This enables customers to optimize for data residency, regulatory boundaries, and latency to specific user populations, and to provide differentiated recovery objectives across their workloads. This approach can also reduce exposure to rare but high-impact platform-level events by avoiding tightly coupled regional behaviors.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;While some Azure services natively simplify replication and recovery within paired regions, and others support replication across arbitrary regions (such as Azure SQL, Cosmos DB, and Azure Blob Storage with object replication), non-paired designs encourage explicit, workload-aware resiliency strategies such as application-level replication, asynchronous data sync, and failover orchestration. Although this introduces more architectural responsibility and may require compensating for paired region features, it delivers greater transparency, predictable recovery behavior, and alignment with business-driven RTO/RPO requirements rather than platform defaults. Regional failover is a customer‑orchestrated decision; customers should design, test, and operate their own failover and failback processes rather than assuming platform‑initiated regional failover.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Designing for regional resilience requires distinguishing between workload mobility and data protection. Azure provides two complementary capabilities that address these needs differently: Azure Site Recovery (ASR) and Azure Backup.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Azure Site Recovery (ASR) enables near‑continuous replication and orchestrated failover of virtual machine–based workloads to a region of choice, not limited to paired regions. ASR is the primary mechanism for customers who need low RPO, controlled failover, and workload restart in a secondary region. This is especially relevant for regions without a paired region or where the paired region does not meet capacity, service availability, or compliance needs.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;Azure Backup provides durable, policy‑based data protection, independent of compute availability. While Azure Backup is not a high‑availability or infrastructure failover solution, it plays a critical role when services do not support region‑of‑choice replication natively. In these scenarios, backup and restore become the recovery mechanism.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;These two services are often used together: ASR for VM‑level workload continuity, and Azure Backup for protecting and restoring data across regions, including to non‑paired regions.&lt;/P&gt;
&lt;H2&gt;I am using paired regions today – does this mean I need to change my architecture?&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;If your current architecture is built around paired regions for compliance, data residency, or strict disaster recovery objectives, that model remains valid and supported. Azure continues to support paired regions, which provide prioritized recovery sequencing, staggered platform updates, and geo-aligned data residency, all backed by Microsoft’s global infrastructure strategy.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;What has changed is that paired regions are no longer the only way to achieve enterprise-grade resilience. For many workloads that adopted a paired region (1+1) model primarily to protect against local datacenter failure, Availability Zones combined with geo-redundant services now provide equivalent or better protection with far less architectural complexity and cost. The shift to non-paired regions is therefore not a forced migration, but an opportunity to simplify. Customers can continue using paired regions where business requirements demand it, while selectively modernizing other workloads to take advantage of platform-managed zone resilience.&lt;/P&gt;
&lt;H2&gt;What’s coming up next for resilience in Azure?&lt;/H2&gt;
&lt;P class="lia-align-justify"&gt;Resilience is evolving from static guidance to continuous, workload-aware execution. A multi-region strategy isn’t only about recovery; it’s also a practical hedge against regional capacity constraints (regions have physical limits within a latency boundary, so growth can eventually hit caps).&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/copilot/resiliency-agent" target="_blank" rel="noopener"&gt;Resiliency agent in Azure Copilot (preview)&lt;/A&gt; helps you spot missing resiliency coverage (such as zone alignment gaps or missing backup/DR) and provides automated guidance, including scripts, to remediate issues, configure Azure Backup and Azure Site Recovery, and define recovery drills.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/resiliency/resiliency-overview" target="_blank" rel="noopener"&gt;Resiliency in Azure&lt;/A&gt; brings zone resiliency, high availability, backup, DR, and ransomware protection together into a unified experience within Azure Copilot, enabling teams to set resiliency goals, receive proactive recommendations, and view service‑group insights via the Azure portal.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;If you’re looking for service-specific BCDR and replication guidance, use these authoritative starting points:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-area/management-business-continuity-disaster-recovery" target="_blank" rel="noopener"&gt;Cloud Adoption Framework (CAF) – Landing zone design area (BCDR)&lt;/A&gt;: guidance to define platform DR requirements (RTO/RPO), data residency considerations, and operational readiness as part of landing zone design.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/disaster-recovery" target="_blank" rel="noopener"&gt;Azure Well-Architected Framework (WAF) – Disaster recovery strategies&lt;/A&gt;: guidance for structuring, testing, and operating DR plans aligned to recovery targets, with links to companion DR planning resources.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/well-architected/design-guides/regions-availability-zones" target="_blank" rel="noopener"&gt;WAF design guide – Regions &amp;amp; Availability Zones&lt;/A&gt;: how to choose between zone- vs region-based approaches and understand reliability/cost/performance tradeoffs.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/reliability/?product=popular" target="_blank" rel="noopener"&gt;Azure service reliability guides&lt;/A&gt;: service-by-service reliability/replication behavior and customer responsibilities.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/reliability/regions-multi-region-nonpaired" target="_blank" rel="noopener"&gt;Non‑paired multi‑region configurations&lt;/A&gt;: examples of supported multi-region approaches when regions aren’t paired.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/" target="_blank" rel="noopener"&gt;Validate feasibility before you design&lt;/A&gt;: confirm service/SKU/zone availability in both regions.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;&lt;STRONG&gt;Next step&lt;/STRONG&gt;: Explore &lt;A href="https://azure.microsoft.com/en-us/solutions/azure-essentials/" target="_blank" rel="noopener"&gt;Azure Essentials&lt;/A&gt; for guidance and tools to build secure, resilient, cost-efficient Azure projects. To see how shared responsibility and Azure Essentials come together in practice, read &lt;A href="https://azure.microsoft.com/en-us/blog/resiliency-in-the-cloud-empowered-by-shared-responsibility-and-azure-essentials/" target="_blank" rel="noopener"&gt;Resiliency in the cloud—empowered by shared responsibility and Azure Essentials&lt;/A&gt; and &lt;A href="https://azure.microsoft.com/en-us/blog/azure-reliability-resiliency-and-recoverability-build-continuity-by-design/" target="_blank" rel="noopener"&gt;How to design reliable, resilient, and recoverable workloads on Azure&lt;/A&gt; on the Microsoft Azure Blog.&lt;/P&gt;
&lt;P class="lia-align-justify"&gt;For expert-led, outcome-based engagements to strengthen resiliency and operational readiness, &lt;A href="https://www.microsoft.com/en-us/microsoft-unified/plan-details" target="_blank" rel="noopener"&gt;Microsoft Unified&lt;/A&gt; provides end-to-end support across the Microsoft cloud. To move from guidance to execution, start your project with experts and investments through &lt;A href="https://azure.microsoft.com/en-us/solutions/azure-accelerate/" target="_blank" rel="noopener"&gt;Azure Accelerate&lt;/A&gt;.&lt;/P&gt;
&lt;H2&gt;Related Resources&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/well-architected/design-guides/regions-availability-zones" target="_blank" rel="noopener"&gt;Architecture strategies for using Availability Zones and Regions&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;High Availability&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/well-architected/reliability/highly-available-multi-region-design" target="_blank" rel="noopener"&gt;Architecture strategies for highly available multi-region design&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/redundancy" target="_blank" rel="noopener"&gt;Disaster Recovery&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/well-architected/reliability/disaster-recovery" target="_blank" rel="noopener"&gt;Architecture strategies for designing a Disaster Recovery strategy&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/reliability/regions-multi-region-nonpaired" target="_blank" rel="noopener"&gt;Multi-region solutions in nonpaired regions&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/well-architected/design-guides/disaster-recovery" target="_blank" rel="noopener"&gt;Develop a disaster recovery plan for multi-region deployments&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify"&gt;Azure Regions and Services&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/reliability/regions-paired" target="_blank" rel="noopener"&gt;Azure region pairs and nonpaired regions&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/reliability/overview-reliability-guidance" target="_blank" rel="noopener"&gt;Reliability guides for Azure services&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
</description>
      <pubDate>Tue, 28 Apr 2026 20:48:58 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/reliability-and-resiliency-in/modern-azure-resilience-with-mark-russinovich/ba-p/4508967</guid>
      <dc:creator>molina_sharma</dc:creator>
      <dc:date>2026-04-28T20:48:58Z</dc:date>
    </item>
  </channel>
</rss>

