resiliency
6 Topics

Announcing Azure Infrastructure Resiliency Manager Public Preview
At Microsoft Build 2026, we are thrilled to announce that Azure Infrastructure Resiliency Manager is now available in public preview, open to all Azure customers.

Azure Infrastructure Resiliency Manager is not a replacement for individual Azure resiliency features; it is the unifying layer that connects them into a coherent, goal-driven workflow. It leverages and complements Availability Zones, Azure Advisor, Azure Chaos Studio, Azure Monitor, and Azure Copilot, adding purposeful orchestration that turns isolated capabilities into a complete resiliency strategy. The preview already covers a broad range of Azure resource types and zone-redundant configurations, from virtual machines and databases to AKS clusters and networking, with continued expansion planned.

The new platform is built on a foundational belief: achieving application resilience is a continuous journey, not a one-time configuration task. That journey is organized into three actionable phases: Start Resilient, Get Resilient, and Stay Resilient. Each phase delivers measurable customer value such as reduced downtime risk, faster recovery, and greater operational confidence.

Start resilient: Embedding resiliency from day one

Starting resilient means treating resiliency as a fundamental architectural requirement, not an afterthought. Azure Infrastructure Resiliency Manager makes it straightforward to design zone-resilient applications from the outset, eliminating costly retrofits and reducing risk before your first deployment.

Resiliency Agent: Your AI-powered architecture advisor

The standout capability in this preview is the Resiliency Agent, a conversational, AI-powered assistant embedded directly in the Azure Portal. Designed for architects and developers, the Resiliency Agent allows teams to validate and refine resiliency strategies using plain language.
For example, you might enter a prompt such as "I'm designing a three-tier web app with VMs, a Flexible PostgreSQL database, and a Standard Load Balancer" and ask the agent what zone-resiliency requirements apply. The Resiliency Agent analyzes your plan, identifies single points of failure, and recommends specific changes: enabling zone redundancy for the database, deploying VMs across zones, or upgrading to zone-redundant load balancers. It delivers a structured, per-resource summary that makes the path to resiliency explicit and actionable.

Infrastructure-as-Code generation and validation

Beyond design guidance, Infrastructure Resiliency Manager accelerates implementation. You can ask the Resiliency Agent to generate Infrastructure-as-Code (IaC) templates (ARM, Bicep, or Terraform) with all resiliency configurations pre-built and ready to deploy. A generated Bicep template, for example, automatically includes zone-redundant settings for databases, VMs, and load balancers aligned to your stated goals. The agent also validates existing IaC templates: upload a template and receive a natural language assessment of resiliency gaps, complete with targeted suggestions and code snippets to close them. This eliminates manual review overhead and ensures every new deployment starts with a resilient foundation. By embedding resiliency into the design and deployment lifecycle from day one, organizations avoid expensive redesigns, accelerate time-to-market, and bring new services to production already meeting high-availability standards.

Get resilient: Closing gaps in existing applications

Most Azure customers have workloads built over months or years that may not fully meet today's resiliency requirements. Infrastructure Resiliency Manager delivers a centralized, goal-driven view of your current environment's resilience posture, along with prioritized, actionable recommendations to close every gap.
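To make the idea of a goal-driven posture view more concrete, here is a minimal sketch of how resources might be compared with a single zone-failure-tolerance goal and sorted into three states: meets the goal, non-resilient, or unevaluated. Everything in it is an assumption for illustration: the `Resource` and `Posture` types, the `zone_redundant` field, and the sample resource names are hypothetical and do not reflect the platform's actual data model or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional


class Posture(Enum):
    """Hypothetical posture states for a resource measured against a goal."""
    MEETS_GOAL = "meets goal"
    NON_RESILIENT = "non-resilient"
    UNEVALUATED = "unevaluated"


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record; not the platform's real schema."""
    name: str
    # True/False once assessed; None when no assessment is available yet.
    zone_redundant: Optional[bool]


def classify(resource: Resource) -> Posture:
    """Compare one resource with an assumed zone-failure-tolerance goal."""
    if resource.zone_redundant is None:
        return Posture.UNEVALUATED
    if resource.zone_redundant:
        return Posture.MEETS_GOAL
    return Posture.NON_RESILIENT


def summarize(resources: Iterable[Resource]) -> Dict[Posture, List[str]]:
    """Group resource names by posture, keeping every posture key present."""
    summary: Dict[Posture, List[str]] = {posture: [] for posture in Posture}
    for resource in resources:
        summary[classify(resource)].append(resource.name)
    return summary


if __name__ == "__main__":
    # Illustrative application boundary with made-up resource names.
    app = [
        Resource("web-vm", zone_redundant=False),
        Resource("orders-db", zone_redundant=True),
        Resource("legacy-cache", zone_redundant=None),
    ]
    for posture, names in summarize(app).items():
        print(f"{posture.value}: {names}")
```

Keeping "unevaluated" as its own state, rather than folding it into "non-resilient", avoids reporting a gap where there is simply no assessment yet.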
Goal-driven resiliency posture

Define what constitutes your application by grouping resources across regions, subscriptions, or resource groups, including tag-based grouping, using Service Groups. Once your application boundary is established, assign a resiliency goal: for example, zone-failure tolerance for all components, or specific data replication requirements for critical services. The platform assesses every resource against that goal and presents a clear, single-pane-of-glass resiliency posture showing which resources meet the goal, which are non-resilient, and which remain unevaluated. This goal-driven model ensures that all subsequent guidance is precisely calibrated to your target state, not generic best practices.

Actionable, prioritized recommendations

For every resource that falls short of the defined goal, Infrastructure Resiliency Manager generates targeted remediation recommendations powered by Azure Advisor. If a virtual machine lacks zone redundancy, the platform recommends converting it to an availability zone deployment. If a database is not zone-redundant, the recommendation specifies exactly how to enable it. Critically, every recommendation includes contextual decision-making information: impacted resources, implementation steps, and qualitative cost indicators (High, Medium, Low) that flag whether a fix requires additional service spend, downtime, or redeployment. This allows engineering teams to plan remediation in a business-informed, prioritized manner. Looking ahead, the platform will also integrate application health with infrastructure health, correlating Azure Monitor SLIs and Azure Health Model insights to surface resiliency gaps with even greater precision.

Guided remediation with the Resiliency Agent

Azure Advisor identifies resiliency gaps and surfaces prioritized recommendations. Infrastructure Resiliency Manager builds on this by making those recommendations actionable.
Instead of stopping at insights, the platform provides guided execution. Each recommendation includes step-by-step portal flows, dependencies, and readiness checks required for remediation. The Resiliency Agent acts as the interactive layer on top, helping you interpret and act on these recommendations in context. For example, you can ask whether an App Service can be moved to zone-redundant storage, what downtime to expect, or what prerequisites are required, and receive clear, workload-aware answers tailored to your environment. On request, the agent can generate remediation scripts or IaC snippets to implement specific changes, such as validating an existing Terraform template against Azure resiliency best practices. Importantly, the agent never makes changes autonomously: it provides information and code, while you retain full control over execution. This human-in-the-loop model accelerates remediation without sacrificing governance.

The result: a curated, goal-oriented to-do list that replaces generic advice with targeted action, weighted by cost and feasibility, giving engineering leaders clear visibility into which investments will yield the greatest resilience gains.

Stay resilient: Continuous validation and recovery readiness

Resilience is not just a configuration milestone; it is an ongoing operational discipline. The "Stay Resilient" phase ensures the resilience you've built performs under pressure and that your teams are prepared to respond when real incidents occur. Azure Infrastructure Resiliency Manager delivers resiliency drills and recovery orchestration to support continuous readiness.

Resiliency drills enabled by Azure Chaos Studio

A highlight of this public preview is the introduction of availability zone failure drills, enabled by Azure Chaos Studio.
These drills simulate zone outages for your application in a controlled, safe environment: shutting down VMs in a target availability zone, forcing failover for zone-redundant databases, or stopping AKS node pools. Every fault action is based on Azure-recommended patterns for each supported resource type, providing a realistic approximation of an actual zone failure. Because Infrastructure Resiliency Manager understands which resources are intended to be zone-resilient, it automatically determines which fault actions to apply, eliminating manual configuration. For scenarios not covered out of the box, custom fault logic via Azure Automation runbooks is supported, providing the flexibility required for complex environments.

Recovery orchestration

Resiliency drills in the platform go beyond fault injection. They integrate with recovery plans to orchestrate the complete recovery sequence automatically after injecting faults: fault injection → failover → reprotection → failback. This full-cycle simulation measures the maximum potential downtime your application could experience during a zone outage and surfaces any recovery steps that did not execute as expected.

Real-time health monitoring and drill insights

Throughout each drill, Infrastructure Resiliency Manager provides live health monitoring powered by Azure Monitor. A built-in metrics dashboard tracks each resource's health in real time, revealing whether your application remains available and how performance holds under simulated stress. This immediate feedback surfaces resilience gaps that may not have been visible through static analysis. After each drill, the platform logs the results along with team notes and attestations, building a historical record of all resilience tests. Over time, this record demonstrates measurable improvement and supports compliance with organizational and regulatory resiliency requirements.

"Stay Resilient" converts assumptions into evidence.
When an actual zone outage occurs, your teams will not be executing a failover for the first time; they will have rehearsed it. The result is a culture of proactive resilience, and the organizational confidence that your systems will deliver on their availability commitments.

Get started with the public preview

Starting today, the public preview of Azure Infrastructure Resiliency Manager is open to all Azure customers. Access the new platform through the Azure Portal by searching for "Resiliency". We encourage you to evaluate it against a test application or a production workload to gain immediate visibility into your current resiliency posture.

To get the most from Infrastructure Resiliency Manager, we recommend these three starting actions:

1. Define a resiliency goal for a critical application and review the posture insights the platform surfaces; you may uncover gaps that were previously invisible.
2. Engage the Resiliency Agent to tackle a few recommendations and experience firsthand how AI-guided remediation accelerates your team's workflow.
3. Run a zone-down drill in a non-production environment to validate your failover and recovery processes under realistic conditions.

We believe this holistic approach will help organizations achieve a new level of operational excellence, making resiliency actionable, measurable, and deeply embedded in cloud practices. As Infrastructure Resiliency Manager moves toward general availability, we will continue incorporating your feedback and expanding capabilities to meet the demands of real-world cloud architectures.

Azure Infrastructure Resiliency Manager gives you the tools to reduce downtime risk, gain clarity over your resiliency posture, and build genuine readiness for the unexpected. Join the public preview today and take the next step toward applications that don't just survive disruptions; they thrive through them.
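For teams planning a first zone-down drill, it can help to decide up front what a drill record should capture. The sketch below is a hypothetical illustration only: the `DrillStep` type, the phase names as plain strings, and the downtime definition used here (from the start of fault injection to the end of failover) are assumptions made for this example, not the platform's data model or its actual calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence

# Phase names taken from the recovery sequence described in the article.
PHASES = ("fault injection", "failover", "reprotection", "failback")


@dataclass(frozen=True)
class DrillStep:
    """Hypothetical record of one phase of a drill."""
    phase: str
    started: datetime
    finished: datetime
    succeeded: bool


def observed_downtime(steps: Sequence[DrillStep]) -> timedelta:
    """Assumed definition: start of fault injection until failover finishes."""
    by_phase = {step.phase: step for step in steps}
    return by_phase["failover"].finished - by_phase["fault injection"].started


def failed_phases(steps: Sequence[DrillStep]) -> List[str]:
    """List the phases that did not execute as expected, in recorded order."""
    return [step.phase for step in steps if not step.succeeded]


if __name__ == "__main__":
    day = datetime(2026, 1, 15)
    drill = [
        DrillStep("fault injection", day.replace(hour=10, minute=0),
                  day.replace(hour=10, minute=1), True),
        DrillStep("failover", day.replace(hour=10, minute=1),
                  day.replace(hour=10, minute=4), True),
        DrillStep("reprotection", day.replace(hour=10, minute=4),
                  day.replace(hour=10, minute=20), False),
        DrillStep("failback", day.replace(hour=10, minute=20),
                  day.replace(hour=10, minute=30), True),
    ]
    print("Observed downtime:", observed_downtime(drill))
    print("Phases needing follow-up:", failed_phases(drill))
```

Recording the same fields for every drill makes it easier to compare runs over time, which is the kind of historical evidence the drill log is meant to provide.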
Resources

Azure Infrastructure Resiliency Manager (Overview)
Get Started with Service Groups (Microsoft Learn)
Introduction to Azure Advisor (Microsoft Learn)
What is Azure Chaos Studio? (Microsoft Learn)
What's New in Azure Monitor (Microsoft Learn)
Modern Azure Resilience with Mark Russinovich (Tech Community)

Tech Accelerator: Mastering Azure and AI adoption
Join us to learn about the essential guidance, resources, products and tooling you need to accelerate your next Azure and AI project or enhance your existing Azure deployments. Get in-depth technical guidance from Microsoft experts to enhance the reliability, security and ongoing performance of your Azure workloads. Learn more about AMD products and solutions to accelerate cloud adoption.

Now on demand!

Best practices for secure and reliable Azure projects
Govern, manage and secure your AI deployments
How to run a successful Azure migration project
Advance cloud infrastructure: Essentials with AMD on Azure
Essentials to build and modernize AI applications on Azure
Proactively design, deploy & monitor resilient Azure workloads
Cloud platform security in an evolving threat landscape

6.4K Views, 7 likes, 4 Comments

On-Demand Backups in Azure Database for PostgreSQL – Flexible Server Now Generally Available
We’re excited to announce the General Availability of On-Demand Backups in Azure Database for PostgreSQL – Flexible Server!

In today’s dynamic data management landscape, ensuring the protection and recoverability of your databases is essential. Azure Database for PostgreSQL – Flexible Server streamlines this responsibility through comprehensive backup management, including automated, scheduled storage volume snapshots encompassing the entire database instance and all associated transaction logs. With the introduction of on-demand backups, you now have the flexibility to initiate backups at any time, supplementing the existing scheduled backups. This capability is particularly valuable in scenarios involving high-risk operations, such as system upgrades or schema modifications, or when performing periodic data refreshes that do not align with the standard backup schedule.

Benefits

Instant Backup Creation: Trigger a full backup of your server on demand, with no more waiting for the automated backup schedule.
Cost Optimization: While Azure manages automated backups that cannot be deleted until the retention window is met, on-demand backups provide greater control over storage costs. Delete these backups once their purpose is served to avoid unnecessary storage expense.
Enhanced Control & Safety: Take backups before schema changes, major deployments, or periodic refresh activities to meet your business requirements.
Seamless Integration: Accessible via Azure Portal, Azure CLI, ARM templates, and REST APIs.

Azure Database for PostgreSQL Flexible Server provides a comprehensive, user-friendly backup solution, giving you the confidence to manage your data effectively and securely. Let us explore how on-demand backups can elevate your database management strategy and provide peace of mind during high-stakes operations.
Automated Backups vs On-Demand Backups

Feature    | Automated Backups          | On-Demand Backups
Creation   | Scheduled by Azure         | Manually initiated by the user
Retention  | Based on the backup policy | Based on the backup policy
Deletion   | Managed by Azure           | User-controlled
Use Cases  | Regular data protection    | High-risk operations, ad-hoc needs

How to take On-Demand Backups using the portal

1. In the Azure portal, choose your Azure Database for PostgreSQL flexible server.
2. Click Settings from the left panel and choose Backup and Restore.
3. Click Backup and provide your backup name.
4. Click Backup. A notification is shown that an On-demand backup trigger has been initiated.

For more information: How to perform On-demand backups using Portal

How to take On-Demand Backups using CLI

You can run the following command to perform an on-demand backup of a server.

    az postgres flexible-server backup create --resource-group <resource_group> --name <server> --backup-name <backup>

For more information: How to perform On-demand backups using CLI

How to list all on-demand backups using CLI

You can list currently available on-demand backups of a server via the az postgres flexible-server backup list command.

    az postgres flexible-server backup list --resource-group <resource_group> --name <server> --query "[?backupType=='Customer On-Demand']" --output table

For more information: How to list all backups using Portal

What's Next

Once you have taken an on-demand backup based on your business needs, you can retain it until your high-risk operation is complete or use it to refresh your reporting or non-production environments. You can delete the backups to optimize storage costs when the backup is no longer needed. To restore or delete on-demand backups, you can use the Azure portal, CLI, or API for seamless management.

Limitations & Considerations:

SKU Support: On-demand backups are available for General Purpose and Memory-Optimized SKUs. Burstable SKUs are not supported.
Storage Tier Compatibility: Currently, only the SSDv1 storage tier is supported. Support for SSDv2 is on our roadmap and will be introduced in a future update.
Backup Limit: You can take up to 7 on-demand backups per flexible server. This limit is intentional to help manage backup costs, as on-demand backups are meant for occasional use. The managed service already provides support for up to 35 backups in total, excluding on-demand backups.

Take Control of Your Database Protection Today!

The ability to create on-demand backups is critical for managing and safeguarding your data. Whether you're preparing for high-risk operations or refreshing non-production environments, this feature puts flexibility and control in your hands.

Get started now:

Create your first on-demand backup using the Azure Portal or CLI.
Optimize your storage costs by deleting backups when no longer needed.
Restore with ease to keep your database resilient and ready for any challenge.

Protect your data effectively and ensure your database is always prepared for the unexpected. Learn more about Azure Database for PostgreSQL Flexible Server and explore the possibilities with on-demand backups today!

You can always find the latest features added to Flexible server in this release notes page. We are eager to hear all the great scenarios this new feature helps you optimize, and look forward to receiving your feedback at https://aka.ms/PGfeedback.

675 Views, 1 like, 0 Comments

How to Check Database Availability from the Application Tier
First published on MSDN on Jan 29, 2017
Reviewed by: Mike Weiner, Murshed Zaman

A fundamental part of ensuring application resiliency to failures is being able to tell if the application database(s) are available at any given point in time.

4.2K Views, 1 like, 0 Comments
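As a rough illustration of what an application-tier availability check can look like, here is a minimal sketch that runs a lightweight probe with a few retries. It is not taken from the post above: the `is_database_available` helper, its parameters, and the use of an in-memory SQLite connection as a stand-in for a real database are all assumptions made for this example.

```python
import sqlite3
import time
from typing import Callable


def is_database_available(
    probe: Callable[[], None],
    attempts: int = 3,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as one probe call succeeds, False if all fail."""
    for attempt in range(1, attempts + 1):
        try:
            probe()  # any exception is treated as "database not reachable"
            return True
        except Exception:
            if attempt < attempts:
                sleep(delay_seconds)  # brief pause so a transient blip can clear
    return False


def sqlite_probe() -> None:
    """Stand-in probe: open a connection and run a trivial query."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("SELECT 1").fetchone()
    finally:
        connection.close()


if __name__ == "__main__":
    print("Database available:", is_database_available(sqlite_probe))
```

Passing the probe in as a callable keeps the retry logic independent of any particular database driver; in a real application the probe would open a connection to the actual server and run an equally cheap query.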