At Microsoft, we take resilience seriously. We understand that the consequences of unavailability are severe: your projects, applications, and even businesses depend on the Azure Cloud to be highly available and resilient to failure. If users can’t access your services, they are likely to get upset, and, more seriously, there can be financial, legal, and even life-or-death consequences when your application is down.
That’s why today we are excited to announce the public preview of Azure Chaos Studio, a new service for improving your applications’ resilience to disruptions. With Chaos Studio you can practice chaos engineering: a method of experimenting with controlled fault injection against your applications to help you measure, understand, and improve resilience against real-world incidents, such as region outages or memory leaks in an application. In this post, we’ll briefly share why practicing chaos engineering is important, give an overview of Chaos Studio, and share a little bit about our roadmap. When you’re ready to get started, visit our documentation or try it out for yourself in the Azure portal.
The move to cloud services introduces new challenges to building reliable applications. Layers of abstraction from physical infrastructure provide relief from needing to manage failures, but can also introduce new dependencies that are a “black box” when things go wrong. Cloud-native architectures simplify deployment and management, but teams may lack confidence that their applications will remain resilient to failure when built on them. Much like security, resilience requires constant attention from both the cloud provider and the cloud consumer.
Whether you are developing a new application that will be hosted on Azure, migrating an existing application to Azure, or operating an application that already runs on Azure, it is important to improve your application's ability to handle and recover from disruptions that can negatively impact your customers' experience and erode their trust in your business or mission. To avoid these negative consequences, you need to validate that your application responds effectively to disruptions caused by a service you depend on, disruptions caused by a failure in the service itself, or even disruptions to incident response tooling and processes. Chaos experimentation enables you to test that your application is resilient to these failures at any phase in the service lifecycle, from development through to production.
Chaos engineering can be used for a wide variety of scenarios, including post-incident analysis, “game days,” business continuity and disaster recovery (BCDR) drills, and validation of live site tooling and on-call processes. For many of these scenarios, you first build resilience using ad-hoc chaos experimentation, then continuously validate that new deployments won't regress resilience by adding chaos testing as a deployment gate in your CI/CD pipeline.
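To make the deployment-gate idea concrete, here is a minimal Python sketch. It does not use the Chaos Studio API: `chaos_gate`, `GateResult`, and `FakeApp` are hypothetical names invented for illustration, and the fault is simulated in memory. The gate injects a fault, probes the application's health while the fault is active, always removes the fault, and passes only if the success rate meets a threshold. The default of 20 probes and the 0.95 threshold are assumptions, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    passed: bool
    success_rate: float


def chaos_gate(
    inject_fault: Callable[[], None],
    remove_fault: Callable[[], None],
    probe: Callable[[], bool],
    probes: int = 20,
    min_success_rate: float = 0.95,
) -> GateResult:
    """Probe health while a fault is active; pass if enough probes succeed."""
    if probes <= 0:
        raise ValueError("probes must be positive")
    inject_fault()
    try:
        successes = sum(1 for _ in range(probes) if probe())
    finally:
        # Always remove the fault, even if a probe raises.
        remove_fault()
    rate = successes / probes
    return GateResult(passed=rate >= min_success_rate, success_rate=rate)


@dataclass
class FakeApp:
    """Hypothetical stand-in for a real service and its dependency."""
    has_fallback: bool
    dependency_up: bool = True

    def break_dependency(self) -> None:
        self.dependency_up = False

    def restore_dependency(self) -> None:
        self.dependency_up = True

    def is_healthy(self) -> bool:
        # The app stays healthy if the dependency is up or it can fall back.
        return self.dependency_up or self.has_fallback


if __name__ == "__main__":
    for app in (FakeApp(has_fallback=True), FakeApp(has_fallback=False)):
        result = chaos_gate(app.break_dependency, app.restore_dependency, app.is_healthy)
        print(f"fallback={app.has_fallback}: passed={result.passed}, "
              f"success_rate={result.success_rate:.2f}")
```

In a real pipeline, the three callables would wrap whatever starts and stops your experiment and whatever health check you trust; the gate step would then fail the deployment when `passed` is false.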
Designing an experiment in Chaos Studio
Chaos Studio enables you to orchestrate fault injection on your Azure resources in a safe and controlled way. A few of the key benefits of Chaos Studio are:
Chaos Studio is free to use through April 4, 2022; thereafter, usage will be charged pay-as-you-go by the target action-minute.
Chaos Studio is already being used by Azure customers that span industries including retail, finance, healthcare and emergency services, and it is being used across Microsoft to improve quality as well. Over 50 teams across Microsoft are running chaos experiments with Chaos Studio, including the Power Platform team and the Azure Key Vault team. Check out this video to hear how Chaos Studio has helped them to identify opportunities for improved resilience:
Today we begin our public preview for Chaos Studio and we’re excited to hear your feedback on what faults and features you’d like to see next. If you’re ready to systematically improve resilience with controlled chaos, get started by visiting our documentation or trying it out in the Azure portal. Let the chaos begin!