Creating Reliability Through Chaos With Azure VMs and Gremlin
The idea of “Chaos Engineering” isn’t just about putting faith in a provider to stay online, it’s finding ways to simulate failure in order to determine that you’ll withstand an outage of any kind within your application. This means that if a number of your app servers take on a large portion of traffic and are highly CPU taxes, you’ll know how to properly scale your application to withstand it. If portions of your application infrastructure were to take on a massive amount of packet loss, how does your team respond?
Chaos engineering helps answer some of these questions by allowing you to simulate the possibilities of what a failure may look like in your production environment. For some, using tools likeChaos Monkeyhas helps produce load and service failures to help create attack simulations. Lately I have been working with Gremlin, which acts as a “Chaos-as-a-Service” through a simple client-server model.