Responding to and learning from failure

Community Manager
Tailwind Traders has done a tremendous amount of good work using modern operations principles and practices to create, deploy, monitor, and troubleshoot their applications and infrastructure in the cloud. As an initial effort, this has been superb, but the engineers know that putting processes in place for continuous learning and continuous improvement is the only sure way to provide continuous value to customers. In this session, we'll do more than just talk about these processes; we'll see how they work in action. We pick up the story right in the middle of Tailwind Traders' first significant outage. Everything is on fire (metaphorically), and the engineers are struggling to understand the problem and remediate it as fast as possible. We'll demonstrate not just how the outage is brought under control but, even more importantly, how Tailwind Traders learns from the experience after the fact and improves its systems as a result. Understanding this process is one of the most important keys to continuous improvement, "leveling up" our operational practices, and getting the most value from our cloud investments.
