Have you been blamed for an outage? Was the root cause “human error”? Running software services is hard. We all work with complex systems that fail in different ways. How can we reduce the impact of future failures? By learning from other industries, we can improve the services we run.
Attendees will learn the following:
* Why we experience failures
* Why Five Whys and RCA methods should not be used
* Etsy’s Open Source Debriefing Method
* How to facilitate learning sessions
* How to improve from failures
* How not to jump into “fix it now” mode
This talk will be useful for any team that experiences failures, whether it is large or small, distributed, or cross-functional. Attendees will walk away with a plan for addressing future failures and the foundation to keep improving their post-failure responses.