Failure Testing prepares us, both socially and technically, for how our systems will behave in the face of failure. By testing proactively, we can find and fix problems before they become crises. At Netflix and Amazon we ran regular Game Days, where we tested our systems for failures in order to find problems, prevent future incidents, and train our teams to handle the unexpected.
This talk walks through how your team can run an effective “Game Day” to safely test your system for weak points, identify opportunities to bake in more resilience, and ensure your team is well trained. Then you can sleep peacefully knowing you are ready!