Title: Three years of breaking things to make them better
Description:
For three years, PagerDuty has run "Failure Friday", a weekly exercise that injects simple failures, such as killing a process or adding network latency, to expose problems in our systems and alerting. This talk shares what we've learned in that time: how our fault injection techniques have changed, the best way to get started injecting failures in your own environment, and how you can use fault injection to improve your software as well as your people.