Many of us have experienced at least one big outage of services we’re responsible for, and it’s never a fun experience. With customers, executives and other teams all yelling at the same time, it’s easy to freak out and not know where to begin.
Many industries have discovered that, if nothing else, having a plan and working together lets cool heads prevail, so that teams can reliably follow the steps needed to get back up and running quickly.
This talk will share tales of practices that can help your organization handle incidents effectively and get more of your people involved, reducing both the burden and the bottlenecks.
Tom is an engineer at Yext, where he leads the Production Engineering group, which is responsible for SRE and developer tooling, among other things. A full-time Go developer, he is also known to work on iOS apps in...