How to Have an Operational Incident (a crash course)

What happens at your company when a service goes down? Hopefully an alarm fires somewhere and someone gets paged, but then what? Does the person who got paged fix it all themselves (and do they feel as isolated as that sounds)? What if they don’t know how? Is there a procedure for them to get help? Do you have a protocol for deciding when the incident is over?

Increasingly, most of us work at companies that provide a service. Even if you’re a game dev or you work at a retailer, the way you interface with your customers is a web service, and services have outages. Let’s talk about the basics of incident response: what it is, how it helps, and how to learn more. I may not be able to fix all your problems in a 30-minute talk, but I can help get you going in the right direction!

Graphic Recording: How to Have an Operational Incident (a crash course)



Courtney Eckhardt

Courtney comes from a background in customer support and internet anti-abuse policy. She combines this human-focused experience with the principle of Conway’s Law and the work of Kathy Sierra and Don ...