Sustainable Incident Management

How you respond to production outages can affect both team morale and development velocity. Putting the proper Incident Response processes in place can reduce this stress, make it easier to ramp up new teammates, and free the team to focus on new features. This talk will look at Incident Management at its core, covering Incident Command and how to scale it with a growing organization. We’ll go over common pain points for Incident Responders and how to ease them to reduce friction between Product and SRE teams. Topics include best practices for playbooks, on-call rotations, error budgets, postmortems, and incident communication that streamline incident resolution.

Speaker

Ajuna Kyaruzi

Ajuna Kyaruzi works in Developer Relations at Datadog and was born and raised in Dar es Salaam, Tanzania. She loves community building and volunteers with multiple mentorship programs aimed at helping early ...