Better On-Call the SRE Way?

SRE at Google uses an incident management protocol to handle outages of the complex distributed systems we run. Outages are stressful situations with a fundamental human factor: many people can be involved at once, and the regular organisational structure may not help them work to their full potential.

From psychological safety to training and protocols, this talk discusses everything that goes on when services are unavailable.



Ramón Medrano Llamas

Ramón Medrano Llamas is a site reliability engineering manager at Google, focused on the Identity and User teams. He concentrates on the reliability aspects of new Google products and new features of ...