As the manager of a team with On-Call responsibilities, I am personally accountable for ensuring engineers are prepared to receive pages. As a systems thinker, I’ve broken this down into several layered approaches that leave room for adaptation in the face of new and emergent system behavior.
Ok, great… what does that mean? Troubleshooting problems within a complex system requires enormous amounts of context-specific expertise. It is next to impossible to build that expertise through training alone. It develops over time, and I’ve found that learning on the job is the most effective way to build it.
Treat On-Call as a team sport, and give each engineer the freedom to learn and grow. When it’s a team sport, you can still get the best possible response to an incident while learning along the way. How you run your On-Call onboarding has a significant impact on creating this team-sport mentality, and it shapes team culture. If we build a team culture based on trust and learning, we can do so much better than “Good Luck Have Fun”.