On call is an unfortunate side effect of the technology industry. Outages happen outside business hours, frequently when engineers are sleeping. By using existing tools and reexamining alerting healthchecks, it is possible to make on call a much happier experience.
In this talk, we will examine how Yelp reduced pager noise and improved on-call engineer happiness while maintaining the stability of its platform.
Paul O’Connor works in the operations team on infrastructure automation and monitoring at large scale for Yelp. He lives in Dublin and is an avid open source contributor, an occasional musician, and a fan of craft beer.