Service Level Objectives (SLOs) are a simple and useful way to define and measure reliability. They may not seem very powerful at first, but behind the simplicity lies a lot of potential if you decide to explore it. The Google Photos team has used SLOs extensively over the past few years, and they have become the foundation for discussing reliability among SREs, developers, and PMs. We started with a few basic principles, and the rest has been a process of discovering how best to use them in practice.
In this talk we will go over some of the lessons learned by Photos SRE when applying SLOs. We’ll present the various cases where they can add value, how to use them most effectively, and the unexpected ways in which they have helped us so far.
Piotr is a Site Reliability Engineer at Google. He works on the Google Photos team, where he focuses on monitoring and alerting. Originally from Poznań, he currently lives in Mountain View, California.