SRE - Using Error Budgets to Prioritize Work

Site Reliability Engineering (SRE) is a set of principles, practices, and organizational constructs that seek to balance the reliability of a service with the need to continually deliver new features. An error budget is the primary construct used to help balance these seemingly competing goals.

This talk is an introduction to error budgets and the components they are built from: service level indicators (SLIs) and service level objectives (SLOs). We will discuss the art of creating and implementing SLOs.

Attendees will be able to:

  • Describe the key concepts, namely error budgets, service level indicators (SLIs), and service level objectives (SLOs)
  • Recommend actions to take when the error budget is overconsumed
  • Recommend actions to take when excess error budget remains
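As a rough illustration of how these concepts fit together, the sketch below treats the SLI as the fraction of successful requests, the SLO as a target for that fraction, and the error budget as the failures the SLO still tolerates. The function name, the request-count inputs, and the recommendation strings are all hypothetical and chosen for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetStatus:
    allowed_failures: float   # failures the SLO tolerates in the window
    consumed_fraction: float  # share of the error budget already spent
    recommendation: str       # illustrative wording, not from the talk


def error_budget_status(slo_target: float,
                        total_requests: int,
                        failed_requests: int) -> BudgetStatus:
    """Compare observed failures against the budget implied by an SLO.

    Assumes an availability-style SLI: good requests / total requests.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if failed_requests < 0:
        raise ValueError("failed_requests must not be negative")

    # The error budget is whatever the SLO leaves over: 1 - target.
    allowed = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed

    if consumed >= 1.0:
        advice = "budget overconsumed: prioritize reliability work"
    else:
        advice = "budget remains: continue shipping features"
    return BudgetStatus(allowed, consumed, advice)


if __name__ == "__main__":
    status = error_budget_status(0.999, 1_000_000, 1_500)
    print(f"allowed failures: {status.allowed_failures:.0f}")
    print(f"budget consumed: {status.consumed_fraction:.0%}")
    print(status.recommendation)
```

With a hypothetical 99.9% SLO over one million requests, roughly 1,000 failures fit in the budget, so 1,500 failures means the budget is overconsumed while 500 leaves about half of it unspent. A real policy would be agreed on by the stakeholders listed below rather than hard-coded like this.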

In the spirit of DevOps, error budgets and SLOs work best when they are agreed on collaboratively by stakeholders from across the business. As such, this presentation is appropriate for:

  • Product Owners and Product Managers
  • Business decision makers
  • Developers
  • Operators
  • And anyone else interested in building and operating services that deliver business and customer value.

Speaker

Nathen Harvey

Nathen Harvey, Cloud Developer Advocate at Google, helps the community understand and apply DevOps and SRE practices in the cloud. Nathen is part of the global organizing committee for the DevOpsDays conferences.