Responsible Service Ownership

How do you ensure that your newly developed service is stable, reliable, and maintainable before rolling it out to production? What about services that are already in production? Are there ways you and your team could prevent critical outages and late-night pages before they happen? Well-defined and calmly enforced standards can prevent and fix both major and minor outages. The enforcement of preventative measures has a long history across numerous fields and industries, where it has prevented countless major incidents. Defensive measures translate into savings of lives, resources, and money. The tech industry is no different, and this work is often the bread and butter of engineers in the DevOps space. This talk will review several patterns that we use at Twitter to define and manage standards. Pre-production standards ensure stability, visibility, and manageability. Post-production standards revise already-productionized services. You too can come up with a method for implementing standards that prevent and mitigate countless outages, fixing issues before they happen.

Speaker

Brian Weber

Hi! My name is Brian, and I’m an SRE at Twitter. My pronouns are he/him. I support the direct messaging product and our platform security team, where I work on improving the stability and security of

...