Last year, Slack's development and operations teams were living in different worlds. Development teams deployed to production over a hundred times a day, and a centralized operations team tried to fix things when they broke. The operations team struggled to support systems it had not written. Heroes and knowledge islands saved the day over and over. Post-incident meetings were poorly attended and did not encourage learning.
Slowly, then quickly, all that changed. Slack moved to teams of empowered developers who are on call, with embedded SREs, safer production deployments, and actionable alerts. Post-incident meetings now focus on learning, and meaningful analysis of incident patterns is done at all levels of the company.
In this talk you’ll hear all about the bumps and scrapes, triumphs and pitfalls of our journey from a centralized ops team to development teams that own the full lifecycle of their systems. It wasn’t easy, but it wasn’t impossible. Hopefully it will inspire you to try something radically different at your company too.