Abstract:
As organizations experiment with greater concurrency and integration between their departments and move toward continuous delivery of customer value, failure is assured. Asking "how can failure be avoided?" isn't as useful or relevant as focusing on
This is the question that presented itself to Salesforce’s Service Reliability Engineering team. Their SREs had received training in incident response and management, but were still struggling to incorporate that feedback into the organization at large to improve outcomes. Feedback loops weren’t always closed, so many opportunities for improvement were lost.
This is the story of my months-long journey with J. Paul Reed and my team to identify the specifics of what made reliability retrospectives difficult to have, why actionable takeaways were often lacking, and how the feedback loops within the company’s operations organization weren’t serving Salesforce’s needs.
We then ran a series of experiments together, putting the SRE team on a road to improving their ability to respond, react, remediate, and re-incorporate learnings from failure into the organization.
The Takeaways?
Speakers:
Kevina Finn-Braun’s focus throughout her 18 years in the internet industry has been operational excellence and risk management. She is currently Director of Site Reliability Service Management at Salesforce, where she leads the team focused on operational process improvements in the areas of incident, problem and change management. In her previous role as Director of Business Continuity at Yahoo!, she led the team focused on risk management and service continuity best practices.
J. Paul Reed, aka The Sober Build Engineer, has over a decade of experience in the trenches as a build/release and tools engineer, working with such organizations as VMware, Mozilla, Postbox, and Symantec. In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations “Simply Ship. Every time.” He’s worked across a number of industries, from financial services to cloud-based infrastructure, with teams from 2 to 12,000 on everything from tooling, operational analysis and improvement, team culture transformation, and business value optimization.