Many delivery decisions in software organizations are still driven by opinion, intuition, or perceived risk. Practices are introduced because they "should work", because experts recommend them, or because they look safe on paper. Yet teams often struggle to understand whether these decisions actually improved delivery, or simply changed the system in unintended ways.

The core problem is that software delivery is a complex, adaptive system. Its behavior varies over time, and interventions that help in one context may fail in another, or stop working as the system evolves. Without a way to reason about variation, teams often react to noise, optimize the wrong things, or turn metrics into targets.

In this talk, I share a real-world experience of shifting from opinion-based decisions to treating delivery changes as explicit hypotheses about system behavior. By combining DORA metrics with simple variation analysis using Process Behavior Charts, we learned how to distinguish normal process fluctuation from meaningful change and to understand which interventions actually improved flow and reliability.

The talk focuses on how delivery metrics can be used as a feedback loop for learning, not as performance KPIs. Through concrete examples from multi-team environments, I show how teams used evidence to reason about trade-offs, avoid reactive decision-making, and improve delivery without gaming the numbers or burning people out. This is an experience report about what worked, what failed, and what surprised us when DORA metrics met real systems and real human behavior.
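For readers unfamiliar with Process Behavior Charts: the core calculation is a simple XmR chart, which derives "natural process limits" from the average moving range of a metric over time. Points outside those limits suggest a real change in the system rather than routine fluctuation. The sketch below is illustrative only; the function names and the lead-time figures are invented for the example, not taken from the talk.

```python
# Minimal sketch of an XmR Process Behavior Chart calculation.
# All names and data here are hypothetical, for illustration only.

def pbc_limits(values):
    """Compute the center line and natural process limits for individual values."""
    mean = sum(values) / len(values)
    # Moving ranges: absolute differences between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR scaling constant (3 / d2, where d2 = 1.128 for n = 2).
    return mean, mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

def signals(values):
    """Indices of points outside the limits: candidates for 'meaningful change'."""
    _, lower, upper = pbc_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical weekly deployment lead times in days; the spike at index 6
# falls outside the natural process limits, while the rest is routine noise.
lead_times = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 7.5, 2.1, 2.2, 1.8]
print(signals(lead_times))  # → [6]
```

The point of the chart is exactly the distinction the talk draws: everything inside the limits is normal variation and should not trigger a reaction, while a point outside them is worth investigating as a possible effect of an intervention.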

Egor Savochkin is an engineering leader based in Amsterdam, working at Booking.com. He has over 15 years of experience improving software delivery in large-scale, high-stakes systems across fintech
...