Technology ecosystems are complex, and it is important to understand each change and how it affects both our systems and the service they provide. Users expect systems to be up, responsive, fast, consistent, and reliable.

For a system, reliability means doing what its users need it to do. A system’s reliability is essentially a measure of how happy its users are, and happy users are good for business. If reliability is one of the most important requirements of any system, if users determine what reliability means, and if it’s okay not to be perfect all the time, then we need a way of thinking that addresses all three, since we have limited resources to spend, be they financial, human, or political.



Ricardo Castro

Lead Site Reliability Engineer at Anova. MSc in Computer Science from the University of Porto. CK{AD, A, S} by Cloud Native Computing Foundation (CNCF) | Linux Foundation. {Terraform, Consul, Vault} ...