Software/Site Reliability of Distributed Systems




Underneath every product, whether it is your own code or a service you connect to, there is a disk, a network, a CPU, or memory that will fail. The talk considers a simple sample product and examines each failure point in depth: its impact, its cost, and the changes needed to overcome it.

Every product either dies a hero or lives long enough to hit reliability issues. While you go about fixing them, what does failure cost, both in effort and in lost business, and how much does each nine of reliability cost?
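
To make "each nine" concrete, here is a minimal sketch of the arithmetic behind it. It assumes the common convention that N nines means an availability of 1 - 10^-N (so two nines is 99%) and a 365-day year; the function name is hypothetical and is not taken from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year (no leap day)


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for `nines` nines of availability.

    Assumption: N nines means availability = 1 - 10**-N (2 -> 99%, 3 -> 99.9%).
    """
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    # Print the yearly downtime budget for two through five nines.
    for n in range(2, 6):
        availability = 100 * (1 - 10 ** (-n))
        budget = downtime_minutes_per_year(n)
        print(f"{availability:.3f}% available: {budget:.1f} minutes of downtime per year")
```

Under these assumptions, each additional nine cuts the allowed downtime by a factor of ten, from about 3.65 days per year at two nines to roughly five minutes at five nines. The sketch covers only the downtime budget; what it costs in effort to stay within that budget is the question the talk raises.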

We take one fault at a time and introduce incremental changes to the architecture, the product, and the support structure, such as monitoring and logging, to detect and overcome each failure.

Speaker

Piyush Verma

 
Piyush Verma is an infrastructure engineer. In a past life, he built oogway.in and siminars.com.