Underneath every product, whether it is your own code or a service you connect to, there is a disk, a network, a CPU, or memory that will eventually fail. This talk takes a simple sample product and examines each failure point in depth: its impact, its cost, and the changes needed to overcome it.
Every product either dies a hero or lives long enough to hit reliability issues. As you go about fixing them, what does failure cost, both in effort and in lost business, and how much does each additional nine of reliability cost?
We take one fault at a time and introduce incremental changes to the architecture, the product, and the supporting structure, such as monitoring and logging, to detect and overcome each failure.
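To make the "each nine of reliability" question concrete, here is a minimal sketch of the downtime each nine allows. It is not taken from the talk. The function name `downtime_budget_minutes` and the 365-day period are assumptions chosen for illustration, and the sketch covers only the time budget, not the engineering or business cost of reaching it.

```python
def downtime_budget_minutes(nines: int, period_days: float = 365.0) -> float:
    """Return the downtime (in minutes) permitted over `period_days`
    for an availability target of `nines` nines (2 -> 99%, 3 -> 99.9%).

    Hypothetical helper for illustration; the 365-day default is an assumption.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)          # e.g. 3 nines -> 0.001
    minutes_in_period = period_days * 24 * 60  # 365 days -> 525,600 minutes
    return minutes_in_period * unavailability


if __name__ == "__main__":
    for n in range(1, 6):
        availability_pct = 100.0 * (1.0 - 10.0 ** (-n))
        budget = downtime_budget_minutes(n)
        print(f"{n} nine(s): {availability_pct:.3f}% available, "
              f"about {budget:.1f} minutes of downtime per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, from roughly 3.65 days per year at 99% to a little over five minutes per year at 99.999%. That shrinking budget is why the cost of each step tends to rise.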