Title: The story of a performance outage, and how we could have prevented it
Description:
Dealing with outages in production environments can be fraught with stress and tension, especially when your business users or customers are impacted.
Regardless of how fast you can release features into your production environments, it is unfortunately likely that you will have to deal with a business-impacting outage at some point. Having resilient mechanisms and procedures in place means that 90% of these potential issues can be caught before they cause you real headaches, but in a true worst-case scenario it's important to be able to diagnose issues quickly and restore services.
Let me take you through a business-impacting performance outage and share with you what happened, how we found the root cause, and how we could have found it earlier.