DashboardOps: Understanding how we diagnose incidents to create better dashboards

The monitoring systems at our companies are among the widest windows into how our services are behaving. They are where we go when things go wrong. They are also how we communicate with colleagues and our future selves about how our systems are composed. And because we tweak them over time, they hold an archaeological record of past events.

However, dashboards can sometimes be an afterthought. They can be left as a task for later, a low-priority item that never gets finished. It shouldn’t be that way!

We’ll explore some of the methods that humans use to investigate outages and incidents. With that knowledge in hand, we’ll talk through some techniques you can use in your company to:

  • improve your dashboards to reduce incident response times

  • learn more about your company’s services through existing dashboards

  • teach others as you go!



Wyatt Walter

Wyatt Walter is a senior engineer at Ad Hoc. He has spent over a decade floating between Dev and Ops. He loves monitoring, automating things, and the TV show King of the Hill.