Observability at scale

Observability is the practice of observing systems and making sense of the operational jungle. With platform reliability now in focus, everybody is rushing into observability, but are they approaching the subject from the right angle?

Many blog posts and conference talks stress that observability is not about the tools, but what does that really mean day to day?

How do companies implement it?

What does an observability team do?

Do we need an observability team?

This is a conceptual talk, not a technical one. I will not dive into implementation details, although I will give examples of how we did things at my company.

This talk targets anyone in an engineering role, including managers and project managers. I will explain the mission of the observability team and describe how we created it, walking through the different stages of its evolution and how we arrived at our mission statement.

I will show that tooling is not at the heart of our mission.

I will then explain how that mission translates into our 9-month planning and our day-to-day work. The key takeaways are how to implement observability without focusing on tools, and some tips for cultivating an observability culture.

The message is clear: “Observability is a culture, not a toolset.” I will back it up with a concrete, real-life example of an observability team focused not on the tools but on actually making things observable.



Ramez Hanna

Long-time operations engineer turned manager (hopefully for the right reasons). Leading the SRE observability team at Criteo.