ETCD debugging saga

Imagine you have set up a Kubernetes cluster on premises. It works fine for a few months. Then, for no apparent reason, the master nodes come under high load and you cannot fix it. The logs show nothing obvious, or at best they are inconclusive. After a few hours the problem resolves on its own. A few weeks later it happens again. This is the story of a difficult investigation with no witnesses, only circumstantial evidence, and lots of suspects.



Przemyslaw Koltermann


Docker Certified Associate and Docker Community Leader in Warsaw. Senior Software Engineer at Team DevOps, active wherever developers and IT meet, like development