Kubernetes promises high availability and resilience, but how does it recover from failures?
In this hands-on session, you will test Kubernetes’ fault tolerance by building a cluster from scratch and then methodically breaking it to observe its behaviour. You will gain deep insights into core components, including the control plane and the event-driven architecture that powers Kubernetes.
The session is aimed at DevOps engineers, SREs, platform engineers, and Kubernetes administrators who want to move beyond theoretical knowledge and gain deeper insights into Kubernetes’ resilience features.
You will examine how the kubelet maintains the desired state and verify the “no single point of failure” design principle. By taking down the cluster one node at a time, you will witness Kubernetes’ recovery mechanisms and develop a practical understanding of designing robust, production-ready deployments.
This session is ideal for DevOps practitioners, platform engineers, and anyone looking to strengthen their Kubernetes troubleshooting skills beyond theoretical knowledge.
Benefits for operations teams responsible for managing infrastructure, ensuring reliability, and optimizing performance (key concerns in DevOps practice). By the end of the session, you will: