How We Went From Two Major Outages to 99.98% Reliability in Just 6 Months

Our security startup saw strong early market traction, but a few years in, a series of significant outages threatened to undo that progress. To address them, we made the difficult decision to pause all product engineering for an entire quarter and focus exclusively on system reliability. That shift let us reach 99.98% reliability within six months, and we now consistently maintain 99.99%.

In this talk, Eran will walk through the specific backend changes we made to minimize the impact of outages, including the robust backup systems we built to prevent a recurrence. He’ll also share practical lessons from managing critical incidents and describe how we communicated with customers after each outage to rebuild their trust.

This presentation is aimed at engineers in DevOps and SRE roles who want to learn how a team tackled real-world reliability challenges in production.

Speaker

Eran Kampf


Eran Kampf is VP of Engineering at Twingate, a zero-trust security company, and loves geeking out on DevOps, K8s, and security. Before Twingate, Eran worked at and consulted for a number of startups ...