Capacity Constraints Unveiled: Navigating Cloud Scaling Realities

Talk

Do you believe cloud capacity is limitless? Have you ever run into an out-of-capacity error from your cloud service provider and felt desperate and powerless to satisfy the scale-out demands of your application, because you never imagined that you could not scale up your auto-scaling group any further? You are not alone. We also neglected capacity management as a crucial factor for service reliability and a core competence of an SRE, because cloud service providers’ value proposition made us believe capacity is never an issue at the infrastructure level.

We want to put the spotlight on cloud capacity management for anyone running services on a public cloud. We share what we have learned from running our SaaS service across 57 regions in 3 CSPs (AWS, Azure & GCP). We believe that establishing processes and practicing capacity management and planning as part of an SRE’s responsibilities allows for higher availability and reliability of large Elastic Cloud services, while also taking cost-effectiveness into account.

We highlight how a pragmatic approach to capacity planning for services allows us to bridge the gap to forecasting and predicting infrastructure capacity needs, improving the availability and reliability of our platform. At the end, you will learn how you can easily get started with capacity management yourself.

Speakers

Daniel Aschwanden

SRE @ Elastic

Daniel Aschwanden is an SRE Engineering Manager on Elastic’s Cloud Capacity Team. He has enjoyed working with all system and security related things ever since laying his hands on

...

Marc-Andre Dufresne

SRE @ Elastic

Marc is an SRE, dad and mountain biker. He has over 15 years of experience in the industry in multiple roles, from infrastructure consulting to leading developer teams. He has a passion

...