Do you believe cloud capacity is limitless? Have you ever hit an out-of-capacity error from your cloud service provider and felt powerless to meet your application's scale-out demands, because you never imagined your auto-scaling group could stop growing? You are not alone. We, too, neglected capacity management as a crucial factor for service reliability and a core competence of an SRE, because the cloud service providers' value proposition made us believe capacity is never an issue at the infrastructure level.
We want to put the spotlight on cloud capacity management for anyone running services on a public cloud. We share what we have learned running our SaaS service across 57 regions in 3 cloud service providers (CSPs): AWS, Azure & GCP. We believe that establishing capacity management and planning processes, and living them as part of an SRE's responsibilities, enables higher availability and reliability for large Elastic Cloud services while keeping cost effectiveness part of the process.
We highlight how a pragmatic approach to capacity planning for services lets us bridge the gap to forecasting infrastructure capacity needs, and thereby improve the availability and reliability of our platform. Finally, you will learn how to get started with capacity management yourself.