This is exactly the situation we ran into when we launched our SaaS platform SignalFx to general availability. Internally it is composed of 15-20 different micro-services, making capacity planning a big challenge. We blogged about our experience here:
Metrics based approach to capacity planning. SignalFx is a metrics-based monitoring platform, so in a meta way, we used SignalFx to capacity-plan for SignalFx's launch.
tl;dr version of our lessons and suggestions:
- Design your architecture to be loosely coupled, so that it is possible to capacity-plan for each sub-component independently. Break a complex problem into N simpler ones
- Identify the 'limiting system resource' for each component individually (i.e. what will hit the wall first - CPU, memory, network etc.). You can do this through a combination of experimentation and plain reasoning based on an understanding of how the component works
- Identify a business metric that correlates with the utilization of the limiting resource (e.g. API calls per second, number of logged-in users, or whatever fits the component)
- Use analytics/math to project the total capacity of the system and how much free capacity you have (make sure to leave enough buffer, e.g. most services won't run very well at 99.99% CPU)
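As a rough illustration of the "analytics/math" step, here is a minimal Python sketch that fits a straight line through observed (business metric, utilization) pairs and projects the load at which the limiting resource would reach a chosen ceiling. The sample numbers, the 90% ceiling, and the function names are made up for illustration, and a linear relationship is an assumption you would need to verify for each component.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def load_at_utilization(slope, intercept, target_pct):
    """Invert the fitted line: which load produces target_pct utilization?"""
    return (target_pct - intercept) / slope


# Hypothetical observations for one component: API calls/sec vs. CPU %.
api_calls_per_sec = [1000, 2000, 3000, 4000, 5000]
cpu_pct = [10.0, 20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_line(api_calls_per_sec, cpu_pct)

# Assumed ceiling: treat 90% CPU as "full" to leave a 10% buffer.
usable_capacity = load_at_utilization(slope, intercept, 90.0)
print(f"projected usable capacity: {usable_capacity:.0f} API calls/sec")
```

If the fit is poor (the points clearly don't fall on a line), that is a hint that the chosen business metric doesn't track the limiting resource well, or that a different resource is the real bottleneck.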
At the end, you'll have something like this for each component of the system - e.g. "if I'm CPU bound on component X, and the CPU of X goes up linearly with API calls/sec, and I'm currently at 5000 API calls/sec at 50% CPU, then I have total capacity for 9000 API calls/sec (with a 10% buffer) and free capacity for another 4000 API calls/sec."
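The arithmetic in that example can be written down as a small per-component record. This is only a sketch: the class and field names are hypothetical, and it assumes utilization scales linearly with load from zero.

```python
from dataclasses import dataclass


@dataclass
class ComponentCapacity:
    name: str
    limiting_resource: str     # e.g. "CPU"
    current_load: float        # business metric, e.g. API calls/sec
    current_util_pct: float    # utilization of the limiting resource, in %
    buffer_pct: float = 10.0   # headroom we refuse to use

    def total_capacity(self) -> float:
        # Scale the current load up to the utilization ceiling (100% - buffer).
        return self.current_load * (100.0 - self.buffer_pct) / self.current_util_pct

    def free_capacity(self) -> float:
        return self.total_capacity() - self.current_load


# The numbers from the example above: 5000 API calls/sec at 50% CPU.
x = ComponentCapacity("X", "CPU", current_load=5000, current_util_pct=50)
print(x.total_capacity())  # 9000.0
print(x.free_capacity())   # 4000.0
```

One such record per component is what each owner can track against the business forecast they are given.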
Now divide and conquer - give each component owner the responsibility to manage the capacity of their system based on the business needs provided to them.