Arijit Mukherji - Slashdot User

Comment Re: Simplify the problem, use a metrics based approach (Score 1) 64

by Arijit Mukherji on Tuesday August 11, 2015 @11:13AM (#50293557) Attached to: Ask Slashdot: Capacity Planning and Performance Management?

Exactly. That is one of the things we discuss in the blog post.

Comment Simplify the problem, use a metrics based approach (Score 3, Informative) 64

by Arijit Mukherji on Monday August 10, 2015 @04:38PM (#50287725) Attached to: Ask Slashdot: Capacity Planning and Performance Management?

This is exactly the situation we ran into when we launched our SaaS platform, SignalFx, to general availability. Internally it is composed of 15-20 different microservices, which makes capacity planning a big challenge. We blogged about our experience here: Metrics based approach to capacity planning. SignalFx is a metrics-based monitoring platform, so, in a meta way, we used SignalFx to capacity-plan for SignalFx's launch.

tl;dr version of our lessons and suggestions:

1. Design your architecture to be loosely coupled, so that you can capacity-plan for each sub-component independently. Break one complex problem into N simpler ones.
2. Identify the 'limiting system resource' for each component individually, i.e. what will hit the wall first: CPU, memory, network, etc. You can do this through a combination of experimentation and plain, simple reasoning based on an understanding of how the component works.
3. Identify a business metric that correlates with the utilization of the limiting resource (e.g. API calls per second, number of logged-in users, or whatever fits the component).
4. Use analytics/math to project the total capacity of the system and how much free capacity you have. Make sure to leave enough buffer; most services won't run well at 99.99% CPU.
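Steps 3 and 4 can be sketched as a simple regression: collect paired samples of the business metric and the limiting resource's utilization, fit a line, and solve for the metric value at which utilization reaches your chosen ceiling. This is a minimal sketch, assuming the relationship really is linear over the observed range and that plain least squares is good enough; the function names, the 90% ceiling, and the sample data are hypothetical, not SignalFx's actual method.

```python
from dataclasses import dataclass


@dataclass
class LinearFit:
    slope: float      # utilization (%) added per unit of the business metric
    intercept: float  # utilization (%) predicted at zero load


def fit_linear(metric: list[float], utilization: list[float]) -> LinearFit:
    """Ordinary least-squares fit of: utilization = slope * metric + intercept."""
    n = len(metric)
    if n != len(utilization) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(metric) / n
    mean_y = sum(utilization) / n
    sxx = sum((x - mean_x) ** 2 for x in metric)
    if sxx == 0:
        raise ValueError("the business metric must vary across samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(metric, utilization))
    slope = sxy / sxx
    return LinearFit(slope=slope, intercept=mean_y - slope * mean_x)


def capacity(fit: LinearFit, ceiling_pct: float, current: float) -> tuple[float, float]:
    """Return (total, free) metric capacity before utilization reaches ceiling_pct."""
    if fit.slope <= 0:
        raise ValueError("utilization does not grow with this metric")
    total = (ceiling_pct - fit.intercept) / fit.slope
    return total, total - current


# Hypothetical samples for one component: API calls/sec vs. CPU %.
calls = [1000.0, 2000.0, 3000.0, 4000.0]
cpu = [12.0, 22.0, 32.0, 42.0]
fit = fit_linear(calls, cpu)
total, free = capacity(fit, ceiling_pct=90.0, current=4000.0)
print(f"total ~{total:.0f} calls/sec, free ~{free:.0f} calls/sec")
```

With these made-up samples the fit is 0.01% CPU per call/sec plus a 2% baseline, giving roughly 8,800 calls/sec of total capacity and 4,800 free at a 90% ceiling. The nonzero intercept is why the fit is worth doing: a baseline cost shifts the answer compared with a naive proportional estimate.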

At the end, you'll have something like this for each component of the system, e.g. "If I'm CPU-bound on component X, CPU on X goes up linearly with API calls/sec, and I'm currently at 5000 API calls/sec at 50% CPU, then I have total capacity for 9000 API calls/sec (with a 10% buffer) and free capacity for another 4000 API calls/sec."
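The arithmetic in that example can be reproduced in a few lines. This sketch assumes, as the numbers imply, that CPU is proportional to API calls/sec (zero CPU at zero load) and that a 10% buffer means capping CPU at 90%; the function name is hypothetical.

```python
def capacity_from_one_sample(current_rate: float, current_util_pct: float,
                             buffer_pct: float) -> tuple[float, float]:
    """Total and free capacity, assuming utilization is proportional to the rate."""
    if current_util_pct <= 0:
        raise ValueError("current utilization must be positive")
    ceiling_pct = 100.0 - buffer_pct  # e.g. 90% CPU with a 10% buffer
    total = current_rate * ceiling_pct / current_util_pct
    return total, total - current_rate


total, free = capacity_from_one_sample(5000, 50, 10)
print(total, free)  # 9000.0 4000.0
```

A single sample is enough only under the proportional assumption; if the component has a meaningful baseline cost at idle, fitting a line over several samples gives a more honest number.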

Now divide and conquer: give each component owner the responsibility to manage the capacity of their system based on the business needs provided to them.
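Once each component has its own capacity number, the hand-off to owners can be as simple as a per-component table checked against a business forecast. A minimal sketch, assuming the business need arrives as a single growth multiplier applied to each component's own metric and reusing the proportional model from the example above; the `Component` class, the `shortfalls` function, and the "ingest"/"query" components with their numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One independently planned component (all values here are illustrative)."""
    name: str
    limiting_resource: str    # whichever resource hits the wall first, e.g. "cpu"
    business_metric: str      # metric that tracks that resource, e.g. "api_calls_per_sec"
    current_rate: float       # current value of the business metric
    current_util_pct: float   # current utilization of the limiting resource (%)
    buffer_pct: float = 10.0  # headroom kept below 100% utilization

    def total_capacity(self) -> float:
        # Proportional model: utilization grows linearly from zero with the metric.
        return self.current_rate * (100.0 - self.buffer_pct) / self.current_util_pct


def shortfalls(components: list[Component], growth: float) -> dict[str, float]:
    """Map each component that cannot absorb growth-times its current load to its shortfall."""
    result: dict[str, float] = {}
    for c in components:
        needed = c.current_rate * growth
        available = c.total_capacity()
        if needed > available:
            result[c.name] = needed - available
    return result


fleet = [
    Component("ingest", "cpu", "api_calls_per_sec", 5000, 50),
    Component("query", "memory", "queries_per_sec", 200, 70),
]
print(shortfalls(fleet, growth=1.5))
```

Here a 1.5x forecast fits within "ingest" (7,500 needed vs. 9,000 available) but not "query" (300 needed vs. about 257 available), so only the query owner has work to do. Each owner only needs their own row, which is the point of planning components independently.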
