Scalability is the ability to respond gracefully to increased workload. When you have enough of it, life is good. When you have trouble scaling up and your workload goes up, as it inevitably does, life becomes complicated, sometimes miserable. There are many complex techniques you can use to help your application scale up: sharding, message queues and other forms of distributed processing. Fundamental to all of them is the idea that you need to be able to delegate work. In computer science we like to use terms like distributed processing to capture the idea of delegation. In computer science, just like in real life, the ideal is to delegate everything, or at least as much as possible. In the Assimilation system management suite, we fully embrace this ideal. As much as possible, we try to delegate everything and do nothing centrally, and doing nothing scales really well!
To improve scalability, what you want to do is avoid running out of resources for as long as possible. System management software engages with systems (duh!). What if you can reliably distribute all the routine work out to the systems that you’re managing? If you can do this, then scalability won’t be a problem for a long time. You’re well on your way to graceful and smooth growth, with more happiness than misery.
Mindlessly Asking “Are you OK?” Hammers Scalability
For monitoring, most of this routine work is asking variations on the question “Are you OK?”, for a lot of different ideas of what “you” might be (servers, services, and so on). Fortunately, most of the time, the answer is “yes”. If computers could get frustrated, I imagine that asking a question like that over and over and overwhelmingly getting the same answer might make them a little crazy. OK, maybe it’s more like it would make me crazy. But more importantly, if some sort of centralized resource is involved in asking these questions, then it’s going to have scalability problems, even if it won’t go crazy. Unfortunately, this is how most monitoring systems work, with the central system doing some or all of this routine work. Similar statements apply to CMDB systems as well, only the questions then become “Has this changed?” and “How about that?”. Some monitoring systems delegate little or nothing, and some delegate more. As of right now, as far as I know, only the Assimilation monitoring software delegates all of this routine work.
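To put rough numbers on the difference, here is a minimal Python sketch that counts messages under the two approaches. It is an illustration only, not the Assimilation Suite's code or protocol: the `Agent` class, the `simulate` function and the "one question plus one answer per host per round" cost model are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]  # returns True when the thing being checked is OK


@dataclass
class Agent:
    """Hypothetical local agent: runs its own checks, reports only changes."""
    name: str
    checks: Dict[str, Check]
    last_status: Dict[str, bool] = field(default_factory=dict)

    def run_once(self) -> List[Tuple[str, str, bool]]:
        """Run every check locally and return only the results that changed."""
        reports = []
        for check_name, check in self.checks.items():
            ok = check()
            # The first run always reports, because no status is known yet.
            if self.last_status.get(check_name) != ok:
                reports.append((self.name, check_name, ok))
            self.last_status[check_name] = ok
        return reports


def simulate(num_agents: int, rounds: int, fail_round: int) -> Tuple[int, int]:
    """Return (central_messages, delegated_messages) for the same workload.

    Assumed cost model: central polling costs one question and one answer
    per host per round; delegation costs one message per status change.
    """
    state = {"round": 0}

    def flaky() -> bool:
        # Fails in exactly one round, then recovers.
        return state["round"] != fail_round

    def healthy() -> bool:
        return True

    agents = [
        Agent(f"host{i}", {"web": flaky if i == 0 else healthy})
        for i in range(num_agents)
    ]
    central = 0
    delegated = 0
    for r in range(rounds):
        state["round"] = r
        central += 2 * num_agents
        for agent in agents:
            delegated += len(agent.run_once())
    return central, delegated


print(simulate(num_agents=100, rounds=10, fail_round=5))  # (2000, 102)
```

With 100 hosts and 10 rounds, the central poller handles 2000 messages, while the delegated version sends 102: one initial status per host, one failure and one recovery. The gap widens with every extra round in which nothing changes. The sketch leaves out how you notice that an agent itself has died, which any real system also has to solve.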
If Ya Got No Agents, Ya Get No Scalability
A key feature of delegation is that you have to have someone or something to delegate to. If you’re a one-person company, you have to hire other people (perhaps a virtual assistant, or a marketing guru) in order to be able to delegate things. For system management, that means you have to have agents on the machines being managed – or you won’t have anything to delegate work to. Like any good solution, agents have issues of their own – you have to distribute them, you have to trust them, they have to avoid interfering with their host systems, you have to keep them up to date. But without agents, you can’t gracefully scale up and you’ll eventually wind up in scalability purgatory. You’ll need more central resources, more network resources (to carry the traffic to continually talk to the machines) and on and on.
Put Your Agents To Work!
As I noted above, agents are not a zero-cost solution – but life is still much better with them than without them. So, if you’re going to have agents, you want to get as much value out of them as you can. In the case of the Assimilation Suite, we use a single lightweight, secure agent that serves all four components of the suite, maximizing the value we provide to our customers. We delegate both the discovery work and the monitoring work in exactly the same way. Just like delegating monitoring is key to scaling monitoring, delegating discovery is key to keeping the CMDB up to date in a scalable way, and that in turn supports all the other components – Monitoring, Security and Network. If you want to know more about how we go about delegating out so much more than others in a very simple way, then I suggest the RNNIGN protocol overview, and one of the recent technical talk videos. Or you could just wait excitedly for the blog article or fun video I’ll make explaining this.
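The CMDB question “Has this changed?” can be answered on the managed machine in much the same spirit. The sketch below shows one way an agent might do that, by remembering a digest of the last discovery result it sent. It is an assumption for illustration, not a description of how the Assimilation agent actually works: `DiscoveryReporter`, `report_if_changed` and the package data are hypothetical names and values.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class DiscoveryReporter:
    """Hypothetical agent-side helper: send discovery data only when it changes."""

    def __init__(self) -> None:
        self._last_digest: Dict[str, str] = {}

    def report_if_changed(self, kind: str, data: Dict[str, Any]) -> Optional[str]:
        """Return the payload to send, or None if nothing changed."""
        # Sorted keys make equal data serialize identically every time.
        payload = json.dumps(data, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if self._last_digest.get(kind) == digest:
            return None  # unchanged, so nothing goes over the network
        self._last_digest[kind] = digest
        return payload  # the caller would send this to the central system


reporter = DiscoveryReporter()
print(reporter.report_if_changed("packages", {"openssl": "3.0.2"}))  # sent
print(reporter.report_if_changed("packages", {"openssl": "3.0.2"}))  # None
print(reporter.report_if_changed("packages", {"openssl": "3.0.3"}))  # sent
```

The comparison happens entirely on the managed machine, so the central system only hears about a host when there is something new to record.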
What questions does this scalability discussion bring up for you? What about all this scaling, and monitoring, and discovery makes you crazy? ;-). Use the comments form below to let me know!