The three pillars of IT security are confidentiality, integrity, and availability. Most of the press coverage is about confidentiality – at least until an airline or two or three has trouble with availability ;-). Of course, availability is also a key part of server management, with significant operational implications. Those of you who know me know I have deep expertise in availability. Unsurprisingly, in this post I’m going to concentrate on availability – and on the necessity of monitoring everything, and knowing that you’re monitoring everything.
Although this article and its title are a bit tongue-in-cheek, the reality behind the title is serious. In the average data center, 30% of the servers are mainly space heaters [they had their brains eaten ;-)]. Given that many data centers are strictly limited on power, cooling, and floor space, and that power and maintenance are significant costs, this is a big deal. This happens primarily because the staff managing those servers don’t have a clear idea of what all their servers are doing.
There is incredible power in asking the right questions, and following the answers where they lead – especially when they lead to uncomfortable places.
As you probably remember, in March 2011 a magnitude 9.0 earthquake hit Japan, resulting in a massive tsunami that damaged the Fukushima nuclear power plant. Some of the most serious damage occurred because there was no power to cool the reactor.