To CMDB or not to CMDB – is that the question?

CMDBs have a bad reputation in many circles. They are seen as expensive, are associated with costly IT failures and high-overhead, clumsy processes, are reviled by some, and are thought to be incompatible with DevOps. In my opinion, they don’t have to be that way. The idea of a database that knows everything about your IT environment, replaces manual documentation, and acts as a springboard for automation is incredibly attractive. What would a CMDB (configuration management database) look like if it were easy to install and easier to maintain – one that followed the DevOps mantra of automating everything? This post explores that question.

In the ITIL® world, the idea of a single CMDB has been supplanted by the idea of a collection of databases that in turn feed into a Grand Unified View of everything. Rather than address this Grand Unified View, which is a difficult problem, let’s see what we can do well in a single database, and leave the unification and manual entry to others.

Modern CMDB Characteristics

Here are the characteristics that I think any contender for a modern (DevOps-compatible) CMDB architecture needs:

  1. Discovery-based – everything in the database should be discovered with little or nothing entered manually. Most things that require manual entry should go somewhere else.
  2. Automatically maintained in “real time” – this goes hand in hand with being discovery-based.
    • When it’s always correct and up to date within seconds or minutes, new opportunities and uses become possible.
    • The fast-moving goals of DevOps require this.
    • Data that’s automatically discovered and updated stays correct without anyone having to remember to update it.
  3. Support bare metal, virtual, cloud and container environments. Different information may be available in each environment.
  4. Highly extensible
    • Customers need to be able to discover information unique to their environment.
    • Customers need to be able to easily integrate it with their own alerting systems, SIEMs, internal processes, change control systems, etc.
    • Customer extensions should not require database upgrades for schema changes.
    • This implies a high degree of openness.
  5. Highly detailed – capable of representing even small configuration details. You want enough detail that the information needed to support most root cause analysis is available in the database. When it’s correct, detailed, and up to date, it can be used in place of expensive and hard-to-maintain manual documentation.
  6. Highly scalable. Scalability into the 100K server range is desirable. With cloud environments and high-scale SaaS environments, it’s hard to put a smaller upper bound on what’s needed.
  7. At a minimum, include servers (physical, virtual, or cloud), operating systems, containers, applications, IP and MAC addresses, and network gear.
  8. Support relationships as first-class citizens. Much of what’s most interesting about a data center is how all the parts fit together. This is well recognized by the ITIL framework. Few old-school CMDBs do this well.
  9. Should not set off security alarms. Port scans and massive pings are out. Many old-school CMDBs are highly intrusive on the network and have been known to light up intruder alarms like a Christmas tree. This makes discovery over the network difficult or impossible. Avoiding adding noise to the security environment is essential for automation. If you have an intrusive (scanning) CMDB system and it doesn’t set off security alarms, your security team likely needs better intrusion detection tools.
  10. Provide extensible automation of other activities (security, monitoring) based on discovery updates (see the sketch after this list).
  11. Should be based on a largely schemaless approach. Rigid schemas are typically incompatible with the extensibility requirement and with avoiding database upgrades for schema changes.
  12. Should work well across multiple sites.
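
To make item 10 a little more concrete, here is a minimal sketch of what automation driven by discovery updates might look like. Everything here – the event names, the payload fields, the port policy – is made up for illustration; it is not any particular product’s API:

    #!/usr/bin/env python3
    """Hypothetical sketch: small handlers reacting to CMDB discovery events."""
    import json

    def on_server_up(event):
        # A newly discovered server: enroll it in monitoring.
        print("enroll %s in monitoring" % event["hostname"])

    def on_new_listener(event):
        # A newly discovered listening port: check it against a (made-up) policy.
        allowed = {22, 80, 443}
        if event["port"] not in allowed:
            print("ALERT: unexpected port %d on %s" % (event["port"], event["hostname"]))

    HANDLERS = {"server-up": on_server_up, "new-listener": on_new_listener}

    def dispatch(raw_event):
        """Route one JSON-encoded discovery event to its handler."""
        event = json.loads(raw_event)
        handler = HANDLERS.get(event["type"])
        if handler:
            handler(event)

    dispatch('{"type": "server-up", "hostname": "web01"}')
    dispatch('{"type": "new-listener", "hostname": "web01", "port": 8080}')

The shape is what matters: discovery updates arrive as events, and small handlers turn them into monitoring or security actions without anyone re-entering data.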

Modern CMDB Secondary Characteristics

Here are some things that I think fall out naturally from the primary characteristics above:

  1. Based on a graph database. Between the schemaless requirement and the need to support relationships as first-class citizens, there really isn’t any other rational choice. Many interesting questions about IT are graph-theoretic. There are a number of graph databases; my personal preference is Neo4j (see the example after this list).
  2. Based on server- and switch-resident agents everywhere possible. If you want continual updates, scalability, great detail in the data, and no security alarms going off, then you really have to have agents resident on as many of your endpoints as possible. Polling over the network (using any method or API) scales poorly at best, and when used for discovery often sets off security alarms.
  3. Highly open architecture. Although there are other ways to get this, an open source approach is the best method to maintain such an architecture over the long term.
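
To make the graph point concrete, here is a small sketch of the kind of question a graph-based CMDB makes easy to ask. It uses the official Neo4j Python driver, but the node labels and relationship types (Switch, Server, Service, CONNECTED_TO, RUNS_ON) are assumptions for illustration, not a schema any particular product guarantees:

    #!/usr/bin/env python3
    """Sketch: a graph-theoretic question against a (hypothetical) Neo4j CMDB schema."""
    from neo4j import GraphDatabase  # pip install neo4j

    # "Which services would be affected if this switch failed?"
    QUERY = """
    MATCH (sw:Switch {name: $switch})<-[:CONNECTED_TO]-(srv:Server)
          <-[:RUNS_ON]-(svc:Service)
    RETURN DISTINCT svc.name AS service, srv.name AS server
    """

    def services_behind_switch(uri, user, password, switch):
        # Open a driver and session, run the query, and collect plain dicts.
        with GraphDatabase.driver(uri, auth=(user, password)) as driver:
            with driver.session() as session:
                return [record.data() for record in session.run(QUERY, switch=switch)]

    for row in services_behind_switch(
            "bolt://localhost:7687", "neo4j", "secret", "core-switch-1"):
        print(row["service"], "on", row["server"])

Questions like “what breaks if this switch dies?” are short path queries in a graph model; in a relational CMDB they tend to become multi-way joins that nobody wants to write or maintain.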

Things You Can’t Discover for your CMDB

Having a database that’s always up to date is a killer idea, but there are inevitably things that can’t be automatically discovered in a useful way. Most of these are human process-related things like:

  • Approval processes
  • Paper sign-in logs for the data center
  • Alerting policies for server, service, or switch failures
  • and similar things

Nevertheless, the ability to know everything about your environment and have it always be up to date is incredibly useful, and with good tools it should be straightforward to achieve. An evaluation of the Assimilation System Management Suite against these modern criteria is now available.

CMDB Questions

  • What are your favorite characteristics for a “modern” CMDB?
  • Why is having all the details correct and at your fingertips valuable to you?
  • What do you want to integrate your CMDB with?

Photo Credit: A R Wilkinson via Compfight cc

9 thoughts on “To CMDB or not to CMDB – is that the question?”

  1. As I mentioned in the article, not everything can be automatically discovered, so there’s clearly room and a need for manual entry. For those things that can be discovered automatically, it makes sense to discover them automatically. Otherwise we’d be in the position of Henry Ford, who supposedly said “If I asked my customers what they wanted, they’d have asked for a faster horse”. We move beyond the faster horse.

    Integration between Assimilation and CoreDoc should be easy, if CoreDoc has appropriate APIs. You just listen for the creation and modification of interesting objects (servers, etc.) using our event API (http://assimilationsystems.com/2015/12/07/assimilation-event-api-overview/), and then invoke the CoreDoc API to update the CoreDoc database with whatever it is that you’re interested in.
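
    In rough sketch form, the pattern is something like the code below. I’m making up the CoreDoc endpoints and the event payloads here – see the event API docs linked above and CoreDoc’s own documentation for the real details:

        import json
        import urllib.request

        COREDOC = "https://coredoc.example.com/api"  # placeholder URL

        def post(path, payload):
            """POST a JSON payload to a (hypothetical) CoreDoc endpoint."""
            req = urllib.request.Request(COREDOC + path,
                                         data=json.dumps(payload).encode(),
                                         headers={"Content-Type": "application/json"})
            return urllib.request.urlopen(req)

        def handle_event(event):
            # Server discovered or changed: keep CoreDoc's record current.
            if event["type"] in ("server-created", "server-updated"):
                post("/assets", {"hostname": event["hostname"],
                                 "macs": event.get("macs", [])})
            # Server or service down: open a ticket automatically.
            elif event["type"] in ("server-down", "service-down"):
                post("/tickets", {"summary": event["hostname"] + " is down"})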

    And, you can discover “rogue” assets or devices that are connected to your network, and validate that all the MAC addresses, etc. are known to CoreDoc.

    Similarly, when a server or service goes down, you could listen for those events, and create tickets in CoreDoc automatically.

    Sounds like we’re pretty complementary. Thanks for your note!