Assimilation Subgraph Queries and Visualization


In the 1.1.7 release of the Assimilation System Management Suite, we added a completely new type of query – the subgraph query. What’s really cool about subgraph queries is that they are exactly what’s needed for visualization. So, this article is about Subgraph Queries and Visualization – what they are, how they relate to each other and why this is totally cool.

The Assimilation Suite stores all the data we discover about your infrastructure, services and so on in a Neo4j graph database. This essentially means that we have a graph model of many aspects of your IT infrastructure, stored in a way analogous to a Visio diagram. Normal SQL databases store things in tables – things that look a lot like spreadsheets. Instead, a graph database stores them in a format which more closely resembles a Visio diagram, or one of those things people draw on the board with circles and arrows during technical discussions.

Graph Visualization

Since Visio is often the preferred tool for drawing technical diagrams, it’s not too surprising that a graph database lends itself to visualizations. There’s a problem though: any diagram which has all the detail of all the systems, settings and so on for everything in your IT environment is so complicated that it’s simply impossible to make sense of. Even if you could render it all on a long wall in a conference room, it would be so complex and so intricate that it would be impossible to follow. Instead of giving you the deep and clever insights a visualization is intended to give you, it would confuse you, terrify you and give you a massive headache. This is obviously sub-optimal ;-).

Visualizing Part of the Graph

What you want instead is just what you need to see for the purpose at hand. If you’re a network guy, you likely have network thoughts in mind. If you’re doing an availability assessment, you’re interested in different things. If you’re doing root cause analysis of a problem, you want to know different things again, and if you’re a penetration tester you want to know yet a different set of things.

The key is that all the data you want is there in the database, but you really only want to see some of it – the part you’re interested in – which depends on who you are, what your role is, and why you’ve come to the Oracle of Database ;-). For most of these purposes you have some specific area of interest, and what you want to know is certain kinds of things about that area, and those things related to that area, and how all these pieces relate to each other. That is, you want to examine a subgraph of the entire graph. You want to visualize that subgraph. From the database perspective, what you want is a visualization of the results of a subgraph query.

Assimilation Subgraph Queries

Although Neo4j is a graph database, for most queries the output looks like the output from any SQL database – in other words it looks like a spreadsheet (table), not a graph or a subgraph. So the question is: how would you represent a subgraph in the output anyway? Basically what you want from a subgraph query is a description of the subgraph. That is, you want to know all the nodes in the subgraph that you’re interested in, and you want to know all the interesting relationships between those nodes in the subgraph.

Next you give it to some kind of magical layout program which lays out those objects on the screen, or paper, or an image and draws you a wonderfully insightful picture of what you wanted to know about. I’m not going to talk about those magical layout programs, but they’re available, and the problem they try and solve is quite interesting and quite hard. But for the purposes of this article, they’re covered by a “somebody else’s problem field” – so we’ll ignore them :-D.

Currently we only provide visualizations with our drawwithdot program. In the future we expect to do interactive visualizations as well. It’s worth noting that the first version of drawwithdot, mentioned in this security article and this one in Security Week, did not use subgraph queries, but the version in 1.1.7 switched over to subgraph queries, which made it more than 10x faster.

Neo4j Subgraph Queries

The key thing about all of this is that you have to know what subgraph is most interesting and, from our perspective, you have to be able to issue a Cypher query that returns this interesting subgraph to us. More specifically, we want to know where to start the query (that is, what nodes absolutely must be in the output), what relationships are interesting to us, what kinds of nodes are interesting to us, and how many levels of relationships we want to follow from these initial starting nodes. So, a subgraph query consists of these elements:

  1. Where to start the query
  2. What kinds of nodes to include in the result
  3. What kinds of relationships to include in the result
  4. How many layers of relationships to follow to find the things we’re interested in
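
To make these four elements concrete, here is a minimal in-memory sketch in Python. It is not the Assimilation code, and it is not the Cypher that the suite issues. The Node and Rel types, the subgraph function, and the sample node and relationship names are all hypothetical, made up purely for illustration.

```python
from collections import namedtuple

# Hypothetical in-memory stand-ins for graph database nodes and relationships.
Node = namedtuple("Node", ["name", "nodetype"])
Rel = namedtuple("Rel", ["source", "reltype", "target"])


def subgraph(nodes, rels, start, nodetypes, reltypes, maxdepth):
    """Return (nodes, relationships) for the subgraph around the start nodes.

    start     -- names of nodes that must be in the output (element 1)
    nodetypes -- kinds of nodes to include (element 2)
    reltypes  -- kinds of relationships to follow (element 3)
    maxdepth  -- how many layers of relationships to follow (element 4)
    """
    byname = {node.name: node for node in nodes}
    found = set(start)
    frontier = set(start)
    for _ in range(maxdepth):
        nextfrontier = set()
        for rel in rels:
            if rel.reltype not in reltypes:
                continue
            # Follow each relationship in either direction.
            for here, there in ((rel.source, rel.target),
                                (rel.target, rel.source)):
                if (here in frontier and there not in found
                        and byname[there].nodetype in nodetypes):
                    nextfrontier.add(there)
        found |= nextfrontier
        frontier = nextfrontier
    keptnodes = [n for n in nodes if n.name in found]
    keptrels = [r for r in rels if r.reltype in reltypes
                and r.source in found and r.target in found]
    return keptnodes, keptrels


# Made-up sample data: one host, its NIC, a switch and a service.
nodes = [Node("alpha", "host"), Node("eth0", "nic"),
         Node("switch1", "switch"), Node("sshd", "service")]
rels = [Rel("alpha", "owns", "eth0"),
        Rel("eth0", "wiredto", "switch1"),
        Rel("alpha", "runs", "sshd")]

# A "network guy" view: start at host alpha, follow wiring only, two layers.
found_nodes, found_rels = subgraph(
    nodes, rels, start=["alpha"],
    nodetypes={"host", "nic", "switch"},
    reltypes={"owns", "wiredto"}, maxdepth=2)
```

With this sample data the result holds alpha, eth0 and switch1 plus the two wiring relationships, while the sshd service is left out because neither its node type nor its relationship type was requested. In a real graph database this filtering would happen inside the query itself rather than in client code.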

In the 1.1.7 release, we added the general support necessary for subgraph queries, and provided two specific subgraph queries. The two queries are:

  • allhostsubgraph – finds things related to all hosts in the system (not selective on hosts)
  • hostsubgraph – finds things related to a specific set of hosts

For both queries, you can specify which types of nodes you want to include in the output, and which set of relationships you want to follow to find the results. Of course, there is the question of what format the result comes out in, since it’s not at all clear how to represent a subgraph in the sort-of-spreadsheet tabular output format that Neo4j uses for query results. The result is a bit peculiar, but perfectly fitting.

The output from any subgraph query is a single row, with two columns. The first column is a list of all nodes in the query result, and the second column is the list of all the relationships in the query result. This is pretty much exactly what you need to give to a visualization tool like dot from the graphviz package, or the corresponding code in the D3 visualization package. In fact, this format is the same pretty much regardless of what subgraph you want to visualize.
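
As a rough illustration of why this two-column shape suits a tool like dot, here is a small Python sketch that turns such a row into DOT text. The dictionary keys (id, label, from, to, type) and the row_to_dot function are assumptions made up for this example; the actual fields in the Assimilation output are not shown in this article, and this is not how drawwithdot itself is implemented.

```python
def quote(text):
    """Quote a value as a DOT string, escaping embedded double quotes."""
    return '"' + str(text).replace('"', '\\"') + '"'


def row_to_dot(row, graphname="infrastructure"):
    """Convert one (nodes, relationships) result row into DOT text.

    The dictionary keys used here are hypothetical, chosen for illustration.
    """
    nodes, rels = row  # column 1: all the nodes, column 2: all the relationships
    lines = ["digraph %s {" % quote(graphname)]
    for node in nodes:
        lines.append("  %s [label=%s];"
                     % (quote(node["id"]), quote(node["label"])))
    for rel in rels:
        lines.append("  %s -> %s [label=%s];"
                     % (quote(rel["from"]), quote(rel["to"]),
                        quote(rel["type"])))
    lines.append("}")
    return "\n".join(lines)


# A made-up single-row result: a list of nodes and a list of relationships.
row = (
    [{"id": "alpha", "label": "host alpha"},
     {"id": "eth0", "label": "NIC eth0"}],
    [{"from": "alpha", "to": "eth0", "type": "owns"}],
)
dot_text = row_to_dot(row)
print(dot_text)
```

Saving the printed text to a file and running it through something like dot -Tpng would then produce a picture. The point is that the conversion is one loop over the nodes and one loop over the relationships, no matter which subgraph was asked for.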
