Assimilation Monitoring Rules – Keys to Automated Monitoring

[Image: ruler with level] Assimilation Monitoring Rules - pun intended. No apologies ;-)

The Assimilation System Management Suite monitors servers and services automatically – which is way cool! There are two things it needs to monitor services – a monitoring agent (script) and a rule that tells it when and how to use this agent. This article explains how to create Assimilation monitoring rules which teach the Assimilation software when and how to use monitoring agents. These rules are the keys to fully automated monitoring. When your monitoring is fully automated, complexity goes down, and availability goes up.

The Assimilation suite currently supports two interesting kinds of monitoring agents: OCF (Open Cluster Framework) resource agents, and Nagios remote API monitoring agents. This article will concentrate on the OCF agents, since they’re more powerful and based on a more recently designed API.

What Are Assimilation Monitoring Rules?

Assimilation monitoring rules are commented JSON strings which tell the Assimilation Suite how to monitor services.  They give regular expressions which must be matched, expressions (values) they match against, which monitoring agent they go with, and what arguments to pass to the monitoring agent. When a new service is discovered by the Assimilation discovery process, it is compared against all the monitoring rules we have in order to figure out what kind of a service it is, and how to monitor it.

The rest of this article will show how this works through an example monitoring rule.

A Sample Assimilation Monitoring Rule

For our example, we’ll give the rule for monitoring the Neo4j database with an OCF resource agent which we wrote. The JSON of this rule is shown below. In subsequent sections we’ll dissect this rule in detail, explaining all its parts.

{
  "class":    "ocf",
  "type":     "neo4j",
  "provider": "assimilation",
  "classconfig": [
#   OCF parameter  expression-to-evaluate                regular-expression
    [null,         "@basename()",                         "java$"],
    [null,         "$argv[-1]",  "org\\.neo4j\\.server\\..*Bootstrapper$"],
    ["ipport",     "@serviceipport()",                    "..."],
    ["neo4j_home", "@argequals(-Dneo4j.home)",            "/"],
    ["neo4j",      "@basename(@argequals(-Dneo4j.home))", "."]
  ]
}

At the top are three simple attributes. Here’s what they mean:

  • class – what type of monitoring agent is this rule for – in this case, it’s an OCF resource agent. Many of you may be interested in the “nagios” class monitoring agents.
  • type – this is the name of the resource agent. The agent is called neo4j.
  • provider – this is an OCF term meaning who provides this agent. Each OCF provider gets their own namespace, helping to deal with the possibility of conflicting names. Since the Assimilation project is the provider for this particular resource agent, we put “assimilation” in for this field. The net of this is that the monitoring agent is named “assimilation/neo4j”.

The remaining top level attribute is “classconfig” which we’ll talk about in the next section.
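Because the rule is JSON with a comment line in it, a plain JSON parser will reject it as written. The sketch below shows one way such a rule could be read in Python. It assumes comments are whole lines whose first non-blank character is "#"; the helper name strip_comment_lines is made up for this illustration and is not part of the Assimilation code.

```python
import json

# The sample rule, with the comment line left in place.
RULE_TEXT = r'''
{
  "class":    "ocf",
  "type":     "neo4j",
  "provider": "assimilation",
  "classconfig": [
#   OCF parameter  expression-to-evaluate                regular-expression
    [null,         "@basename()",                         "java$"],
    [null,         "$argv[-1]",  "org\\.neo4j\\.server\\..*Bootstrapper$"],
    ["ipport",     "@serviceipport()",                    "..."],
    ["neo4j_home", "@argequals(-Dneo4j.home)",            "/"],
    ["neo4j",      "@basename(@argequals(-Dneo4j.home))", "."]
  ]
}
'''


def strip_comment_lines(text):
    """Drop lines whose first non-blank character is '#' (assumed comment style)."""
    return "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )


rule = json.loads(strip_comment_lines(RULE_TEXT))

# The provider and type together name the monitoring agent.
agent_name = f'{rule["provider"]}/{rule["type"]}'
print(rule["class"], agent_name, len(rule["classconfig"]))
```

Note that this simple approach only handles comments that occupy a whole line; a "#" after other content on a line would need more careful handling.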

Classconfig in Assimilation Monitoring Rules

The classconfig section of an Assimilation monitoring rule is where all the interesting stuff happens – it’s where the rubber meets the road. This is where we determine whether this rule matches a particular service, and how to pass it parameters.

This section is a list of 3-element tuples. Each tuple has the same structure. Here’s what the elements of the tuple mean:

  1. OCF parameter – the name of the OCF parameter that this expression provides. OCF parameters are passed through to the OCF resource agents in the environment. A null for this element means that this expression isn’t passed into the OCF environment.
  2. expression-to-evaluate – an expression to evaluate to match against the regular expression in the 3rd element of the tuple, and optionally to be passed as an OCF parameter (the 1st element of the tuple).
  3. regular-expression – a (Python) regular expression which the value of the expression-to-evaluate is compared against.

The classconfig attribute consists of a bunch of these 3-tuples. If all the regular expressions match all their corresponding expressions-to-evaluate, then this rule can be used to monitor that particular resource.
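The all-must-match logic can be sketched in a few lines of Python. This is only an illustration of the idea, not the Assimilation implementation: match_rule and its evaluate argument are hypothetical names, and the expression values are supplied from a precomputed dictionary rather than from real discovery data.

```python
import re


def match_rule(classconfig, evaluate):
    """Return the OCF parameters if every tuple matches, else None.

    `evaluate` is a hypothetical callable that maps an expression string
    to its value for the discovered service (or None if it has no value).
    """
    params = {}
    for ocf_name, expression, regex in classconfig:
        value = evaluate(expression)
        # re.match anchors the pattern at the start of the string.
        if value is None or re.match(regex, str(value)) is None:
            return None
        if ocf_name is not None:
            # Only named elements are handed to the resource agent.
            params[ocf_name] = str(value)
    return params


classconfig = [
    [None, "@basename()", "java$"],
    ["ipport", "@serviceipport()", "..."],
]

# Invented values standing in for real expression evaluation.
values = {"@basename()": "java", "@serviceipport()": "127.0.0.1:7474"}
print(match_rule(classconfig, values.get))
```

A single failing tuple rejects the whole rule, which is why the tuples with a null first element still matter: they identify the service even though they pass nothing to the agent.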

Detailed walkthrough of our sample classconfig

Let’s walk through the classconfig section for this example and see if we can’t make sense of it all. It doesn’t look too complicated, but it is a bit mysterious. So let’s walk through it and strip away the mystery!  Each line is independent of the other lines, so we’ll just walk through them one at a time.

First line: [null, “@basename()”, “java$”]

This line matches the last path component of the full pathname of the binary being looked at against the regular expression “^java$” (the ^ is implicit). This rule will match if the service is a Java program. The value of the @basename() expression is not provided to the resource agent (hence the “null” first value).
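To see the effect of the implicit ^, here is a small Python sketch that applies the same pattern to a few invented pathnames. It assumes the matching behaves like Python's re.match, which anchors at the start of the string; the function name matches_first_line is made up for this example.

```python
import posixpath
import re


def matches_first_line(pathname):
    """Apply the 'java$' pattern to the last path component of pathname."""
    name = posixpath.basename(pathname)
    # re.match only succeeds if the pattern matches from the first character.
    return re.match("java$", name) is not None


# Invented pathnames for illustration.
for path in ("/usr/bin/java", "/usr/bin/javac", "/opt/tools/notjava"):
    print(path, matches_first_line(path))
```

Only the first pathname matches: "javac" fails because of the trailing $, and "notjava" fails because the match must begin at the start of the basename.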

Second line: [null, “$argv[-1]”,  “org\\.neo4j\\.server\\..*Bootstrapper$”]

This line will match the last argument ($argv[-1]) given to the service against the regular expression “org\.neo4j\.server\..*Bootstrapper$”. (The backslashes are doubled in the rule itself because JSON strings require it.) This will match if the last argument given to Java indicates that it’s a neo4j Java program. The value of $argv[-1] will not be provided to the resource agent. Array indexing is covered in Appendix A below.
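The relationship between the pattern as written in the rule and the pattern the regular expression engine sees can be checked with a short Python sketch. The argument list here is invented for illustration, and the sketch assumes the pattern is applied the way Python's re.match would apply it.

```python
import json
import re

# As written in the rule file: JSON requires each backslash to be doubled.
json_text = r'"org\\.neo4j\\.server\\..*Bootstrapper$"'

# After JSON decoding, each doubled backslash becomes a single one.
pattern = json.loads(json_text)
print(pattern)

# Invented argument list; only the last element is examined.
argv = ["/usr/bin/java", "-server", "org.neo4j.server.CommunityBootstrapper"]
print(re.match(pattern, argv[-1]) is not None)
```

Because the dots are escaped, they only match literal dots, so a look-alike such as "orgXneo4j.server.Bootstrapper" would not match.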

Third line: [“ipport”, “@serviceipport()”, “...”]

This line will match, provided that the return from the serviceipport() function contains at least 3 characters. The result of this (an IP:port combination) will be passed to the neo4j resource agent as the parameter ipport.

Fourth line: [“neo4j_home”, “@argequals(-Dneo4j.home)”,  “/”]

This line will match provided that the service has an argument of the form “-Dneo4j.home=something”, and something starts with “/”. The value of something will be passed to the neo4j OCF resource agent as the parameter neo4j_home. The argequals() function is explained in more detail in Appendix B below.

Fifth line: [“neo4j”, “@basename(@argequals(-Dneo4j.home))”, “.”]

The expression here returns the basename of the pathname which was specified as the -Dneo4j.home command line argument. This value will match if it contains at least one character. The value of this expression is passed to the neo4j OCF resource agent as the parameter neo4j.
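The fourth and fifth lines can be sketched together in Python. The argequals function below is a hypothetical stand-in written from the description in this article, not the real implementation, and the command line is invented for illustration.

```python
import posixpath
import re


def argequals(argv, name):
    """Hypothetical stand-in: return the text after 'name=' or None."""
    prefix = name + "="
    for arg in argv:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


# Invented command line for illustration only.
argv = [
    "/usr/bin/java",
    "-Dneo4j.home=/var/lib/neo4j",
    "org.neo4j.server.Bootstrapper",
]

# Fourth line: the value must start with "/".
home = argequals(argv, "-Dneo4j.home")
fourth_matches = home is not None and re.match("/", home) is not None

# Fifth line: the basename of that value must have at least one character.
name = posixpath.basename(home) if home else None
fifth_matches = bool(name) and re.match(".", name) is not None

print(home, fourth_matches)
print(name, fifth_matches)
```

With this command line, neo4j_home would be "/var/lib/neo4j" and neo4j would be "neo4j". A relative value such as "-Dneo4j.home=neo4j" would fail the fourth line, so the rule as a whole would not match.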

About OCF resource agent parameters

As required by the OCF resource agent specification, when the OCF resource agent is invoked, each of the parameter names is prefixed with OCF_RESKEY_ to avoid conflicts with existing environment variable names.
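As a rough sketch of what this prefixing looks like, the Python below builds an environment from a set of parameters. The function name ocf_environment and the parameter values are invented for this illustration; how the Assimilation suite actually constructs the environment is not shown in this article.

```python
import os


def ocf_environment(params, base_env=None):
    """Copy an environment and add each parameter as OCF_RESKEY_<name>."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env["OCF_RESKEY_" + name] = str(value)
    return env


# Invented parameter values matching the names in the sample rule.
params = {
    "ipport": "127.0.0.1:7474",
    "neo4j_home": "/var/lib/neo4j",
    "neo4j": "neo4j",
}

env = ocf_environment(params, base_env={"PATH": "/usr/bin"})
for key in sorted(env):
    print(key, "=", env[key])
```

An agent script would then read OCF_RESKEY_ipport, OCF_RESKEY_neo4j_home and OCF_RESKEY_neo4j from its environment.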

Conclusion

As you can see, teaching the Assimilation suite how and when to use a new monitoring script isn’t hard. You have to know a little about the service being monitored and what the monitoring script wants as parameters. Once you know these things, you know what you need to know to monitor it.

Once you’ve done a ps(1) of the running service, and you know what parameters to pass to the agent, it takes just a few minutes to write the rule telling the Assimilation system when and how to use the new monitoring agent.

Having done this, then no matter where this service shows up, no matter the IP address, system, port or configuration details – it will be automatically monitored. As noted in the introduction, I think this is way cool ;-). If you want to take actions or integrate with other services on the basis of this automated monitoring, check out the Assimilation Event API.

So, now you know why the discovery-driven Assimilation Monitoring Totally Rules – and how to create the rules to make it work for you ;-).

Appendix A: Expressions in Assimilation Monitoring Rules

Assimilation monitoring rules are invoked when the Assimilation suite discovers a new service being offered on a system. When these rules are evaluated, they are given a context of the service being offered. In terms of the Assimilation suite, this is a ProcessNode – which has all the information about the server process. The attributes that a ProcessNode has include the following:

  • $host – the hostname it’s running on
  • $pathname – the full pathname of the binary providing this service
  • $argv – an array of (string) arguments that the service was given when it started
  • $uid – the user id the process is running as
  • $gid – the group id the process is running as
  • $cwd – the service’s current working directory

In addition, all the attributes of the host (“Drone”) it’s running on are also available. In this example, most of the expressions of interest are the return results of built-in functions. Because all the attributes are JSON, expressions like $a.b.c search the context for values in the JSON. In addition, array expressions like $argv[0] are legal. If a negative index is given, it follows the Python convention that $argv[-1] is the last argument, $argv[-2] is the next-to-the-last argument, and so on.
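To make the expression syntax concrete, here is a minimal Python sketch of a resolver for $a.b.c and $argv[-1] style expressions. It is an assumption-laden illustration: the resolve function, its parsing rules, and the sample context are all invented here, and the real evaluator may well behave differently (for example, in how it handles missing values).

```python
import re

# One step is either ".name" / "name" or "[index]".
_STEP = re.compile(r"\.?([A-Za-z_]\w*)|\[(-?\d+)\]")


def resolve(expression, context):
    """Resolve '$a.b.c' or '$argv[-1]' against a dict; None if missing."""
    if not expression.startswith("$"):
        raise ValueError("expected an expression starting with '$'")
    value = context
    pos = 1
    while pos < len(expression):
        step = _STEP.match(expression, pos)
        if step is None:
            raise ValueError(f"cannot parse {expression!r} at offset {pos}")
        key, index = step.group(1), step.group(2)
        try:
            # Names look up dictionary keys; bracketed numbers index lists.
            value = value[key] if key is not None else value[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
        pos = step.end()
    return value


# Invented context standing in for a discovered service.
context = {
    "host": "db1",
    "pathname": "/usr/bin/java",
    "argv": ["java", "-server", "Main"],
    "a": {"b": {"c": 42}},
}

print(resolve("$argv[-1]", context))
print(resolve("$argv[-2]", context))
print(resolve("$a.b.c", context))
```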

Appendix B: Functions in Assimilation Monitoring Rules

For convenience, there are quite a few functions available to use in the monitoring rules. Here are a few which are most useful in monitoring expressions.

  • dirname: This function returns the directory name from a pathname. If no pathname argument is supplied, then the discovered service executable name ($argv) is assumed.
  • flagvalue: This function searches a list for a -flag and returns the value of the string which is the next argument. The -flag is given by the argument in args, and the list ‘argv’ is assumed to be the list of arguments. If there are two arguments in args, then the first argument is the array value to search for the -flag string instead of ‘argv’. The flag given must be the entire flag, complete with its leading - character(s). For example: -X or --someflag.
  • hascmd: This function returns True if the given list of commands are all present on the given Drone. It determines this by looking at the value of $commands. This function allows you to disable a monitoring rule if a command needed by the monitoring agent isn’t present.
  • serviceip: This function returns the IP portion of the return from serviceipport() and takes the same arguments.
  • serviceipport: This function searches discovery information for a suitable ip:port combination. Its argument is an expression that yields the hash table (map) of IP/port combinations for this service. The return value is a legal ip:port combination for the given address type (ipv4 or ipv6).
  • serviceport:  This function returns the port portion of the return from serviceipport() and takes the same arguments.
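The behaviour described for a few of these functions can be approximated in Python as follows. These are simplified, hypothetical stand-ins written from the descriptions above: the signatures differ from the real functions (which take their arguments from the rule expression and the service context), the representation of $commands as a collection of names is an assumption, and the ip:port splitting assumes the port always follows the last colon.

```python
def flagvalue(argv, flag):
    """Return the argument following `flag`, or None if absent or last."""
    for position, arg in enumerate(argv[:-1]):
        if arg == flag:
            return argv[position + 1]
    return None


def hascmd(commands, *needed):
    """True if every needed command name appears in `commands`."""
    return all(cmd in commands for cmd in needed)


def split_ipport(ipport):
    """Split 'ip:port' at the last colon into (ip, port) strings."""
    ip, _, port = ipport.rpartition(":")
    return ip, port


# Invented data for illustration.
argv = ["mydaemon", "-c", "/etc/mydaemon.conf", "--verbose"]
commands = {"java", "curl", "sh"}

print(flagvalue(argv, "-c"))
print(flagvalue(argv, "--verbose"))
print(hascmd(commands, "java", "curl"))
print(split_ipport("127.0.0.1:7474"))
```

A rule could use something like hascmd to avoid matching on systems where the agent's helper commands are missing, as described above.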

Although there are more functions available, these are the ones you’re most likely to need while writing Assimilation monitoring rules. Most of the rest of the functions are more useful for writing Assimilation best practice rules.
