Writing an Assimilation Discovery Agent

One of the coolest things about the Assimilation System Management Suite is that it can discover nearly anything – and it’s easy to write your own Assimilation discovery agent and discover something new. Now, you can finally know it all! Discovery is key to the Assimilation Suite. Since everything is discovered, there’s very little need for human configuration of anything. In this blog post, I’ll explain how to write a discovery agent, and how to fully integrate it into the suite.

The first thing to know is that each discovery agent discovers one category of information – for example, PAM rules, the contents of /proc/sys, the contents of /etc/login.defs, and so on. In many cases, this involves parsing a single file or running a single command to produce the information that you want to discover. In other cases, it’s a bit more complicated. But simple or complicated, the idea is for your script to take all the information in one knowledge domain and transform it into JSON. Once it’s in JSON, the Assimilation Suite takes the JSON from your script, and stores it in our Neo4j graph database. Once it’s in the database, you can write best practice rules, trigger other discovery scripts, or create new nodes and relationships in the Neo4j graph database. For this article, we’ll just concentrate on the process of discovering the information, and integrating your agent into the Assimilation Suite.

Important Assimilation Discovery Agent Directories

There are a few relevant directories to be aware of when writing a discovery agent. Let’s go over them briefly.

The Assimilation Discovery Agent Directory

All discovery agents are stored in the project directory discovery_agents. Each discovery agent is in a file by itself, and needs to be marked readable and executable (typically mode 0755). At the current time, there are a number of discovery agents you can use as examples. The auditd_conf and login_defs commands are good examples of simple discovery agents for you to look over when writing your own discovery agent.

Writing an Assimilation Discovery Agent

Each discovery agent has a particular scope of what it’s discovering. Although it would be easy to make all the values strings, values that are integers should be JSON integers, and values that are yes/no or on/off should be JSON booleans (true or false). Keep in mind that JSON only supports decimal integers, so you may need to convert values from hex or octal to decimal.

Certain kinds of values (like ulimit values) can be either an integer or unlimited. In this case, what seems to make sense is to have the value be either an integer or the JSON value null, since null effectively means that no limit applies. Other files may have similar situations; think through how to handle them carefully before deciding on an approach.

As of this writing, all our discovery agents are written as POSIX shell scripts. For scripts which are to become part of the base system, we prefer this approach because dependencies are typically minimal.
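The typing rules above can be sketched in a few lines. The real agents are POSIX shell scripts, so the Python below is only an illustration of the mapping, not code from the project. The helper name to_json_value, the word lists, and the sample values are all hypothetical, and treating a leading zero as octal is an assumption that holds for umask-style values but would need checking for each file format.

```python
import json

# Hypothetical helper for illustration; not part of the Assimilation code base.
TRUE_WORDS = {"yes", "on", "true"}
FALSE_WORDS = {"no", "off", "false"}


def to_json_value(raw):
    """Map one raw configuration string to a JSON-friendly Python value."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    if lowered == "unlimited":
        return None  # json.dumps writes this as null: no limit applies
    try:
        if lowered.startswith("0x"):
            return int(lowered, 16)  # hex, e.g. 0x1F
        if len(lowered) > 1 and lowered.startswith("0"):
            # Assumption: a leading zero marks octal, as in a umask like 022.
            return int(lowered, 8)
        return int(lowered, 10)
    except ValueError:
        return text  # anything that is not a number stays a string


# Made-up sample values in the style of login.defs and ulimit output.
sample = {
    "UMASK": "022",
    "UID_MIN": "1000",
    "LOG_OK_LOGINS": "no",
    "nofile": "unlimited",
    "MAIL_DIR": "/var/mail",
}
typed = {key: to_json_value(value) for key, value in sample.items()}
print(json.dumps(typed, indent=2, sort_keys=True))
```

Because the conversion happens before serialization, the octal umask comes out as a plain decimal integer, which is the only integer form JSON can carry.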

Assimilation Discovery Agent Test Script

Now that you’ve written your script, you need to make sure it’s tested properly.

There is a script called testcode/test_discovery.sh which is used to perform simple tests on discovery agents. It runs each script with sample input, then verifies that the output is proper JSON using jsonlint, and that it hasn’t changed from previous runs. In addition, it makes sure that the script will produce expected output if the information it needs to discover is missing. To allow this testing to take place without replacing system configuration files or commands, each script has an environment variable which tells it where to find the file it’s parsing. To ensure a new discovery agent gets tested, edit the test_discovery.sh file to add the name of the new script and the name of the environment variable that tells the script where its configuration file is. The code for that looks something like this:

    testlines='auditd_conf AUDITD_CONFIG 
    login_defs LOGIN_DEFS_CONFIG'

Add the name of the new discovery agent and the name of its environment variable to the value of testlines, and that’s all you need. Of course, the first time you run the tests, you need sample input, and you have to validate by hand that the output is what’s desired. More about that below.
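To make the environment variable mechanism concrete, here is a rough Python sketch of the idea behind testlines. It is not the logic of test_discovery.sh itself, which is a shell script; the function names parse_testlines, build_test_env, and config_path are hypothetical, and the /etc/login.defs default is simply the file named earlier in this post.

```python
import os

# Same two entries as the testlines value shown above.
TESTLINES = """auditd_conf AUDITD_CONFIG
login_defs LOGIN_DEFS_CONFIG"""


def parse_testlines(text):
    """Return (agent_name, env_var_name) pairs, one per well-formed line."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def build_test_env(agent, env_var, input_dir="testcode/discovery_input"):
    """Copy the current environment and point the agent at its sample input."""
    env = dict(os.environ)
    env[env_var] = f"{input_dir}/{agent}"
    return env


def config_path(env_var, default, environ=None):
    """What an agent does on its side: use the override if set, else the real file."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var, default)


for agent, env_var in parse_testlines(TESTLINES):
    env = build_test_env(agent, env_var)
    print(agent, "reads", config_path(env_var, "/etc/" + agent, env))
```

The point of the design is that the agent never needs a special test mode: the same code path runs in production and under test, and only the path it reads changes.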

Assimilation Discovery Agent Test Input

For the discovery agent to produce predictable output for the test, it needs predictable input. Inputs which will be supplied to discovery scripts are stored in the testcode/discovery_input directory. Most of these discovery agents just parse a file, but some (notably proc_sys) process a directory structure. For the auditd_conf discovery script, the test script gives it the discovery_input/auditd_conf file to process, and the login_defs discovery script is given the discovery_input/login_defs file. The test input should exercise all the variations of input that are permitted by the format of the file being parsed.

Assimilation Discovery Agent Test Output

The reference results from the tests also have to be kept under source control. In our case, we store them in the testcode/discovery_output directory. Before you submit them in a pull request, please make sure that the output looks correct.
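The two checks applied to agent output (it parses as JSON, and it matches the stored reference) can be sketched as follows. This is an illustration only: the project’s test script uses jsonlint, whereas this sketch substitutes Python’s json module, and the function name check_agent_output is hypothetical. It compares a canonical re-serialization of both sides so that whitespace differences are ignored but a change of type, such as true becoming 1, is still caught.

```python
import json


def check_agent_output(output_text, reference_text):
    """Return a list of problems; an empty list means the output passes."""
    try:
        produced = json.loads(output_text)
    except ValueError as err:
        return [f"output is not valid JSON: {err}"]
    try:
        expected = json.loads(reference_text)
    except ValueError as err:
        return [f"reference is not valid JSON: {err}"]
    # Compare canonical text rather than Python objects, because in Python
    # True == 1 and 1 == 1.0, which would hide a change of JSON type.
    canonical_produced = json.dumps(produced, sort_keys=True)
    canonical_expected = json.dumps(expected, sort_keys=True)
    if canonical_produced != canonical_expected:
        return ["output differs from the stored reference"]
    return []


print(check_agent_output('{"UMASK": 18}', '{ "UMASK" : 18 }'))
print(check_agent_output('{"LOG_OK_LOGINS": true}', '{"LOG_OK_LOGINS": 1}'))
```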

Getting An Assimilation Discovery Agent To Run

If you’ve added the discovery agent to the project, then it will be automatically distributed to all machines running nanoprobes. But you’ll still need to instruct the Assimilation Suite to perform the discovery actions on systems. To do that, you have to make two changes to the cma/cmaconfig.py file. The cmaconfig.py file does two things: it creates rules for validating a configuration, and it also provides the default configuration. Both parts need to be modified.

Extending Discovery Validation

The validation code is part of the ConfigFile class – in particular it’s defined as part of the default_template variable. When you look at the default_template, there is a key value called initial_discovery which lists all the valid discovery names. You need to add the name of your discovery agent to this set value. Once you’ve done this, then you are allowed to name it as a discovery agent. Read on to actually cause it to get invoked…

Causing Your Discovery Script To Be Run

There is a static method later on in the class definition called default_defaults which defines the default configuration. Part of this configuration is the list of discovery agents which are run by default. There is a value in the return value from this function also called initial_discovery which defines what discovery agents will always be called for every nanoprobe, and the order in which they’ll be given to the nanoprobe. To have your new agent run, add it to the initial_discovery array, and away you go. If you need it to be called only under certain circumstances, or only with certain arguments which depend on prior discovery, this isn’t the way to go about it.
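The relationship between the two changes can be sketched like this. The structures below are heavily trimmed, hypothetical stand-ins, not the real contents of cmaconfig.py: the actual default_template and default_defaults hold much more, the agent lists here are illustrative, and my_agent and unknown_discovery_names are made-up names.

```python
# Hypothetical stand-in for the set of valid names held under
# 'initial_discovery' in ConfigFile.default_template.
VALID_DISCOVERY_NAMES = {
    'auditd_conf', 'login_defs', 'proc_sys', 'checksums',
    'my_agent',  # change 1: allow the new agent's name
}


def default_defaults():
    """Hypothetical stand-in for the default configuration."""
    return {
        # Order matters: agents are given to the nanoprobe in this order.
        'initial_discovery': [
            'auditd_conf', 'login_defs', 'proc_sys', 'checksums',
            'my_agent',  # change 2: actually run the new agent
        ],
    }


def unknown_discovery_names(config, valid_names):
    """List configured agents that validation would reject."""
    return [name for name in config['initial_discovery']
            if name not in valid_names]


print(unknown_discovery_names(default_defaults(), VALID_DISCOVERY_NAMES))
```

If only the second change were made, a check along these lines would flag my_agent, which is why both parts of the file need to be edited together.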

Providing Default Options to Your Discovery

This step is optional. Given what’s been done so far, your discovery code will be invoked periodically at the default interval with default timeouts and the default warning time. If you want to invoke it at a different interval, with a different timeout, or with a different warning time, you’ll have to make another change to the default_defaults return value. Later on in default_defaults, there is a value called discovery, and under it is another value called agents. Below that you put the values you want to be default for your agent. Here is an example from the cmaconfig.py file for the checksums agent:

    'checksums': {'repeat': 3600*8, 'timeout': 10*60, 'warn': 5*60},

This specifies that the checksums discovery will be repeated every 8 hours, with a 10 minute timeout. A warning will be issued to syslog if the discovery takes longer than 5 minutes. All three values are in seconds. Although it’s nice that this Python object notation looks a lot like JSON, this is Python code, not JSON.
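As a rough sketch of how per-agent options sit under discovery and agents, and how an agent without an entry falls back to defaults, consider the following. Only the checksums entry comes from cmaconfig.py; the GLOBAL_DEFAULTS numbers and the agent_options helper are hypothetical, and the real lookup code in the suite may work differently.

```python
# Hypothetical fallback values, chosen only for this illustration.
GLOBAL_DEFAULTS = {'repeat': 900, 'timeout': 300, 'warn': 90}

config = {
    'discovery': {
        'agents': {
            # Taken from the example above; all values are in seconds.
            'checksums': {'repeat': 3600*8, 'timeout': 10*60, 'warn': 5*60},
        },
    },
}


def agent_options(config, agent):
    """Merge an agent's own settings over the fallback values."""
    options = dict(GLOBAL_DEFAULTS)
    options.update(config['discovery']['agents'].get(agent, {}))
    return options


print(agent_options(config, 'checksums'))
print(agent_options(config, 'login_defs'))
```

Writing the intervals as products such as 3600*8 keeps the units readable; Python evaluates them when the file is loaded, which is one practical difference from JSON.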

That’s All Folks

Although there are several steps to be followed, the most complicated of them (writing the discovery script) is often quite simple, and the rest are very simple. If you’d like to try your hand at writing one, we have a backlog of scripts that need to be written at our project Trello issues board.
