This article is now obsolete. It has been replaced by a newer post.
I’ve kept the original content below for historical purposes to document some (not very good) thinking I went through to get to a better solution.
In this article, we talk in more detail about the Assimilation Project’s reliable UDP protocol, our decision to avoid session keys, factors influencing our initial choice of crypto libraries, and touch on key revocation. As before, we’re looking forward to your comments on our design choices, so grab your thinking cap, sit down with your crypto buddies and think hard about what we’ve done.
The first article in this series described some of the unique secure communication challenges that the Assimilation project faces. In the second article, we studied how many keys we should use – and why.
Communication Patterns and Session Keys
When a machine first boots, there is an exchange of a dozen or so mostly small packets related to startup, discovery, and monitoring. After that, communication is normally silent until an exception occurs – a service that goes down, a change in discovery results or something similar. This is typically very infrequent – often weeks or months between messages. The two roles (CMA and nanoprobe) behave slightly differently with respect to stopping and starting. Each nanoprobe only has a single reliable connection to the CMA, but the CMA effectively has connections to every nanoprobe – potentially hundreds of thousands of (overwhelmingly inactive) connections.
When a nanoprobe stops gracefully, it tells the CMA it is going down, and it tells the CMA again when it starts back up. By contrast, the CMA does not directly signal its nanoprobe clients when it goes down, or when it comes back up. The protocol is designed to simply recover from these occurrences – in order to avoid flurries of unnecessary traffic in a short period of time. Since it is often weeks or months between communication, there is no urgency to recover immediately – and delaying recovery until the next need to communicate distributes this work over time – often a long period of time. This improves scalability.
Normally, public key cryptography protocols are used only long enough to establish a shared session key, then further communication occurs encrypted with the shared session key – due to the relatively high expense of public key encryption and signature methods. However, given how infrequently we communicate and how low our communication volume is, it makes much less sense to go to the trouble of establishing a session key. In addition, it adds complexity. Complexity decreases reliability, and the complexity of establishing a session key often creates the possibility of security holes. There have been several bugs in SSL implementations which related to this very thing. Avoiding it in a custom implementation seems prudent.
Therefore, at least for the present time, we are planning on performing all our encryption and digital signatures without establishing session keys. If the “100K+1 Keys” solution were chosen, the nanoprobe keys would likely be shared keys – because having each nanoprobe share its key with the CMA would be acceptable. However, as noted in the part 2 article, this is not our short-term solution.
Because of the infrequent nature of our communication and the large number of connections to keep track of, each packet will be encrypted separately. This also fits nicely into our protocol.
Assimilation UDP Message Format
This section outlines both the authenticated and unauthenticated options of our UDP message format.
Each UDP message is a message header frame followed by a sequence of TLV (Type, Length, Value) frames. The first frame in any message is always a signature frame. The signature can either be a cryptographic signature frame (aka a MAC) or merely a data integrity checksum (such as SHA256). After the signature frame comes an optional encryption frame, followed by an optional compression frame indicating how the data was compressed. After these frames come the data frames for the message – which are different for each message type. Note that this means we’re using the encrypt-then-MAC method of composing our packets. This is typically thought to be the best method – and it’s clearly the most modular.
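To make the framing concrete, here is a minimal Python sketch of the unauthenticated option, where the leading signature frame carries a SHA-256 checksum of everything after it. The field widths (16-bit type, 32-bit length), the frame type numbers, and all function names are assumptions made for illustration, not the project's actual wire format, and the message header, encryption and compression frames are left out.

```python
import hashlib
import struct

# Assumed layout: 16-bit type, 32-bit length, network byte order.
FRAME_HDR = struct.Struct("!HI")
SIG_SHA256 = 1   # hypothetical frame type: SHA-256 integrity checksum
DATA = 100       # hypothetical frame type: application data


def pack_frame(ftype: int, value: bytes) -> bytes:
    """Serialize one TLV frame."""
    return FRAME_HDR.pack(ftype, len(value)) + value


def unpack_frames(buf: bytes):
    """Split a buffer into (type, value) pairs, rejecting truncated input."""
    frames = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < FRAME_HDR.size:
            raise ValueError("truncated frame header")
        ftype, length = FRAME_HDR.unpack_from(buf, offset)
        offset += FRAME_HDR.size
        if len(buf) - offset < length:
            raise ValueError("truncated frame value")
        frames.append((ftype, buf[offset:offset + length]))
        offset += length
    return frames


def build_message(data_frames) -> bytes:
    """Prepend a signature frame covering all frames that follow it."""
    body = b"".join(pack_frame(t, v) for t, v in data_frames)
    return pack_frame(SIG_SHA256, hashlib.sha256(body).digest()) + body


def parse_message(packet: bytes):
    """Verify the leading signature frame, then return the remaining frames."""
    frames = unpack_frames(packet)
    if not frames or frames[0][0] != SIG_SHA256:
        raise ValueError("first frame must be a signature frame")
    checksum = frames[0][1]
    body = packet[FRAME_HDR.size + len(checksum):]
    if hashlib.sha256(body).digest() != checksum:
        raise ValueError("checksum mismatch")
    return frames[1:]
```

Because the signature frame covers the serialized bytes of whatever follows, the same verification step would apply unchanged if the following frames were encrypted or compressed.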
When a message is received, the signature frame is verified, the data is decrypted, then it is decompressed. When it is sent, the data is compressed first, then encrypted, then signed. This ordering helps compression, since if it were encrypted before being compressed, it would likely not compress at all, and would likely grow in size, since encrypted data effectively looks random.
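The effect of ordering on compression is easy to check. The sketch below uses zlib from the Python standard library and random bytes as a stand-in for ciphertext; the sample text and the helper name are made up for illustration.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size of data after zlib compression."""
    return len(zlib.compress(data))


# Repetitive text standing in for discovery data (illustrative only).
plaintext = b"discovery: eth0 up, eth1 down; " * 200
# Random bytes standing in for the same data after encryption.
ciphertext_like = os.urandom(len(plaintext))

print("original size:          ", len(plaintext))
print("compressed plaintext:   ", compressed_size(plaintext))
print("compressed 'ciphertext':", compressed_size(ciphertext_like))
```

The repetitive plaintext shrinks dramatically, while the random stand-in comes out slightly larger than it went in because of the compressor's own framing overhead.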
Note that this message format supports any arrangement of keys as described in the part 2 article – so it won’t have to change based on how many keys are distributed.
Choice of Cryptographic Libraries
At the present time, we are planning on using libsodium as our initial encryption library. Libsodium is a fork of Dan Bernstein’s NaCl library implementing the curve25519 and ed25519 algorithms. Curve25519 is used for encryption, and ed25519 is used for digital signatures, which fill the role of the MAC in our message format.
There are two reasons for this choice – speed and simplicity. Because we plan on using public key encryption for all messages, speed of the crypto library is more important than in some other cases. Unlike many other crypto libraries, libsodium has ease of use as a strong goal. Simplicity is good, since this means it is more difficult to make errors in the Assimilation code – and compromise either correctness or security through our mistakes.
In addition, because in every case we expect to know keys in advance, and do not need to negotiate session keys or deal with an external public key trust infrastructure, the level of the API is a near-exact match for our needs.
One person suggested that we should perhaps use DTLS as our transport layer. Two things about the DTLS protocol caused me concern for this application. The most obvious problem is that it only supports datagrams up to 16K bytes. There are a few known discovery items which exceed 100K uncompressed. We could not tolerate the 16K limitation without adding significantly more complexity to our code. The second concern is that DTLS has to keep session state – which then has to be recovered for every connection every time the CMA restarts. As was mentioned earlier, this is something to be avoided if possible. If we were in the design space for DTLS (large numbers of small streaming packets for a single endpoint) it would likely be a better match than it currently is. However, we send few datagrams per connection, have a large number of connections, and some datagrams stray close to UDP maximums.
As it turns out, because of the circumstances in our particular case, the libsodium API appears to be nearly exactly what we need. More complex libraries and APIs reduce the chances of the code being correct – both the Assimilation code and the library code.
In the Assimilation model, nanoprobes are extended as little trust as possible, which leaves all remaining trust in the CMA. Therefore our approach to key revocation is to implement a key revocation packet in the CMA, and use the existing update distribution mechanisms for distributing the new keys. This eliminates the need to send out a new key using our protocol, while allowing the immediate disabling of a compromised key. This is not a perfect solution, but it should be enough for an initial implementation. The main flaw that I can see with using the compromised key to revoke itself is that an attacker could maliciously revoke the key they had obtained a copy of. That doesn’t seem like a very likely action – but it’s possible. Kind of like they’re doing us a favor. Did I miss something here? Machines which are down when the message goes out have to be skipped until later. This will be somewhat annoying, but not fatal.
Although we believe the “1-Key” implementation should be enough for an initial usable implementation, clearly it’s an area where deeper thought might yield a good “100K-Key” implementation which is still easy to install and use. It is likely that a better implementation would result from collaborating with a large customer after getting some experience with real deployments than from merely thinking in a vacuum. More thought could also be given to a more elegant key revocation mechanism where the revocation of one key and the installation of its replacement are smoother and more automated. These are items worth considering when sufficient resources become available.
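As a rough illustration of the receiving side, here is a minimal Python sketch of a key store that honors a revocation request only when it is authenticated by the very key being revoked. Everything in it is hypothetical: the class and method names, the "REVOKE:" message layout, and above all the use of HMAC-SHA256, which is a standard-library stand-in for the public-key signature the design actually calls for.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for a public-key signature (HMAC-SHA256, NOT the proposed scheme)."""
    return hmac.new(key, message, hashlib.sha256).digest()


class KeyStore:
    """Hypothetical nanoprobe-side record of trusted and revoked keys."""

    def __init__(self):
        self.trusted = {}     # key_id -> key material
        self.revoked = set()  # key_ids that must never be trusted again

    def add(self, key_id: str, key: bytes) -> None:
        if key_id in self.revoked:
            raise ValueError("key was revoked and cannot be re-added")
        self.trusted[key_id] = key

    def is_trusted(self, key_id: str) -> bool:
        return key_id in self.trusted

    def handle_revocation(self, key_id: str, signature: bytes) -> bool:
        """Disable key_id if the request is authenticated by that same key."""
        key = self.trusted.get(key_id)
        if key is None:
            return False  # unknown or already revoked
        expected = sign(key, b"REVOKE:" + key_id.encode())
        if not hmac.compare_digest(expected, signature):
            return False  # not authenticated by the key being revoked
        del self.trusted[key_id]
        self.revoked.add(key_id)
        return True
```

The sketch also makes the flaw described above visible: anyone holding a copy of the key can produce a valid revocation for it.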
Another concern is – how high is the overhead of using libsodium’s public key cryptography? This seems like a good thing to benchmark.
Of course, the biggest chunk of work in data security for the Assimilation Project is simply to implement the ideas described here, get some mileage on them, and give them a chance to prove themselves out.
I’m looking forward to your comments below! If you have an aversion to comment forms – email email@example.com and I’ll incorporate them – I even have a GPG key [717A640E] or join the Assimilation development mailing list here. However, if things seem puzzling here, and you haven’t read the previous articles, then I recommend that you go back and read them.