Resilience Testing Distributed Systems with Fuzzy Monkey Testing

Fuzzy Monkey testing distributed systems - Red Colobus from Wikipedia

One of the keys to good software is good testing. There are well-known testing suites for back end code – things like junit and py.test. There are also good front-end testing tools – things like Selenium. But for testing distributed systems there aren’t so many well-known tools – because the problem is quite different, and harder. In this blog post we’ll cover the “Fuzzy Monkey” methodology used for testing three different successful distributed systems (including the Assimilation Suite) – its history and how and why it works.

Finding what’s hidden in plain sight

Back in the 90s I was involved with about 100 other people in a project to develop a new voice mail system – software, hardware and firmware. The hardware was a completely new design, and the software was about 70% new. Along the way we stumbled into something that improved our end quality in a way that can reasonably be described as stunning. What we discovered was how to ask questions in a way that brought important things that “everyone knows” (and are effectively hidden in plain sight) to the attention of those who can do something about it.

The Incredible Power Of The Right Questions

The Incredible Power of Good Questions

There is incredible power in asking the right questions, and following the answers where they lead – especially when they lead to uncomfortable places.

As you probably remember, in March, 2011, a magnitude 9.0 earthquake hit Japan resulting in a massive tsunami which damaged the Fukushima nuclear power plant. Some of the most serious damage occurred because there was no power to cool the reactor.

Scalability from Doing Nothing?

Scalability is the ability to respond gracefully to increased workload. When you have enough of it, life is good. When you have trouble scaling up and your workload goes up, as it inevitably does, life becomes complicated, sometimes miserable. In this blog post I tell you why doing nothing can be the best way to scale…

Living Dangerously With Crypto In the Assimilation Project – Part 3

In this article, we talk in more detail about the Assimilation Project’s reliable UDP protocol, our decision to avoid session keys, factors influencing our initial choice of crypto libraries, and touch on key revocation. So, like before we’re looking forward to your comments on our design choices. Like before, grab your thinking cap, sit down with your crypto buddies and think hard about what we’ve done.

Living Dangerously with Crypto in the Assimilation Project – How Many Keys?

This article outlines our approach to keys and key management given our unique problems in a pragmatic and effective way. Although we will use crypto libraries with well-proven algorithms, we will use them in slightly unconventional ways. So, get your crypto buddies, grab a beverage (adult or otherwise), put on your thinking cap, and think hard about how we’re planning on approaching these challenges. Although I’ve tried to think all this through, I’m not a crypto expert – which is why I’m asking for your help.

Crypto background for the Assimilation project

Since its inception, the open source Assimilation project has been concerned with security, and paranoid at every opportunity. Like a lot of software, it has serious security concerns. On the one hand, our nanoprobes run on every server in the enterprise and exercise root privileges – creating a potentially dangerous attack surface. On the other hand, we incrementally create a high-value database which has fine-grained and up-to-date information about everything in the environment – software versions, ports, services, IP addresses and MAC addresses, known security vulnerabilities – a veritable treasure map for an attacker. This article details why cryptography is essential for communication in this environment, and some unique aspects of the problem we’re solving that affect how we use it. It is our hope our readers (this means you!) will give us a thorough flogging^H^H^H^H^H^H^H review of how we use cryptography in our software in this article and the next.