Resilience Testing Distributed Systems with Fuzzy Monkey Testing

Fuzzy Monkey testing distributed systems - Red Colobus from Wikipedia

One of the keys to good software is good testing. There are well-known testing frameworks for back-end code, such as JUnit and py.test, and there are good front-end testing tools, such as Selenium. But there aren't many well-known tools for testing distributed systems, because the problem is quite different, and harder. In this blog post we'll cover the "Fuzzy Monkey" methodology, which has been used to test three different successful distributed systems (including the Assimilation Suite): its history, how it works, and why it works.

There Is Always Another Way


When it looks like you're stuck with no way out, you may still find another way to solve your problem, if you're willing to admit you were wrong. Although the story below is about software development, manufacturing and product management, its moral applies in lots of places: solve the problem you actually have, not the one you think you have. I learned something important from this story that I still value today: there's always another way.

Finding What's Hidden in Plain Sight

Back in the 1990s, I was one of about 100 people on a project to develop a new voice mail system, including its software, hardware and firmware. The hardware was a completely new design, and about 70% of the software was new. Along the way we stumbled onto something that improved the quality of the finished product in a way that can reasonably be described as stunning. What we discovered was how to ask questions that brought important things "everyone knows" (and which are therefore effectively hidden in plain sight) to the attention of the people who could do something about them.