In this article, we look at three simple link-level network best practices that apply to every network, and explain how to detect violations of them, what the consequences are, and how to fix them.
Network Best Practices: Duplex matching
The first network best practice used to be a common source of problems and shouldn’t come up any more, but bugs and incompatibilities in autonegotiation mean it still gets violated occasionally. When that happens, it wreaks havoc on application behavior, and the cause is often a nightmare to find.
It’s simple: the duplex setting ought to match on both ends of an Ethernet connection. If one side runs full duplex, the other side ought to run full duplex too; if one side is half duplex, the other side ought to be half duplex as well. With modern autonegotiation, a mismatch should never occur, but most network professionals I’ve talked to say they still see it happen occasionally.
What happens when this goes wrong?
When both sides send a frame at the same time, the half-duplex side treats the overlap as a collision, and the frame it was receiving is lost. The more traffic there is, the more often both sides transmit at once, so the result is this: at low traffic rates, everything looks fine. Pings work, and packet loss rates are low. High traffic rates tell another story: packet loss climbs rapidly and can reach 50% or more. In a modern environment with complex multi-layer applications, this kind of problem can be catastrophic for applications, incredibly obscure, and difficult to diagnose without a good set of tools.
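To see why the damage only shows up under load, here is a toy model rather than a measurement: a minimal Python sketch that assumes time is divided into equal slots, each frame takes exactly one slot, and each side independently transmits in a slot with probability equal to its load. It counts a frame from the full-duplex side as lost whenever the half-duplex side is transmitting in the same slot. The function name simulate_mismatch_loss and its parameters are illustrative.

```python
import random


def simulate_mismatch_loss(load_full: float, load_half: float,
                           slots: int = 100_000, seed: int = 1) -> float:
    """Toy model: fraction of the full-duplex side's frames that are lost.

    Time is split into equal slots and every frame fills exactly one slot.
    In each slot, each side independently starts a frame with probability
    equal to its load. A frame from the full-duplex side counts as lost when
    the half-duplex side is transmitting in the same slot, because the
    half-duplex side sees the overlap as a collision. Backoff, retransmission
    and late collisions are deliberately left out.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    sent = lost = 0
    for _ in range(slots):
        full_sends = rng.random() < load_full
        half_sends = rng.random() < load_half
        if full_sends:
            sent += 1
            if half_sends:
                lost += 1
    return lost / sent if sent else 0.0


if __name__ == "__main__":
    for load in (0.01, 0.1, 0.3, 0.6):
        loss = simulate_mismatch_loss(load, load)
        print(f"load {load:4.0%}: {loss:6.1%} of full-duplex frames lost")
```

In this model the loss rate simply tracks the half-duplex side's load, which reproduces the "fine at low traffic, terrible at high traffic" shape; real hardware adds backoff, retransmissions and late collisions, so the exact numbers are not predictions.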
How do you fix it?
Usually, unplugging and replugging the cable at one end (or rebooting the host) forces the two sides to autonegotiate again, and the problem almost always goes away. Alternatively, you can administratively force both sides to full (or half) duplex, which also makes the problem go away. Be sure to force both ends: if only one side is forced, the side still autonegotiating typically falls back to half duplex, producing exactly this mismatch.
How do you detect it?
This is a little harder. On Linux, you can find the host’s setting in the /sys filesystem: the file /sys/class/net/<devicename>/duplex contains the duplex setting of the device (“full” or “half”). Finding out the switch’s half is easy if you have LLDP enabled and use a tool like OpenLLDP, Open-LLDP, lldpd or the Assimilation suite. The Assimilation software automatically collects both sides of the connection in its database, making this a snap. If you don’t have the right tools, you’ll have to consult your switch’s manual to find out how to make it report the duplex setting of its end of the connection.
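A minimal Python sketch of the host-side check: it reads the duplex file for every interface under /sys/class/net and flags any that report half duplex. The function name read_duplex is illustrative, and the base directory is a parameter only so the function can be tried against a copy of the directory tree.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def read_duplex(base: Path = SYSFS_NET) -> dict[str, str]:
    """Return {interface: duplex} read from <base>/<interface>/duplex.

    Values are passed through as the kernel reports them, normally "full",
    "half" or "unknown". Interfaces whose duplex can't be read are skipped:
    the file may be missing, and on loopback, many virtual devices and
    interfaces that aren't up, the read typically fails with EINVAL.
    """
    result = {}
    for iface in sorted(base.iterdir()):
        try:
            result[iface.name] = (iface / "duplex").read_text().strip()
        except OSError:
            continue
    return result


if __name__ == "__main__" and SYSFS_NET.is_dir():
    for name, duplex in read_duplex().items():
        note = "  <-- half duplex: check the switch end" if duplex == "half" else ""
        print(f"{name}: {duplex}{note}")
```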
Unfortunately, reboots, switch resets and cable unplugging can cause a mismatch at unpredictable times, albeit very rarely. If you want to fix these problems quickly, you really need a tool set that detects a mismatch when it happens. If you know of tools besides the Assimilation suite that can detect this quickly and scalably, please let me know, and I’ll update the article and mention them here.
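Catching a mismatch as soon as it appears comes down to comparing both ends of each link on a schedule. Here is a minimal Python sketch of the comparison step. It assumes you already have the switch's view as a mapping from host interface name to the duplex the switch port reports (from LLDP, the switch CLI, or whatever your tools provide); find_duplex_mismatches and Mismatch are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    interface: str
    host_duplex: str
    switch_duplex: str


def find_duplex_mismatches(host: dict[str, str],
                           switch: dict[str, str]) -> list[Mismatch]:
    """Compare the host end and the switch end of each link.

    host:   interface -> duplex, e.g. read from /sys/class/net/<iface>/duplex.
    switch: interface -> duplex reported by the switch port (assumed input).
    Links missing from either side, or reported as "unknown", are not judged.
    """
    mismatches = []
    for iface in sorted(host.keys() & switch.keys()):
        h, s = host[iface], switch[iface]
        if "unknown" in (h, s):
            continue
        if h != s:
            mismatches.append(Mismatch(iface, h, s))
    return mismatches
```

Run it from cron or a monitoring agent every few minutes and raise an alert whenever the returned list is non-empty. For example, find_duplex_mismatches({"eth0": "full", "eth1": "half"}, {"eth0": "full", "eth1": "full"}) returns a single Mismatch for eth1.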
Network Best Practices: MTU matching
This issue is similar to duplex matching: the switch and the host ought to have “compatible” maximum transmission unit (MTU) settings. Here, compatible doesn’t require an exact match; the switch MTU just has to be greater than or equal to the host MTU.
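The rule fits in a few lines of Python. This sketch assumes both numbers are measured the same way; vendors differ on whether a switch's MTU figure includes the Ethernet header, so check your switch's definition before comparing. mtu_compatible and describe are illustrative names.

```python
def mtu_compatible(host_mtu: int, switch_mtu: int) -> bool:
    """The rule from the text: the switch port MTU must be >= the host MTU."""
    return switch_mtu >= host_mtu


def describe(host_mtu: int, switch_mtu: int) -> str:
    """Classify a host/switch pair; both values must use the same definition."""
    if switch_mtu == host_mtu:
        return "exact match"
    if mtu_compatible(host_mtu, switch_mtu):
        return "compatible: the switch accepts larger frames than the host sends"
    return "mismatch: host frames larger than the switch MTU are dropped by the switch"
```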
What happens when this goes wrong?
The host’s MTU setting will be ineffective, because the switch drops frames larger than its own MTU instead of forwarding them. Small packets get through, but large ones silently disappear, so if you are expecting greater performance from jumbo frames, the switch’s lower MTU will send your throughput down the porcelain throne.
How do you fix it?
If your switch supports the packet size the host expects, configure the switch port with the MTU the host is expecting. If it doesn’t, set both sides to the largest size that both support. On Linux, you can temporarily set the MTU with the ip command (for example, ip link set dev eth0 mtu 9000). To set it permanently, see your distribution’s documentation on setting up networking (/etc/network/interfaces on Debian-based systems, and /etc/sysconfig/network-scripts on Red Hat-based systems).
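A minimal Python sketch of that decision plus the temporary ip command, assuming you already know the largest MTU the host NIC and the switch port each support (host_max and switch_max are hypothetical inputs; choose_mtu and ip_set_mtu are illustrative names).

```python
import shlex


def choose_mtu(desired: int, host_max: int, switch_max: int) -> int:
    """Use the MTU the host expects if both ends can carry it; otherwise the
    largest size both support. Both cases reduce to the minimum of the three."""
    return min(desired, host_max, switch_max)


def ip_set_mtu(device: str, mtu: int) -> str:
    """Shell command that sets the MTU until the next reboot (run as root)."""
    return shlex.join(["ip", "link", "set", "dev", device, "mtu", str(mtu)])
```

For example, choose_mtu(9000, 9000, 1500) returns 1500, and ip_set_mtu("eth0", 1500) returns "ip link set dev eth0 mtu 1500". The permanent setting still belongs in your distribution's network configuration.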
How do you detect it?
This is going to sound a bit familiar… On Linux, the file /sys/class/net/<devicename>/mtu contains the MTU in bytes. You can find the switch’s half if you have LLDP enabled on your switch and use a tool like OpenLLDP, open-lldp (yes, they’re different), lldpd or the Assimilation software. The Assimilation software automatically collects both sides of the connection in its database, making this a snap. If you don’t have the right tools, you’ll have to consult your switch’s documentation to find out how to make it report the MTU of its end of the connection.
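The MTU check follows the same pattern as the duplex check. This minimal Python sketch reads every interface's MTU from /sys/class/net and reports the interfaces whose switch port MTU is smaller; the switch-side mapping is an assumed input from your LLDP tooling, and read_mtus and mtu_problems are illustrative names.

```python
from pathlib import Path


def read_mtus(base: Path = Path("/sys/class/net")) -> dict[str, int]:
    """Return {interface: MTU in bytes} read from <base>/<interface>/mtu."""
    mtus = {}
    for iface in sorted(base.iterdir()):
        try:
            mtus[iface.name] = int((iface / "mtu").read_text())  # int() ignores the newline
        except (OSError, ValueError):
            continue  # not an interface directory, or unreadable
    return mtus


def mtu_problems(host: dict[str, int],
                 switch: dict[str, int]) -> dict[str, tuple[int, int]]:
    """{interface: (host MTU, switch MTU)} where the switch port MTU is smaller.

    `switch` maps interface -> switch-port MTU (assumed input, e.g. from LLDP);
    interfaces absent from it are skipped.
    """
    return {i: (host[i], switch[i])
            for i in sorted(host.keys() & switch.keys())
            if switch[i] < host[i]}
```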
Network Best Practices: Subnet MTU matching
This means that every system on a subnet (or, more precisely, a broadcast domain) ought to have the same MTU.
What happens when this goes wrong?
Typically, systems have non-default MTUs in order to use jumbo frames. Unless every machine on the subnet has them enabled, TCP traffic between a jumbo-frame machine and a standard-MTU machine will use the smaller setting. If you are expecting greater performance between machines with mismatched MTUs, you won’t get it. On the other hand, if the odd one out is an older device that isn’t performance-critical, this may not be an issue for you: TCP will negotiate a maximum segment size (MSS) that fits the lower MTU, and things will still work.
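The TCP negotiation mentioned above works through the maximum segment size: in the common case, an IPv4 host advertises its MTU minus 40 bytes (20 for the IP header and 20 for the TCP header, without options), and each side keeps its segments within what its peer advertised. A small worked example in Python, with illustrative function names:

```python
IPV4_HEADER = 20  # bytes, without IP options
TCP_HEADER = 20   # bytes, without TCP options


def advertised_mss(mtu: int) -> int:
    """MSS an IPv4 host typically advertises in its SYN: MTU minus headers."""
    return mtu - IPV4_HEADER - TCP_HEADER


def effective_mss(mtu_a: int, mtu_b: int) -> int:
    """Each side sends segments no larger than its peer's advertised MSS,
    so in practice the smaller of the two wins."""
    return min(advertised_mss(mtu_a), advertised_mss(mtu_b))


print(advertised_mss(9000), advertised_mss(1500), effective_mss(9000, 1500))
# prints: 8960 1460 1460
```

This safety net applies to TCP only; UDP and other protocols have no such negotiation.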
How do you fix it?
The cure is the same as for the MTU best practice above, except that you do it over and over again: set every host port on the subnet to the same MTU, and make sure each switch port’s MTU is at least that large.
How do you detect it?
Follow the procedures above for every host on every subnet. Boy, is that annoying… You have to know which subnets are intended to support jumbo frames and which aren’t, and then check everything for consistency. 99 MTUs on the wall, 99 MTUs, take one down, pass it around, 98 MTUs on the wall… Like that…
Conclusions
Unless you have a lot of staff, you probably won’t be able to follow these network best practices consistently without good tools to automate the checks. On the other hand, with the right network management tools, such as the Assimilation software (or your favorite tool set), which bring all the MTUs and duplexes of hosts and switch ports into the same database (CMDB), it’s far more manageable. Get tools and use them. Ideally you’d like tools that autoconfigure, but autoconfiguration or not, if you’re like me and don’t have good tools, you probably won’t do the job consistently.
Since I founded the Assimilation Project, my take is obviously that you should install it. Because it keeps this data up to date automatically, you just have to query it to see whether you have a problem. We plan to add these best practice rules to the set that we check, at which point you’ll get an alert whenever any of these problems occurs.