NICs on Solaris
You have your resilient network in place. Dual switches, dual
routers, HSRP, failover, redundant firewalls -- they're
all there. Now, what about your Sun boxes? Having two network interface
cards (NICs) on your Solaris server, with the primary NIC failing
over to the secondary, seems like an obvious and easy task, yet
there are many pitfalls that make it unnecessarily complicated.
In this article, I'm going to explore how to provide redundant
NICs simply and cheaply.
I'm not going to be covering Sun Trunking or Alternate Pathing
in this article. Sun Trunking needs to be purchased as add-on software
and it's designed to trunk interfaces together, not provide
redundancy. Alternate Pathing might fit the bill; however, it introduces
a higher level of complexity, making overall management harder.
The first place to look is interface grouping. Introduced in Solaris
2.6, interface groups changed the behavior of the IP stack and the
kernel. Previously, if Solaris received packets on the secondary
interface, it would send them out the primary, regardless of whether
the primary was up. With interface groups, you tell Solaris that
the two NICs, each with its own IP address, are connected to the
same subnet. The routing tables are then manually modified to show
that routes can go out of either interface. Interface groups are
enabled by using ndd to set ip_enable_group_ifs to
1 in /dev/ip. Although early releases of Solaris 2.6
and later had this enabled by default, a subsequent patch (to fix
problems with sending the correct hostname) changed it to be
disabled by default.
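Assuming the variable is named ip_enable_group_ifs, enabling the feature might look something like this (the interface names and addresses are illustrative, not part of any particular setup):

```shell
# Enable interface groups in the IP driver; this takes effect
# immediately but won't survive a reboot, so it belongs in a
# boot script as well:
ndd -set /dev/ip ip_enable_group_ifs 1

# Both NICs on the same subnet, each with its own address
# (hme0/qfe0 and the addresses are examples):
ifconfig hme0 192.168.1.10 netmask 255.255.255.0 up
ifconfig qfe0 192.168.1.11 netmask 255.255.255.0 up
```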
When a packet comes in on either interface, the kernel updates
the ARP (address resolution protocol) cache to reflect the fact
that a particular host is available on a particular interface. All
packets still go out the primary interface. However, when the primary
goes down, packets for any destination host whose ARP cache entry
points at the secondary interface will be sent out through it. Beyond
that, any incoming packets on the secondary interface will update the
ARP cache, so hosts previously marked as reachable via the primary
interface will now be seen as available on the secondary.
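You can watch this mechanism from the shell; on Solaris, both arp and netstat will show which interface each ARP entry was learned on:

```shell
# Dump the ARP cache; on Solaris the output includes a Device
# column showing the interface each entry is associated with:
arp -a

# The same information via the net-to-media table:
netstat -p
```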
That's the theory, anyway. In practice, the results are unexpected
and unreliable. Interface groups only work well for directing traffic
in via multiple interfaces; the redundancy side of things doesn't
work too well. Symptoms include excessive TCP retransmits, resets
on the interfaces, and the ARP cache not being updated.
So, although it looked promising, interface groups don't
completely solve the redundant NIC issue. Another way to approach
the problem is to run a script that monitors the interfaces, and
brings them up or down accordingly. The first logical place to start
is with some poking around with ndd. /dev/hme contains
a link_status variable. A value of 1 indicates that a link
is up, and a value of 0 indicates it is down. Thus, a script can
be written that parses ndd's query of link_status,
and brings interfaces up and down accordingly.
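As a sketch of the core check (assuming the hme driver; the instance number and command path are illustrative, and the NDD variable exists only so the command location can be overridden):

```shell
#!/bin/sh
# Minimal sketch of a link_status check (assumed hme driver;
# device path and instance number are illustrative).
NDD=${NDD:-/usr/sbin/ndd}

link_status() {
    # Select which hme instance subsequent gets refer to,
    # then read link_status: 1 = link up, 0 = link down.
    $NDD -set /dev/hme instance 0 2>/dev/null
    $NDD -get /dev/hme link_status 2>/dev/null
}

if [ "`link_status`" = "1" ]; then
    echo "hme0: link up"
else
    echo "hme0: link down"
fi
```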
However, it doesn't quite work like that in the field. link_status
will report a value of 1 (link is up) for any interface that hasn't
been plumbed, or that hasn't been configured with an IP address
(even a null one). For example, if you have a qfe board, and qfe2
isn't physically plugged in, or configured, link_status
will be 1. If you ifconfig qfe2 up with a random IP address,
link_status will then correctly report 0.
Things are complicated on the switch end, as well. For example,
let's say you have redundant Cisco Catalyst switches providing
your network connectivity. To test the ndd script, we can
go to the switch and disable a port. link_status reports
0, and the link is down. However, when we re-enable the port, link_status
still reports 0, and the Catalyst sees the port as not connected.
It's not until we actually try to send data over that link
that the interface driver wakes up and sets link_status to
1, whereupon the Catalyst also wakes up and flags the port as connected.
This poses problems for us -- while all this is going on, the
Catalyst will be discarding traffic, because it sees the link as
down.
So, monitoring interfaces with ndd is a dead end. However,
could we accomplish the same thing using a much simpler method,
such as ping? A simple script (Listing 1) pings the
default gateway, which will be our redundant network kit. If that
ping fails, we know we've lost connectivity. The script
will then ifconfig down that interface, and ifconfig
up the second interface, which has already been configured with
the same IP address. It will then try another ping to the
default gateway, to bring up the link and check that all is indeed
well.
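Listing 1 isn't reproduced here, but a minimal sketch along those lines might look like the following; the interface names, gateway address, and command paths are all assumptions for illustration:

```shell
#!/bin/sh
# Sketch of a ping-based failover check along the lines of Listing 1.
# Interface names, gateway address, and command paths are assumptions.
PRIMARY=${PRIMARY:-hme0}
SECONDARY=${SECONDARY:-qfe0}
GATEWAY=${GATEWAY:-192.168.1.1}
PING=${PING:-/usr/sbin/ping}
IFCONFIG=${IFCONFIG:-/usr/sbin/ifconfig}

failover() {
    # Gateway unreachable via the primary: swap interfaces. The
    # secondary already carries the same IP address, configured
    # down, at boot time.
    $IFCONFIG $PRIMARY down
    $IFCONFIG $SECONDARY up
    # A second ping wakes the new link up and verifies it works.
    $PING $GATEWAY 5 >/dev/null 2>&1 ||
        echo "failover: gateway still unreachable" >&2
}

# Solaris ping exits non-zero if the host doesn't answer within
# the timeout (5 seconds here).
$PING $GATEWAY 5 >/dev/null 2>&1 || failover
```

On a real system this would be scheduled from root's crontab, e.g. `* * * * * /usr/local/bin/check_net.sh` (path assumed).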
This script runs from cron once a minute, and after some extensive
testing seems to fit the bill. Total runtime when failing over is
around 40 seconds, so we don't have any danger of the script
not finishing by the time cron kicks off another run.
The final piece of the puzzle is configuring that redundant interface
on bootup. Adding an entry to /etc/hosts and creating a corresponding
/etc/hostname.<int> file will configure the interface,
but will also bring it up at boot time -- not something we want
to happen. This means we need to have a script that's run on
bootup, which will correctly configure the secondary interface with
a duplicate IP address, but not bring it up.
Listing 2 shows such a script. It depends on a config file
called /etc/redundant.int, which contains the name of your
redundant interface. From there on in, it parses your current configuration
and configures the secondary interface with the duplicate IP address,
but doesn't bring it up. I run this script out of /etc/rcS.d/S31scnd_int,
which will run just after /etc/rcS.d/S30rootusr.sh, the script
responsible for bringing up the primary interface.
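Listing 2 isn't reproduced here either, but a sketch of such a boot script might look like this; /etc/redundant.int is the config file named above, while the primary interface name (hme0) and the parsing details are assumptions:

```shell
#!/bin/sh
# Sketch of a boot-time script along the lines of Listing 2: configure
# the secondary interface with the primary's address, but leave it down.
# The primary interface name (hme0) and the parsing are assumptions.
IFCONFIG=${IFCONFIG:-/usr/sbin/ifconfig}

configure_secondary() {
    CONF=${CONF:-/etc/redundant.int}
    [ -f "$CONF" ] || return 0
    SECONDARY=`cat $CONF`

    # Lift the address and netmask from the primary's current config:
    set -- `$IFCONFIG hme0 | awk '/inet/ { print $2, $4 }'`
    ADDR=$1
    MASK=$2

    # Plumb and address the secondary -- note: no "up".
    $IFCONFIG $SECONDARY plumb
    $IFCONFIG $SECONDARY inet $ADDR netmask 0x$MASK
}

configure_secondary
```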
There are several advantages to this approach. We don't need
to worry about what the network kit thinks is going on; it's
free, making use of existing kit without any extra purchases; it's
simple; and it could easily be expanded to
other platforms. As there is nothing inherently Solaris-specific
about the approach, it can be adapted to any UNIX platform on which
you need redundant connectivity.
Tomas Kranz has been a sys admin for six years, and is currently
Senior SysAdmin at Flutter.com. In his copious free time, he enjoys
cycling and spending time with his family. He can be reached at: