Redundant
NICs on Solaris
Thomas Kranz
You have your resilient network in place. Dual switches, dual
routers, HSRP, failover, redundant firewalls -- they're
all there. Now, what about your Sun boxes? Having two network interface
cards (NICs) on your Solaris server, with the primary NIC failing
over to the secondary, seems like an obvious and easy task, yet
there are many pitfalls that make it unnecessarily complicated.
In this article, I'm going to explore how to provide redundant
NICs simply and cheaply.
I'm not going to be covering Sun Trunking or Alternate Pathing
in this article. Sun Trunking needs to be purchased as add-on software
and it's designed to trunk interfaces together, not provide
redundancy. Alternate Pathing might fit the bill, however, it introduces
a higher level of complexity, making overall management harder.
The first place to look is interface grouping. Introduced in Solaris
2.6, interface groups changed the behavior of the IP stack and the
kernel. Previously, if Solaris received packets on the secondary
interface, it would send them out the primary, regardless of whether
the primary was up. With interface groups, you tell Solaris that
the two NICs are connected to the same subnet. Routing tables are
then manually modified to show that routes can be out of either
interface, which have different IP addresses. Interface groups are
enabled by using ndd to set ip_enable_ifs_groups to
one in /dev/ip. Although earlier releases of Solaris 2.6
and higher had this enabled by default, a subsequent patch (to fix
problems with sending the correct hostname) has changed this to
be disabled by default.
When a packet comes in on either interface, the kernel updates
the ARP (address resolution protocol) cache to reflect the fact
that a particular host is available on a particular interface. All
packets still go out the primary interface. However, when the primary
goes down, destination hosts (whose MAC addresses are defined by
the ARP cache so that they are available via the secondary interface)
will have packets sent out that interface to them. Beyond that,
any incoming packets on the secondary interface will update the
ARP cache so that hosts previously marked as being on the primary
interface will now be seen as available on the secondary.
That's the theory, anyway. In practice, the results are unexpected
and unreliable. Interface groups only work well for directing traffic
in via multiple interfaces; the redundancy side of things doesn't
work too well. Symptoms include excessive TCP retransmits, resets
on the interfaces, and the ARP cache not being updated.
So, although it looked promising, interface groups don't
completely solve the redundant NIC issue. Another way to approach
the problem is to run a script that monitors the interfaces, and
brings them up or down accordingly. The first logical place to start
is with some poking around with ndd. /dev/hme contains
a link_status variable. A value of 1 indicates that a link
is up, and a value of 0 indicates it is down. Thus, a script is
written that parses ndd's query of link_status,
and brings interfaces up and down accordingly.
However, it doesn't quite work like that in the field. link_status
will report a value of 1 (link is up) for any interface that hasn't
been plumbed, or that hasn't been configured with an IP address
(even a null one). For example, if you have a qfe board, and qfe2
isn't physically plugged in, or configured, link_status
will be 1. If you ifconfig qfe2 up with a random IP address,
link_status will then correctly report 0.
Things are complicated on the switch end, as well. For example,
let's say you have redundant Cisco Catalyst switches providing
your network connectivity. To test the ndd script, we can
go to the switch and disable a port. link_status reports
0, and the link is down. However, when we re-enable the port, link_status
still reports 0, and the Catalyst sees the port as not connected.
It's not until we actually try to send data over that link
that the interface driver wakes up and sets link_status to
1, whereupon the Catalyst also wakes up and flags the port as connected.
This poses problems for us -- while all this is going on, the
Catalyst will be discarding traffic, because it sees the link as
down.
So, monitoring interfaces with ndd is a dead end. However,
could we accomplish the same thing using a much simpler method,
such as ping? A simple script (Listing 1) pings the
default gateway, which will be our redundant network kit. If that
ping fails, we know we've lost connectivity. The script
will then ifconfig down that interface, and ifconfig
up the second interface, which has already been configured with
the same IP address. It will then try another ping to the
default gateway, to bring up the link and check that all is indeed
well.
This script runs from cron once a minute, and after some extensive
testing seems to fit the bill. Total runtime when failing over is
around 40 seconds, so we don't have any danger of the script
not finishing by the time cron kicks off another run.
The final piece of the puzzle is configuring that redundant interface
on bootup. Adding an entry to /etc/hosts and creating a corresponding
/etc/hostname.<int> file will configure the interface,
but will also bring it up at boot time -- not something we want
to happen. This means we need to have a script that's run on
bootup, which will correctly configure the secondary interface with
a duplicate IP address, but not bring it up.
Listing 2 shows such a script. It depends on a config file
called /etc/redundant.int, which contains the name of your
redundant interface. From there on in, it parses your current configuration
and configures the secondary interface with the duplicate IP address,
but doesn't bring it up. I run this script out of /etc/rcS.d/S31scnd_int,
which will run just after /etc/rcS.d/S30rootusr.sh, which
is responsible for bringing up the primary interface and setting
routes.
There are several advantages to this approach. We don't need
to worry about what the network kit thinks is going on; it's
free; it utilizes existing kit without introducing a need for extra
purchases; it's simple; and it could easily be expanded to
other platforms. As there are no Solaris-specific aspects to this,
it can be modified to work with whatever UNIX platform on which
you need redundant connectivity.
Tomas Kranz has been a sys admin for six years, and is currently
Senior SysAdmin at Flutter.com. In his copious free time, he enjoys
cycling and spending time with his family. He can be reached at:
thomas.kranz@flutter.com.
|