nov2001.tar

Reliable Network with Solaris^TM

Peter Baer Galvin

Until recently, it was very difficult to configure a Solaris machine to have redundant connections to a network, and to use them automatically in case of a failure. Because of the magic of Solaris 8, the task is now easy. If you are not IP Multipathing yet, you should be.

The Problem

Consider a Solaris host on a network. By default, it expects one network connection per subnet to which it is being attached. If it sees the same subnet on more than one interface, then one interface is used for all outbound packets, and any interface can be used by inbound packets (based on their destination addresses). Unfortunately, if the one outbound interface fails, then traffic is outbound no more.

Until recently, there were two standard methods to solve this problem. One was to buckle down and write scripts that would ping a device (say the default gateway). If the ping failed, the script could configure another interface on that subnet to handle the traffic. Of course, scripts must be debugged, supported, and updated, annoying their authors.

Alternatively, the Ethernet Trunking "Sun Consulting Special" could be purchased from Sun Professional Services. This set of scripts basically did the above work for you. Of course it cost money, and was only somewhat supported by Sun.

The problem is exacerbated by Sun servers' roles in a variety of different architectures. One example is shown in Figure 1, which shows a standard three-tier architecture, as might be found at a Web site. Most firewalls and load balancer clusters automatically manage their IP addresses during a failover. They always make an IP address available to the tier "above" them. Likewise, a database cluster has its IP addresses managed by the cluster software, which moves IPs between cluster servers as needed. The only component in this environment that does not provide such functionality is the Sun servers. Should a network cable, a switch port, or host bus adapter in the communications channel between the Sun and its network go bad, the Sun will be unavailable. Although the facility will continue to function by using the redundant server, user connections may need to be reestablished, state could be lost, and performance could be negatively affected.

The Solution

IP multipathing was introduced in Solaris 8 release 10/00. It has quite a bit of useful functionality when two or more interfaces are attached to the same subnet. These interfaces are called a "group". The functionality includes:

Automatic load balancing of outgoing connections over the group. The load balancing is on a per-TCP-session basis.
Failover of a main IP address to an alternative interface, should the primary fail to reach a remote destination.
Active-active or active-standby failover modes (the other interfaces can be active or just used in case the primary fails). Active-active is recommended.
Automatic fail-back once the failed component is fixed.
Built in to Solaris with full Sun support and no cost.

IP multipathing does have some limitations and requirements as well:

Each interface in the group must have a unique MAC address.
Each adapter in the group must be of the same type (token ring, Ethernet, etc.).
The interfaces on the same subnet must have an associated group name.
Each interface in the group must have a test address.
Each interface should have a data address (for application use).
Alternate pathing (AP) and IP multipathing appear to be mutually exclusive.
It is best to use separate I/O boards or host bus adapters for each interface in a group (rather than two interfaces from the same QFE adapter, for example).

IP multipathing functionality is implemented by a new daemon, in.mpathd. For failure detection and correction to work, each interface must be configured with an additional IP alias to be used as a test address. This second IP address must be on the same network as the real IP address. The test address may not be used by applications. When in.mpathd is started, it periodically sends out an ICMP ECHO request ("ping") to the host's gateway from each test address. If the gateway doesn't respond, it sends a multicast to find any other host. If one or more hosts reply, one is chosen as a new ping target for testing. If no reply is received, in.mpathd assumes the link is down and moves all addresses on the interface, but the test address, to another interface in the group. It continues to try the interface to recover from failure once the problem is repaired, and returns the addresses to their original interface when that happens.

IP Multipathing Implementation

To enable IP multipathing, several changes must be made to the system. These can either be done at the command line, or as shown here, in configuration files. For changes to be made permanent, they must be made in configuration files.

First, make sure that each interface has a unique MAC address by issuing the command:

setenv local-mac-address? true

at the "ok" prompt, or the command:

eeprom "local-mac-address?=true"

as root. A reboot is needed for this change to take effect, but make the other configuration changes first to avoid repeated reboots.

Next, modify /etc/hosts to contain the additional network addresses. It is a good practice to create an address for each interface (the normal and dummy addresses). For IP multipathing, the system needs test addresses for each interface as well. For host wb-1:

# External-facing interfaces
10.1.3.101 wb-1-e
10.1.3.102 wb-1-e-dummy
10.1.3.111 wb-1-e-qfe0
10.1.3.112 wb-1-e-qfe2
# Internal-facing-interfaces
10.1.4.1   wb-1-i
10.1.4.2   wb-1-i-dummy
10.1.4.11  wb-1-i-qfe1
10.1.4.12  wb-1-i-qfe3

Likewise, for wb-2:

# External-facing interfaces
10.1.3.151 wb-2-e
10.1.3.152 wb-2-e-dummy
10.1.3.161 wb-2-e-qfe0
10.1.3.162 wb-2-e-qfe2
# Internal-facing-interfaces
10.1.4.51  wb-2-i
10.1.4.52  wb-2-i-dummy
10.1.4.61  wb-2-i-qfe1
10.1.4.62  wb-2-i-qfe3

The "group" name tells the system which interfaces are on the same subnet, thus allowing proper test and result analysis.

Now update the /etc/hosts files to reflect the new network names and enable IP multipathing on each set of interfaces. On wb-1:

/etc/hostname.qfe0:

wb-1-e-qfe0 netmask + broadcast + group wb-1-e deprecated -failover \
addif wb-1-e netmask + broadcast + failover up

/etc/hostname.qfe2:

wb-1-e-qfe2 netmask + broadcast + group wb-1-e deprecated -failover \
addif wb-1-e-dummy netmask + broadcast + failover up

Similarly, edit the files for qfe1 and qfe3. The "deprecated" flag prevents the use of these addresses by applications, as they are only for failover. The "failover" allows the system to recover if an interface failure is detected.

Finally, a reboot of the system will enable IP multipathing. ifconfig -a can be used to check the status of the interface groups and their states. Any status changes are also reported via the syslog mechanism. For example, a link or interface failure, as well as the failover to correct the problem, will be reported there. Once IP multipathing is up and running, the system will load balance outbound connections across each interface in a group. in.mpathd will monitor the interfaces to determine if there is a failure, will fail traffic over to other group members on failure, and will re-enable traffic once the failed interface is again functioning. This functionality allows Solaris servers to be perfect members of highly available facilities, such as the one depicted in Figure 1.

Further configuration of failover parameters can be done in /etc/default/mpathd. Here the fault detection and failover time can be changed from its default ten seconds. Note that the system sends ICMP ECHO request at a rate of ten times the failover time, so decreasing the failover time will increase the amount of network traffic being sent to detect a failure.

An interesting use of IP multipathing can occur on a system with only one interface per subnet. Here, the same configuration steps can be performed. Although no resilience is provided, network, interface, or host bus adapter failures are more accurately detected and reported.

One problem with IP multipathing occurs on the Solaris 8 releases 10/00 and 04/01 when the router discovery daemon is in use (in.rdisc). Extraneous ICMP errors occur when an interface in an IP Multipathing group is pinged.

For full details on IP multipathing in general, and the ping problem in specific, see the very good "IP Network Multipathing" Sun Blueprints Online document from August 2001, available at:

http://www.sun.com/blueprints/online.html

The best full reference for IP multipathing is in the documentation set, entitled "IP Network Multipathing Administration Guide" (available at docs.sun.com).

Thanks to Frank Corrao from Corporate Technologies for contributing to this column.

Peter Baer Galvin (http://www.petergalvin.org) is the Chief Technologist for Corporate Technologies (www.cptech.com), a premier systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines, and previously wrote Pete's Wicked World, the security column, and Pete's Super Systems, the systems management column for Unix Insider (http://www.unixinsider.com). Peter is coauthor of the Operating Systems Concepts and Applied Operating Systems Concepts textbooks. As a consultant and trainer, Peter has taught tutorials and given talks on security and systems administration worldwide.

Reliable Network with SolarisTM

Reliable Network with Solaris^TM