High-Availability File Server with heartbeat

Steve Blackmon and John Nguyen

Maintaining maximum system uptime is becoming increasingly critical to the success of any organization. While there are many off-the-shelf solutions for high availability, they are often very expensive and require expertise that smaller companies do not have on staff. In this article, we present a much lower-cost way to achieve high-availability (HA) services using inexpensive hardware and publicly available software. A systems administrator can learn to use and maintain our system with a minimal time investment. We provide step-by-step procedures for building a high-availability file server for UNIX and Windows clients. Although the article focuses on setting up a file server, the technique could be applied to many other services.

Hardware and Software Components

Hardware

To get started, you will need two systems, each with at least one network interface (preferably two), an available serial port, and a SCSI controller, plus one external SCSI hard drive to share between them.

We used two identically configured Intel ISP 1100 servers with 650-MHz Pentium III processors and 128 MB of RAM. These systems each have two integrated 10/100 Ethernet interfaces and are rack-mountable (1U). Each system has two internal IDE drives, which we used for the OS installation. For our shared disk, we used an external 9-GB SCSI drive that is attached to both systems. Our SCSI controllers are Adaptec AHA-2940AUs (see Figure 1).

Software

We used Red Hat Linux 6.2 (kernel 2.2.14-5.0), Samba version 2.0.6-9 (included with RH 6.2), and heartbeat version 0.4.9-1 (available from: http://www.linux-ha.org/download).

heartbeat is a publicly available package written by Alan Robertson. heartbeat provides the basic functions required by any HA system such as starting and stopping resources, monitoring the availability of the systems in the cluster, and transferring ownership of a shared IP address between nodes in the cluster. heartbeat is a software solution that monitors the health of a particular service (or services) through either a serial line or Ethernet interface or both. The current version supports a 2-node configuration whereby special heartbeat "pings" (broadcast/multicast messages) are used to check the status and availability of a service. It is a vital component of the whole Linux-HA package.

Although heartbeat is currently only available for Linux, the next release will include support for Solaris, FreeBSD, and OpenBSD.

We are grateful to Alan for contributing this useful software and also for his input while we were writing this article.

Procedure

This is the procedure we used with our hardware. You may need to adapt it depending on your situation and the hardware available to you.

Hook Up the Equipment

Figure 1 shows the connectivity required for this cluster.

Connect a null modem cable between the serial ports on each system.
Connect a Cat 5 crossover cable between the second Ethernet interfaces of the two systems.
Connect a SCSI cable from each system to your external SCSI disk.

Change the SCSI ID on Your Primary System

With SCSI, every component on the bus must have a unique ID, including the host adapter cards that normally have a default ID of 7. Because our SCSI bus will have two host adapters and a disk, we need to change the ID of one of the adapters. We changed the SCSI ID of our primary system to 6 and left the ID of the secondary system at 7. The ID of the adapter must be changed from the SCSI BIOS. With Adaptec SCSI controllers, you typically get into the BIOS configuration screen by pressing <control>A when prompted during the boot process. If you are using some other host adapter, you will need to refer to your manual to figure out how to change the SCSI ID of your adapter.

Install the Operating System on the Primary System

We called our systems "ttisrv1" and "ttisrv2"; ttisrv1 is our primary system and ttisrv2 is the secondary. You will want to give the primary Ethernet interface of each system a unique public address. You also need to configure your secondary Ethernet interfaces with IP addresses, but you can pick any unique subnet because these interfaces will be private. If you do a custom installation, don't forget to include Samba.

Set Up the External Disk

You will need to partition and create a filesystem on your external disk. Note that this is only necessary on your primary system. We used a single partition that contained the entire disk.

Create partition with fdisk:

ttisrv1# fdisk /dev/sda

The number of cylinders for this disk is set to 1116.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1. Software that runs at boot time (e.g., LILO)
2. Booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1116, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1116, default 1116):
Using default value 1116

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.

Now we create a filesystem on the external SCSI disk:

ttisrv1# mkfs /dev/sda1

Create a mount point for the disk:

ttisrv1# mkdir /ttidisk

Make sure you can mount the filesystem:

ttisrv1# mount /dev/sda1 /ttidisk

Create a directory to hold the Samba password file:

ttisrv1# mkdir /ttidisk/smb

Create a public directory for file sharing and set the proper permissions:

ttisrv1# mkdir /ttidisk/public
ttisrv1# chmod 1775 /ttidisk/public

Unmount the drive (heartbeat will mount it for you):

ttisrv1# umount /ttidisk

Download and Install heartbeat

After downloading heartbeat, install it on the primary system:

ttisrv1# rpm -ivh <download_path>/heartbeat-0.4.9-1.i386.rpm

Configure heartbeat

The procedures for configuring heartbeat are well documented and you can find examples along with the documentation in /usr/share/doc/packages/heartbeat. We show you the specific configuration we used for our implementation; if you require more information, please refer to the documentation in the abovementioned directory.

There are three files that you will need to set up to get heartbeat working: authkeys, ha.cf, and haresources.

Configure /etc/ha.d/authkeys

This file sets the authentication keys for the cluster and must be identical on both nodes. You can choose from three authentication schemes: crc, md5, or sha1, depending on your security needs. We chose md5. Here is our /etc/ha.d/authkeys file:

# use md5 with key "ttikey"
auth 3
3 md5 ttikey
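A short word such as "ttikey" is easy to guess, so you may prefer a random key. The following is only a minimal sketch, assuming md5sum and /dev/urandom are available on your system; make_authkeys is a hypothetical helper name, not something that ships with heartbeat. It writes a file in the same two-line format shown above and creates it with owner-only permissions:

```shell
# Sketch: write an authkeys-style file with a random md5 key.
# make_authkeys is a hypothetical helper, not part of heartbeat.
make_authkeys() {
    file=$1
    # Hash 64 random bytes to get a 32-character hexadecimal key.
    key=$(dd if=/dev/urandom bs=64 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
    # Create the file under a restrictive umask so that a newly
    # created file is readable and writable by its owner only.
    (
        umask 077
        printf 'auth 3\n3 md5 %s\n' "$key" > "$file"
    )
}
```

You could call it as make_authkeys /tmp/authkeys.new, inspect the result, and then install the same file as /etc/ha.d/authkeys on both nodes. Note that the umask only affects a file that does not exist yet; an existing file keeps its current permissions.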

The authkeys file must only be readable by root or heartbeat will not start. Be sure to set the appropriate permissions after creating this file:

# chmod 600 /etc/ha.d/authkeys

Configure /etc/ha.d/ha.cf

This file defines the nodes in the cluster and the interfaces that heartbeat uses to verify whether or not a system is up. Here is our /etc/ha.d/ha.cf file:

# define nodes in cluster
node     ttisrv1
node     ttisrv2

# time a system must be unreachable before considered dead (seconds)
deadtime 5

# set up for the serial heartbeat pulse
serial   /dev/ttyS0
baud     19200

# interface to run the network heartbeat pulse
udp      eth1

Configure /etc/ha.d/haresources

This file describes the resources that are managed by heartbeat. The resources are basically just start/stop scripts much like the ones used for starting and stopping resources in /etc/rc.d/init.d. Note that heartbeat will look in /etc/rc.d/init.d and /etc/ha.d/resource.d for scripts. Here is our /etc/ha.d/haresources file:

# use ttisrv1 as primary, use 192.168.0.100 as shared IP
ttisrv1 192.168.0.100 Filesystem::/dev/sda1::/ttidisk::ext2 smb nfslock nfs

This line tells heartbeat to start these resources on ttisrv1 with the shared IP address of 192.168.0.100. It also tells heartbeat to mount the filesystem found on /dev/sda1 at the /ttidisk mount point and to start Samba and NFS.
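To make the layout of such an entry concrete, here is a small sketch that splits the first entry of a haresources-style file into the parts just described: the preferred node, the shared IP address, and the list of resources. describe_haresources is a hypothetical helper written for illustration; it assumes one entry per line with no backslash continuations and ignores any entries after the first.

```shell
# Sketch: show the fields of the first entry in a haresources-style file.
# describe_haresources is a hypothetical helper, not part of heartbeat.
describe_haresources() {
    # Drop comment lines and blank lines, then keep the first real entry.
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$' | head -n 1 | {
        # First word: preferred node. Second word: shared IP.
        # Everything else: the resources heartbeat manages.
        read -r node ip resources
        echo "preferred node: $node"
        echo "shared IP: $ip"
        for r in $resources; do
            echo "resource: $r"
        done
    }
}
```

Run against our file with describe_haresources /etc/ha.d/haresources, this would report ttisrv1 as the preferred node, 192.168.0.100 as the shared IP, and four resources (the Filesystem entry, smb, nfslock, and nfs).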

Configure /etc/hosts

Your /etc/hosts file should contain entries for both of the nodes in your cluster and your shared IP address. Here is an excerpt from our /etc/hosts file on ttisrv1 (each system should have corresponding entries):

127.0.0.1    ttisrv1 localhost.localdomain localhost
192.168.0.99         ttisrv2
192.168.0.100        ttisrv
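A missing entry here is easy to overlook. As a hedged sketch, the function below checks whether a given name appears in a hosts-format file; has_host is a hypothetical name, and the check only looks at the file itself, not at DNS or any other name service.

```shell
# Sketch: succeed if a hostname is listed in a hosts-format file.
# has_host is a hypothetical helper for illustration.
has_host() {
    # $1 = hosts file, $2 = hostname to look for
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }   # ignore comment lines
        {
            # Field 1 is the address; every later field is a name or alias.
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For example, on either node you could loop over ttisrv1, ttisrv2, and ttisrv, calling has_host /etc/hosts with each name and printing a warning for any that fail.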

Configure Samba

Samba version 2.0.6 is included with Red Hat Linux 6.2. If you did not select the option to include it when you installed Red Hat on your machine, you can install it from the installation CD using RPM:

ttisrv1# rpm -Uvh /mnt/cdrom/RedHat/RPMS/samba-*.rpm

Substitute /mnt/cdrom with the mount point of the CD drive on your machine if it is different.

Configure smb.conf

Check to see if Samba is already running. Type:

ttisrv1# /etc/rc.d/init.d/smb status

If Samba is not active you will see:

smbd is stopped
nmbd is stopped

If Samba is running, you will see:

smbd (pid 5103) is running...
nmbd (pid 5114 5112) is running...

Note: the PIDs might be different for your host. If Samba is running, stop it:

ttisrv1# /etc/rc.d/init.d/smb stop

Strictly speaking, it doesn't matter whether Samba is running, because by default it checks its configuration file every 60 seconds and picks up any changes. For consistency's sake, however, we prefer that Samba come up only after we are completely done with our configuration. Remember to save a copy of the original file in case you have to go back to it. Locate the file /etc/smb.conf (or /etc/samba/smb.conf), and let's get to work.

Installing Samba is a simple process. You either select it as an option during your Red Hat installation or install the package later using RPM. Configuring it is another matter. We won't go into a Samba configuration tutorial here. However, Using Samba from O'Reilly & Associates delves into this subject, and we recommend that you refer to this book for further understanding.

With that said, Listing 1 shows what we've done with our Linux host to turn it into a Samba server. We basically started out with the default smb.conf file that came with our system and modified it to suit our needs. Please note that this is not the complete smb.conf file; it only lists the options that we actually changed. You can leave the other options as they were in the original file.

Get Samba Ready

After you've finished making changes to the smb.conf, you can check to make sure that your configuration file is set up correctly and free of errors:

ttisrv1# /usr/bin/testparm -s

If there is no error, you are ready to go. However, don't start the Samba server at this time. The heartbeat program is set up to start it automatically.

Next, add a user to your Samba password file. This user should already have a valid account on your Linux machine (i.e., the account is present in your /etc/passwd file). If not, Samba will refuse to add the user. For our example, we will add the user steve.

Mount the filesystem:

ttisrv1# mount /dev/sda1 /ttidisk

Add the user:

ttisrv1# /usr/bin/smbpasswd -a steve

Samba will prompt you for the new SMB password; enter it accordingly.

Unmount the drive (heartbeat will mount it for you):

ttisrv1# umount /ttidisk

Please note that the mounting and unmounting of /ttidisk is totally unnecessary once the system is up and running with heartbeat in control of the shared filesystem. This is just a demonstration to provide you with a starting point for your Samba server.

Configure NFS

Make sure that you don't start NFS on boot up:

ttisrv1# /sbin/chkconfig --del nfs

Then add the links to kill NFS on shutdown or reboot:

ttisrv1# /sbin/chkconfig --level 016 nfs off

These steps are necessary because we want heartbeat to control the startup of NFS.

Add the following to /etc/exports. Create the file if it doesn't already exist:

# Export the shared disk, allowing read/write access and
# synchronous I/O with no write delay.
/ttidisk    192.168.0.*(rw,sync,no_wdelay)

Test heartbeat

Start heartbeat on the primary system with the following command:

ttisrv1# /etc/rc.d/init.d/heartbeat start
Starting High-Availability services: [  OK  ]

If it fails, look in /var/log/messages to determine the reason and then correct it. After heartbeat starts successfully, you should see a new interface with the shared IP address that you configured in the haresources file. This interface is an alias, so it will be displayed like the following:

ttisrv1# ifconfig

<...clipped output...>

eth0:0    Link encap:Ethernet  HWaddr 00:D0:B7:00:B5:09  
          inet addr:192.168.0.100  Bcast:192.168.0.255  \
            Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:7 Base address:0x7000

Note: You might experience a short delay while heartbeat attempts to bring up the interface. You should also see that the disk has been mounted:

ttisrv1# df -k

Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1              2016016     42612   1870992   2% /
/dev/hda6             11385128    376292  10430500   3% /usr
/dev/sda1              8823404       552   8374644   0% /ttidisk

Also check to see that Samba and NFS started successfully:

ttisrv1# /etc/rc.d/init.d/smb status

smbd (pid 5729) is running...
nmbd (pid 5740 5738) is running...

ttisrv1# /etc/rc.d/init.d/nfs status

rpc.mountd (pid 5825) is running...
nfsd (pid 5841 5840 5839 5838 5837 5836 5835 5834) is running...
rpc.rquotad (pid 5816) is running...
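These manual checks are easy to script. As one hedged example, the function below reports whether a mount point appears in the kernel's mount table. It assumes /proc/mounts is available, and is_mounted is a hypothetical name; the optional second argument exists only so the function can be tried against a sample file.

```shell
# Sketch: succeed if something is mounted at the given mount point.
# is_mounted is a hypothetical helper for illustration.
is_mounted() {
    # $1 = mount point, $2 = mount table to read (defaults to /proc/mounts)
    # In that table, field 1 is the device and field 2 is the mount point.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit found ? 0 : 1 }' \
        "${2:-/proc/mounts}"
}
```

A call such as is_mounted /ttidisk could then be combined with the smb and nfs status commands shown above to confirm in one pass that a node actually holds the resources.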

Configure the Secondary System

The secondary system is configured with a subset of the steps used for the primary.

First, install the operating system as outlined in the section labeled Install the Operating System on the Primary System. Next, download and install heartbeat per the section labeled Download and Install heartbeat. We recommend that you copy all the configuration files created on the primary system to the secondary system. You could manually go through the process of creating them again but it is time consuming and prone to error. You can get all the files you need in a tar archive with the following command:

ttisrv1# cd /
ttisrv1# tar cvf <path>/hafiles.tar /etc/smb.conf  \
  /etc/ha.d/haresources /etc/ha.d/ha.cf /etc/ha.d/authkeys \
  /etc/exports

After creating the tar file, transfer it to the secondary system using FTP or whatever method you prefer, and extract the configuration files. (One alternative is to mount the shared disk on the primary, put the tar file on it, unmount it, and then mount it on the secondary system for the extraction.)

ttisrv2# cd /
ttisrv2# tar xvf <path>/hafiles.tar

Remember to add entries for the primary system and shared IP address into your /etc/hosts file.
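Because the heartbeat, Samba, and NFS configuration should match on both nodes, it may be worth confirming that the copy succeeded. The following is a minimal sketch, assuming md5sum is installed on both systems; config_digest is a hypothetical helper that prints a single checksum for a list of files.

```shell
# Sketch: print one md5 checksum covering all files named as arguments.
# config_digest is a hypothetical helper for illustration.
config_digest() {
    # Concatenate the files in the order given and hash the result,
    # keeping only the checksum field of the md5sum output.
    cat "$@" | md5sum | cut -d' ' -f1
}
```

Run it as root on each node with the same file list, in the same order, as the tar command above, and compare the two results; differing checksums mean at least one file differs. Leave /etc/hosts out of the list, since that file is maintained separately on each node.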

Test Connectivity

At this point both systems should be ready to go. There are a few tests you should try to make sure everything is in order before testing failover. First, make sure you can ping one system from the other on both interfaces. You also need to test that your serial connection is functional.

On ttisrv1:

ttisrv1# cat < /dev/ttyS0

On ttisrv2:

ttisrv2# echo "TTY test" > /dev/ttyS0

You should see the text on ttisrv1. You should also reverse the test to make sure you have bi-directional communication.

We will now test the ability to mount the disk on the secondary system. First, create the mount point:

ttisrv2# mkdir /ttidisk

Make sure you can mount the filesystem:

ttisrv2# mount /dev/sda1 /ttidisk

Unmount the disk:

ttisrv2# umount /ttidisk

Start heartbeat on Secondary System

If you've made it this far with no problems you are in excellent shape! All we have left to do is start heartbeat on the secondary system, and then we can test to make sure it all works.

Start heartbeat on secondary system:

ttisrv2# /etc/rc.d/init.d/heartbeat start

Testing Failover

You can test failover by simply stopping heartbeat on the primary system:

ttisrv1# /etc/rc.d/init.d/heartbeat stop

You should see all the services come up on the second machine in 30 seconds or less. If you do not, look in /var/log/messages to determine the problem and correct it. You can fail back over to the primary by starting heartbeat again. heartbeat will always give preference to the primary system and will start to run there if possible.
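If you want a rough measurement of how long a failover takes, you could poll the shared address from a client while you stop heartbeat on the primary. The sketch below is one way to do that under stated assumptions: wait_for is a hypothetical helper that retries any command about once per second until it succeeds or a time limit passes, and it relies on date +%s, which GNU date supports.

```shell
# Sketch: retry a command until it succeeds or a time limit expires.
# wait_for is a hypothetical helper for illustration.
wait_for() {
    # $1 = time limit in seconds; remaining arguments = command to retry
    limit=$1; shift
    start=$(date +%s)
    until "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        # Give up once the limit has been reached.
        [ $((now - start)) -ge "$limit" ] && return 1
        sleep 1
    done
    # On success, print the number of seconds that elapsed.
    echo $(( $(date +%s) - start ))
}
```

For instance, wait_for 60 ping -c 1 192.168.0.100 would print the approximate number of seconds until the shared address answered, or return a failure status after about a minute. A reply to ping only shows that the address has moved; checking the Samba and NFS services themselves would need a different command in its place.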

Caveats

In the configuration we have described, we used only one disk, which is a single point of failure. It would be preferable to either use a hardware RAID device or two disks with mirroring, which we felt was beyond the scope of this article. We have tested this configuration with a hardware RAID device with no anomalies noted. We have not tried to use software RAID 1 (mirroring), although we think it would work fine. We would also like to point out that any pending disk writes during a failover could fail depending on the precise timing, but a second attempt would work fine.

It is possible to corrupt your disk if both of the systems attempt to mount the filesystem read/write at the same time. This condition is known as "split brain", and you must take every precaution to ensure it does not happen. If both systems were to mount the same filesystem read/write, they would both attempt to keep the superblock synchronized without regard for changes being made on the other system. This will result in data corruption. The best way to reduce this risk is to have multiple heartbeat interfaces, so heartbeat can determine the status of the other system in the cluster. If you use only a single Ethernet interface and that interface fails, heartbeat will assume the system is down and attempt to take over the disk. By using an Ethernet and a serial heartbeat interface, it would take two distinct failures before split brain could occur.

Other Uses

There are endless possibilities for how you may use heartbeat to provide high-availability services. It is particularly good for providing Web services and read-only file access. For example, if you have a number of CDs that you would like to make available to your users, you could purchase a SCSI device with multiple CDs and share them all via Samba and NFS.

Summary

We have provided a way to set up a very useful and highly available file server using inexpensive hardware and free, readily available software that is relatively easy to set up. You are limited only by your imagination as to how you can expand this sample system to include other components to meet your needs. We hope that we have piqued your interest enough for you to get started on your own high-availability project. Feel free to send us your comments.

Steve Blackmon cofounded Transparent Technologies, Inc. in 1999. He has been a Software Developer and System Administrator for 14 years. He currently provides consulting expertise in the areas of high-availability, SAN, and IT infrastructure to high-profile clients in the Atlanta area. He can be reached at: steve.blackmon@transtech.cc.

John Nguyen has a B.S. in Computer Engineering from Florida Institute of Technology, Melbourne, Florida. He is an application developer with 14 years of experience. His interests are computers, politics, and classical literature. He can be reached at john.nguyen@acm.org.