Cover V10, I11

Dynamic Round Robin

Jonathan D. Leghart

As more organizations become reliant on Web servers for day-to-day operations, systems administrators are faced with the task of ensuring that the company site is always available. Although there are several products that build clusters or actively balance a load across multiple machines, sometimes the expense or complexity can be prohibitive.

DNS Round Robin has been around for quite a while and is still widely used. The concept is straightforward -- for a single hostname, create multiple address records. The BIND server will then return the full list of addresses to the requesting resolver, rotating the order on each query. Using round robin has some advantages -- it is very easy to configure and inexpensive to implement. However, since BIND was not designed to actively monitor hosts, using round robin does not provide a true load-balancing solution. For example, if you have four Web servers configured in a round robin and one server goes down, the name server has no knowledge of the unavailable server and will continue to return that address. Thus, roughly one fourth of your visitors will be sent to a dead address first and may never make it to the site.

With the release of BIND 8, several new features were added including the ability to configure dynamic DNS zones. Dynamic DNS is often used in environments where clients use DHCP for network configuration. It allows the systems administrator to keep accurate tables without constantly assigning static IPs and updating zone files. Of course, if a systems administrator wants to update a zone, a simple utility called nsupdate can be used to add or remove records, including those for a round robin configuration.

Webwatch.pl is a simple Perl script that watches over your Web servers and adds or removes servers from a dynamic DNS zone, depending on whether each server in the round robin configuration is available (Listing 1). (Listings are available from the Sys Admin Web site: www.sysadminmag.com.) The script is configured with a list of servers and groups (round robins). It reads each server in the list, checks whether it is currently a part of the round robin group, and then attempts to connect to port 80 on the server. From these two tests, there are four possible outcomes:

  • If the server is already in the round robin and a connection is made, nothing is done;
  • If the server is in the round robin but a connection cannot be made to port 80, the script will remove it from the round robin;
  • If the server was not in the round robin but a connection can be made, it will be added back;
  • If the server is not in the round robin and no connection is established, no DNS changes will be made.
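
The four outcomes above reduce to a small decision function. The following is a minimal sketch in Python, not the Perl of Listing 1 (which is not reproduced here); the function name decide and its return values are hypothetical and chosen only for illustration.

```python
def decide(in_round_robin: bool, reachable: bool) -> str:
    """Return the DNS action for one server: 'add', 'remove', or 'none'.

    in_round_robin -- result of the DNS membership test
    reachable      -- result of the port 80 connection test
    """
    if in_round_robin and not reachable:
        return "remove"   # listed but down: take it out of the round robin
    if reachable and not in_round_robin:
        return "add"      # recovered: put it back in
    return "none"         # listed and up, or unlisted and down


if __name__ == "__main__":
    # Walk through all four combinations of the two tests.
    for in_rr in (True, False):
        for up in (True, False):
            print(f"in_rr={in_rr} reachable={up} -> {decide(in_rr, up)}")
```

Only two of the four combinations change DNS, which keeps the number of updates sent to the name server small when nothing has changed.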

Configuring the Zone

The first step in creating a dynamic round robin is to configure a zone that allows updates. Although it is easy to make a zone dynamic, I don't like to make my entire domain dynamic. Instead, I configure a subdomain specifically for dynamic, round robin configurations. In the zone file for my main domain (foo.bar in this example), I can then create CNAME records that point to the round robin in my dynamic zone. The configuration would look something like this:

In /etc/named.conf:

    zone "foo.bar" IN {
        type master;
        file "db.foo.bar";
    };

    zone "rr.foo.bar" IN {
        type master;
        file "dynamic/db.rr.foo.bar";
        allow-update { 192.168.1.5; };
    };

In the db.foo.bar zone file:

    www        CNAME        www.rr.foo.bar.

Now an initial zone file for rr.foo.bar will need to be created. The zone file need only contain a base set of information: a default TTL, the SOA record, and NS and MX records. It should look something like this:

$TTL  86400
@     IN    SOA    ns1.foo.bar.    hostmaster.foo.bar. (
            2001070100   ; Serial
            10800        ; Refresh
            3600         ; Retry
            604800       ; Expire
            86400 )      ; Minimum (negative-caching TTL)

            NS    ns1.foo.bar.
            NS    ns2.foo.bar.
            MX    10 mail.foo.bar.
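
Since the script will be adding and removing hosts, the initial file does not need to contain any host information. You will, however, need to be sure that this zone file, and the dynamic directory that holds it, are writable by the UID that named runs as, because named itself writes accepted updates back to disk. (The script only hands its changes to nsupdate, which sends them to the name server over the network.)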

Script Configuration

After the zone is set up, the script will need some minor configuration changes. The first section of the script defines all the local values for each installation. The value for $domain will be the newly created dynamic domain (rr.foo.bar, in this case). The $ttl value is the time to live, in seconds, attached to each address record the script adds; it controls how long other name servers and resolvers may cache a positive response. (It is important to understand that with BIND 8, the named server recognizes different TTL values for positive and negative responses for a zone or even a particular host.) For any installation, this value should not exceed the amount of time between Web server checks; otherwise, a caching name server may keep an address cached even after the script has removed it from the round robin.

The next value, $timeout, is the number of seconds the script will wait for a connection with a single Web server before assuming the system is unavailable. This number, multiplied by the total number of hosts the script will be checking, should not exceed the time interval between checks.
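
The timing rule is simple arithmetic, and a small check makes it concrete. This is a sketch in Python with hypothetical function names and made-up numbers; it uses the strict form of the rule (the interval must be greater than the worst-case run time).

```python
def worst_case_seconds(host_count: int, timeout: int) -> int:
    """Longest a single run can take if every host times out."""
    return host_count * timeout


def interval_is_safe(host_count: int, timeout: int, interval: int) -> bool:
    """True when one run is guaranteed to finish before the next one starts."""
    return worst_case_seconds(host_count, timeout) < interval


if __name__ == "__main__":
    # Hypothetical numbers: 8 hosts, 10-second timeout.
    print(worst_case_seconds(8, 10))      # 80
    print(interval_is_safe(8, 10, 60))    # False: a 60 s interval is too short
    print(interval_is_safe(8, 10, 120))   # True
```

With eight hosts and a 10-second timeout, a run can take up to 80 seconds, so a one-minute interval could let two runs overlap, while a two-minute interval leaves room to spare.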

The next two values, $logfile and $nsupdate, are self-explanatory: the path to the log file and the path to the nsupdate binary. Note that you can run this script as any user, because the named server (through allow-update) only cares which address a DNS update comes from, not which user is making it. You will, however, need to be sure that the log file is writable by the UID executing the script.

The next value is a hash that defines your servers and the round robin set to which they belong. For each IP address you want the system to check, you must associate a hostname. For example, say you have www.domain.com for your primary content, and images.domain.com -- a set of servers for those bandwidth-hogging pictures. Here you can define all the IPs of those servers and their appropriate group (www or images). If you have several groups of servers, you may want to consider running multiple scripts. Remember that the interval between script runs should be greater than the number of hosts you are checking in one script multiplied by the $timeout value.
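
As an illustration of that mapping, here is a Python sketch of a server-to-group table and a helper that lists the members of each round robin. The addresses, the SERVERS name, and the group_members helper are all hypothetical; the actual hash in Listing 1 may be laid out differently.

```python
from collections import defaultdict

# Hypothetical server table: IP address -> round robin group (hostname).
SERVERS = {
    "192.168.1.10": "www",
    "192.168.1.11": "www",
    "192.168.1.12": "www",
    "192.168.1.20": "images",
    "192.168.1.21": "images",
}


def group_members(servers: dict) -> dict:
    """Invert the table: group name -> list of IPs, in configuration order."""
    groups = defaultdict(list)
    for ip, group in servers.items():
        groups[group].append(ip)
    return dict(groups)


if __name__ == "__main__":
    for group, ips in group_members(SERVERS).items():
        print(f"{group}: {len(ips)} hosts -> {', '.join(ips)}")
```

Counting the hosts per table is also a quick way to see when one script is checking too many servers for the chosen interval and should be split into several.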

The Script at Work

The rest of the script is straightforward. To make troubleshooting and customization easy, each major function was broken out:

CheckConfig -- Ensures all the values in the initial configuration section are usable. This includes ensuring the log file is writable and that nsupdate is executable.

CheckDNS -- Determines whether a server's IP is a part of a round robin set. It uses the standard Perl function gethostbyname to get the list of IPs for the round robin host.
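
A rough Python equivalent of this membership test is sketched below. It is not the Perl routine from the listing; the function name in_round_robin is hypothetical, and the resolve parameter exists only so the lookup can be swapped out when testing. It relies on the standard library call socket.gethostbyname_ex, which returns the hostname, its aliases, and the list of IPv4 addresses.

```python
import socket


def in_round_robin(ip: str, hostname: str,
                   resolve=socket.gethostbyname_ex) -> bool:
    """Return True if ip is among the addresses hostname resolves to."""
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:
        # Lookup failures (socket.gaierror, socket.herror) mean the name
        # currently has no usable address records.
        return False
    return ip in addresses


if __name__ == "__main__":
    # Example against the dynamic zone used in this article.
    print(in_round_robin("192.168.1.10", "www.rr.foo.bar"))
```

Because the lookup goes through the local resolver, the answer can lag behind the zone by up to the record's TTL if a caching name server sits in between.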

CheckHTTP -- Attempts to create a TCP connection to a server using port 80. It will wait up to the number of seconds specified in $timeout for the connection to complete; otherwise, it returns failure. For implementations that require more than just a connection to the IP (e.g., monitoring virtual servers), this section could be modified to actually request data from the HTTP server and perform some sort of validation to determine whether to return success or failure.
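
The connection test can be sketched in a few lines of Python. Again, this is an illustration rather than the code from Listing 1: check_http is a hypothetical name, and the 5-second default simply stands in for $timeout.

```python
import socket


def check_http(ip: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to ip:port completes within timeout."""
    try:
        # create_connection raises OSError on refusal, unreachable host,
        # or timeout; the with-block closes the socket on success.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(check_http("192.168.1.10"))
```

Note that this only proves something is listening on the port. A server whose HTTP daemon accepts connections but serves errors would still pass, which is the case the content-validation idea above is meant to catch.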

ModDNS -- The routine that interacts with the nsupdate command. It simply reads all the parameters passed into the function and feeds them into nsupdate.
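
To show roughly what gets fed into nsupdate, the Python sketch below builds the text for a single change. The helper names and the /usr/bin/nsupdate path are assumptions, not taken from the listing. The command lines use the update add and update delete forms that nsupdate reads on standard input, and the trailing blank line tells it to send the request.

```python
import subprocess


def nsupdate_commands(action: str, group: str, domain: str,
                      ip: str, ttl: int = 60) -> str:
    """Build nsupdate input that adds or deletes one A record."""
    name = f"{group}.{domain}."          # fully qualified, with trailing dot
    if action == "add":
        line = f"update add {name} {ttl} IN A {ip}"
    elif action == "delete":
        line = f"update delete {name} IN A {ip}"
    else:
        raise ValueError(f"unknown action: {action}")
    return line + "\n\n"                 # blank line ends the update request


def apply_update(commands: str, nsupdate: str = "/usr/bin/nsupdate") -> bool:
    """Pipe the commands into nsupdate; the binary path is an assumption."""
    result = subprocess.run([nsupdate], input=commands, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print(nsupdate_commands("add", "www", "rr.foo.bar", "192.168.1.10"), end="")
    print(nsupdate_commands("delete", "www", "rr.foo.bar", "192.168.1.10"), end="")
```

Deleting by name, type, and address removes only the one failed server and leaves the other address records of the round robin in place.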

Logger -- Used to create entries in the webwatch.log file. For those wanting to log to syslog, this routine could be modified. It could also be changed to log only negative results. For the truly dedicated, you can even set this up to send a pager message any time a system is dropped from the round robin.

TimeStamp -- A simple routine that will format the time for the Logger routine.

Running the Script

Now that the script is set up and your dynamic zone is ready to go, it's time to run the script. A simple entry in cron will take care of that. Don't forget the magic formula, however -- the interval between script runs should be greater than the total number of hosts being checked multiplied by the timeout value.

Assuming the system running the script can see all of your servers, your dynamic zone should start to populate. Running nslookup will allow you to see whether the entries are showing up. Once you have confirmed that the script is running, there are a few maintenance tasks that you will need to perform. Obviously, you will need to check the log file often for errors. You should also rotate it, so it doesn't get unmanageable. As with any important service, you will want to periodically check to be sure the script hasn't unexpectedly died.

Some Final Notes on Implementation

If you decide to use this solution to manage your systems, there are a few things to consider when designing your network. Be careful that your monitoring system won't be on a network that may lose connectivity with the Web servers, yet remain connected to the DNS server. In that case, the script would remove all of your Web servers from the round robin, even though they may still be available to the rest of the net.

Another point to consider is what may happen if a particular server becomes overloaded. I recently suggested using this script to a client who relied on round robin to balance the company's Web load (more than two million visitors a day) over several servers located across the United States. The client made the point that sometimes servers just get overloaded, which would cause timeouts and result in a server that is up (but very busy) being removed from the round robin. While true, you could also argue that removing the server from the round robin would help reduce the load. Then, when the server became less loaded, it would automatically be put back into the round robin. In either case, it's important to be sure you have some other mechanism in place to monitor the health of your systems.

Finally, although I "home-grow" many solutions to make day-to-day administration easier (not to mention keeping my pager quiet at night), I often see situations where a few dollars spent would have prevented hours of frustration and downtime. If your installation requires a robust product, and your company can afford it, spend the time and effort to research the right solution for you.

Resources

BIND source and documentation are available at: http://www.isc.org/products/BIND

Albitz, Paul, and Cricket Liu. 2001. DNS and BIND, 4th Edition. O'Reilly & Associates.

Wall, Larry, Tom Christiansen, and Jon Orwant. 2000. Programming Perl, 3rd Edition. O'Reilly & Associates.

Jonathan Leghart has been messing with computers since learning to program BASIC in the fourth grade. For the past five years he has focused primarily on UNIX and network administration with a particular interest in writing Perl scripts. He currently works as a Network Systems Engineer for Lucent Worldwide Services, providing consulting services to enterprise customers. Jonathan can be contacted at: jonathan@leghart.org.