Dynamic
Round Robin
Jonathan D. Leghart
As more organizations become reliant on Web servers for day-to-day
operations, systems administrators are faced with the task of ensuring
that the company site is always available. Although there are several
products that build clusters or actively balance a load across multiple
machines, sometimes the expense or complexity can be prohibitive.
DNS Round Robin has been around for quite a while and is still
widely used. The concept is straightforward -- for a single
hostname, create multiple address records. The BIND server will
then return a list of addresses to the requesting resolver in rotating
succession. Using round robin has some advantages -- it is very
easy to configure and inexpensive to implement. However, since BIND
was not designed to actively monitor hosts, using round robin does
not provide a true load-balancing solution. For example, if you
have four Web servers configured in a round robin and one server
goes down, the name server has no knowledge of the unavailable server
and will continue to return that address. Thus, one fourth of your
visitors will never make it to the site.
With the release of BIND 8, several new features were added including
the ability to configure dynamic DNS zones. Dynamic DNS is often
used in environments where clients use DHCP for network configuration.
It allows the systems administrator to keep accurate tables without
constantly assigning static IPs and updating zone files. Of course,
if a systems administrator wants to update a zone, a simple utility
called nsupdate can be used to add or remove records, including
those for a round robin configuration.
Webwatch.pl is a simple Perl script that can watch over your Web
servers and will add or remove servers from a dynamic DNS zone,
depending on whether each server in the round robin configuration
is available (Listing 1). The script is configured with a list of
servers and groups (round robins). (Listings are available from
the Sys Admin Web site: www.sysadminmag.com.) It reads
each server in the list, checks whether it is currently a part of
the round robin group, and then attempts to connect to port 80 on
the server. From these two tests, there are four possible outcomes:
- If the server is already in the round robin and a connection
is made, nothing is done;
- If the server is in the round robin but a connection cannot
be made to port 80, the script will remove it from the round robin;
- If the server was not in the round robin but a connection can
be made, it will be added back;
- If the server is not in the round robin and no connection is
established, no DNS changes will be made.
Configuring the Zone
The first step to create a dynamic round robin is to configure
a zone that allows updates. Although it is easy to make a zone dynamic,
I don't like to make my entire domain dynamic. Instead, I configure
a subdomain specifically for dynamic, round robin configurations.
In my root zone file, I can then create CNAME records to point to
the round robin in my dynamic zone. The configuration would look
something like this:
In etc/named.conf:
zone "foo.bar" IN {
type master;
file "db.foo.bar";
};
zone "rr.foo.bar" IN {
type master;
file "dynamic/db.rr.foo.bar";
allow-update { 192.168.1.5; };
};
In the db.foo.bar zone file:
www CNAME www.rr.foo.bar
Now a read-only zone file for rr.foo.bar will need to be created.
The zone file need only contain a base set of information, including
a default TTL, SOA information, NS, and MX records. It should look
something like this:
$TTL 86400
@ IN SOA rr.foo.bar. hostmaster.foo.bar. (
2001070100 ; Serial
10800 ; Refresh
3600 ; Retry
604800 ; Expire
86400) ; Min TTL
NS ns1.foo.bar.
NS ns2.foo.bar.
MX mail.foo.bar.
Since the script will be adding and removing hosts, the initial file
does not need to contain any host information. You will, however,
need to be sure that this zone file has write permissions by the UID
used to run the script.
Script Configuration
After the zone is set up, the script will need some minor configuration
changes. The first section of the script defines all the local values
for each installation. The value for $domain will be the
newly created dynamic domain (rr.foo.bar, in this case).
The $ttl value is the timeout value for positive responses
from the BIND server. (It is important to understand that with BIND
8, the named server recognizes different TTL values for positive
and negative responses for a zone or even a particular host.) For
any installation, this value should not exceed the amount of time
between Web server checks; otherwise, a caching name server may
keep an address cached even if the script has removed it from the
round robin.
The next value, $timeout, is the number of seconds the
script will wait for a connection with a single Web server before
assuming the system is unavailable. This number, multiplied by the
total number of hosts the script will be checking, should not exceed
the time interval between checks.
The next two values, $logfile and $nsupdate, are
self-explanatory. Note, however, that you can run this script as
any user because the named server only cares where a DNS update
is coming from, not who is making it. However, you will need to
be sure that the logfile is writable by the UID executing the script.
The next value is a hash that defines your servers and the round
robin set to which they belong. For each IP address you want the
system to check, you must associate a hostname. For example, say
you have www.domain.com for your primary content, and images.domain.com
-- a set of servers for those bandwidth-hogging pictures. Here
you can define all the IPs of those servers and their appropriate
group (www or images). If you have several groups of servers, you
may want to consider running multiple scripts. Remember that the
interval between script runs should be greater than the number of
hosts you are checking in one script multiplied by the $timeout
value.
The Script at Work
The rest of the script is straightforward. To make troubleshooting
and customization easy, each major function was broken out:
CheckConfig -- Ensures all the values in the initial configuration
section are useable. This includes ensuring the log file is writable
and that nsupdate is executable.
CheckDNS -- Determines whether a server's IP is a part
of a round robin set. It uses the standard Perl function gethostbyname
to get the list of IPs for the round robin host.
CheckHTTP -- Attempts to create a TCP connection to a server
using port 80. It will wait for the value specified in $timeout
to complete the connection, otherwise it will return unsuccessfully.
For implementations that require more than just a connection to
the IP (i.e., monitoring virtual servers), this section could be
modified to actually request data from the HTTP server and perform
some sort of validation to determine whether to return success or
failure.
ModDNS -- The routine that interacts with the nsupdate
command. It simply reads all the parameters passed into the function
and feeds them into nsupdate.
Logger routine -- Used to create entries in the webwatch.log
file. For those wanting to log to syslog, this routine could be
modified. It could also be changed to only log negative results.
For the truly dedicated, you can even set this up to send a pager
message any time a system is dropped from the round robin.
TimeStamp -- A simple routine that will format the time for
the Logger routine.
Running the Script
Now that the script is set up and your dynamic zone is ready to
go, it's time to run the script. A simple entry in cron will
take care of that, however, don't forget the magic formula
-- the interval between script runs should be greater than the
total number of hosts being checked multiplied by the timeout value.
Assuming the system running the script can see all of your servers,
your dynamic zone should start to populate. Running nslookup will
allow you to see whether the entries are showing up. Once you have
confirmed that the script is running, there are a few maintenance
tasks that you will need to perform. Obviously, you will need to
check the log file often for errors. You should also rotate it,
so it doesn't get unmanageable. As with any important service,
you will want to periodically check to be sure the script hasn't
unexpectedly died.
Some Final Notes on Implementation
If you decide to use this solution to manage your systems, there
are a few things to consider when designing your network. Be careful
that your monitoring system won't be on a network that may
lose connectivity with the Web servers, yet remain connected to
the DNS server. The result would remove all of your Web servers
from the round robin, even though they may still be available to
the rest of the net.
Another point to consider is what may happen if a particular server
becomes overloaded. I recently suggested using this script to a
client who relied on round robin to balance the company's Web
load (more than two million visitors a day) over several servers
located across the United States. The client made the point that
sometimes servers just get overloaded, which would cause timeouts
and result in a server that is up (but very busy) to be removed
from the round robin. While true, you could also argue that removing
the server from the round robin would help reduce the load. Then,
when the server became less loaded, it would automatically be put
back in to the round robin. In either case, it's important
to be sure you have some other mechanism in place to monitor the
health of your systems.
Finally, although I "home-grow" many solutions to make
day-to-day administration easier (not to mention keeping my pager
quiet at night), I often see situations where a few dollars spent
would have prevented hours of frustration and downtime. If your
installation requires a robust product, and your company can afford
it, spend the time and effort to research the right solution for
you.
Resources
BIND source and documentation is available at: http://www.isc.org/products/BIND
Albitz, Paul, and Cricket Liu. 2001. DNS and BIND, 4th
Edition. O'Reilly & Associates.
Wall, Larry, and Randal L. Schwartz. 2000. Programming Perl,
3rd Edition. O'Reilly & Associates.
Jonathan Leghart has been messing with computers since learning
to program BASIC in the fourth grade. For the past five years he
has focused primarily on UNIX and network administration with a
particular interest in writing Perl scripts. He currently works
as a Network Systems Engineer for Lucent Worldwide Services, providing
consulting services to enterprise customers. Jonathan can be contacted
at: jonathan@leghart.org.
|