Web Hosting:
A Migrational Case Study
Ripduman Sohan
Hosting, the act of providing a service on behalf of an individual
or company, is a concept that has been around for as long as the
Internet. There are many types of hosting services, including Web,
mail, and database hosting. However, the most popular and longest-lived
hosting service has been Web site hosting.
Many organizations, such as universities, commercial companies,
and ISPs, provide this essential service for their users or customers.
Today, the Web is the most popular medium for retrieving information
from the Internet. To ensure your material in this information cornucopia
is readily available, it's essential to configure your end
to deal with anything your users may want, without inconveniencing
them or annoying you.
In this article, I present a case study migration of a system
containing 203 virtual hosts, many of them with backend databases,
from one server to another. The Web server was Apache and the
database MySQL, all running on FreeBSD and being transferred to
Linux. I intend to show you how simply this can be done and share
some of the tricks and pitfalls generally involved with setting
up, running, and successfully migrating medium- to large-scale Web
sites with this software. I've also included virtual hosting
because that's what the original job entailed and because
I wanted to be as thorough as possible. Nevertheless, almost all
of the concepts in this article should be adaptable to single Web
sites and different software with little or no tweaking.
The Scenario
The source system was a box running FreeBSD 3 on a Pentium II
266 located in San Francisco. It was connected to the Internet via
a 256-Kbps link and was using Apache 1.3.1. Of the 203 virtual hosts,
30 required databases, so it also had MySQL 2.3 installed. The machine
had been set up incompetently -- so incompetently that the MySQL
database was directly accessible from the Web. It also had no backups.
The target machine was a brand-new, default-installation Red Hat
6.3 machine on a T3 link in New York. I didn't have physical
access to either machine and was working over a satellite
link with a 700-ms lag.
The reason for the changeover was twofold. The company was increasingly
aware of the insecurity and lack of power of the source machine
relative to its growing customer base, and it was also
getting a better deal from a new co-location provider. My job was
to move the whole system, with zero downtime and no loss of client
data.
The Move
Backup
The move started with the most important thing -- a system
backup! I couldn't back up any of the user Web files or databases
due to the high load on the system; as soon as I touched any of
them, the system became highly erratic. With 2.3 GB of user
data on the system and no local means of backup, I wasn't going
to transfer it all via the Internet link. Therefore, I initially
backed up just the httpd.conf file and the password and group
files. However, before you start a migration, I advise you to check
that the latest system backup is valid. You do have one, right?
Analysis
The next move was to download and install analog. This is a very
well-known and comprehensive Web log analyzer, available as a package
for most platforms. You can get started quickly with the following
steps:
1. Install analog. Use your package manager, usually rpm -i
analog.rpm in Linux.
2. Edit analog.cfg, usually found in /etc/analog.cfg. Set
the LOGFILE directive to point to where your Web server logfile
lives, and the OUTFILE directive to your output filename.
3. Turn on the hourly report with the command FULLHOURLY ON
in the analog.cfg file.
4. Run the binary, usually /usr/bin/analog.
This will create a full hourly breakdown report using your log,
which you can view with any browser. I did this to build a time
profile so I would know the best time for me to actually log into
the machine in order to copy and move the data. Most Web servers
go through a daily cycle of use, depending on the time zone of their
audience, and it's best to work when load average is lowest
to minimize disruption to the system. If you can do the migration
with downtime or without affecting the service, go ahead and skip
this step.
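For reference, a minimal analog.cfg along the lines of steps 2
and 3 might look like the following sketch (the paths are
illustrative; adjust them to your installation):
LOGFILE /var/log/httpd/access_log
OUTFILE /var/www/html/report.html
FULLHOURLY ON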
New System Build
After figuring out my optimal timing, I built the new server. If
you choose to run a dedicated machine as a server, a little forethought
in design can go a long way to prevent problems. Your most important
resources on a Web server are memory and disk space. Work on maximizing
those. Make your Web data partition as large as possible and put
it on a separate drive if necessary. Most default Linux server installations
come with setups that are not really optimal for Web servers --
do you really need X Windows? What about gimp? Get rid of all unnecessary
software. This usually frees up to 800 MB and makes software conflicts
less likely. On most RedHat-compatible Linux distributions, you
can use the following commands to work with installed packages:
rpm -qa -- Provides the full list of installed packages
rpm -qi packagename -- Information on a selected package
rpm -e packagename -- Removes the selected package
Next, trim memory usage. Disable (or replace with lighter equivalents)
all services that don't have to be running: atd, bind,
dhcp, and Sendmail (replaceable by ssmtp) are the
usual candidates. You can remove the packages outright or just delete
their startup scripts from your init scripts directory. Also ensure
that you have adequate swap space (usually as much as the available
RAM). Swap space, at least in Linux, comes in two variants: partition
or file. Use the partition type -- that way, if your swap space
gets corrupted, your filesystem won't be.
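On Red Hat-style systems, this trimming might look like the
following sketch (the service names are just the usual suspects):
chkconfig --list
(list all services and their runlevel settings)
chkconfig atd off
(stop atd from starting at boot; repeat for other services)
rpm -e dhcp
(or remove the unneeded package outright)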
I then upgraded the "key software". Key software is
software on which system functionality depends. This is usually
Apache and its related support software. It's worth using the
latest available stable version of your key software that you deem
fit for consumption. (My personal method of choosing an Apache
version is to use Netcraft to see which version a heavily used
site such as http://www.slashdot.org is running.) If you have
any modules used by Apache (e.g., PHP),
it's worth getting the latest versions. Also, this is the time
to install any third-party software you want to use; Professional
FTP Daemon (Proftpd) and OpenSSH are popular options in this respect.
If you're using a system with a package manager and you don't
need any non-standard options (i.e., those requiring a source compile),
get the installable package. Any issues between that particular
OS and the software will already have been worked out by the
package maintainers.
Performance Tuning
The crux of the matter is configuring your software so it performs
well. Although I could write several articles on how to configure
your system, I'll only give you the main ideas behind maximizing
performance for Apache and, to some extent, the system.
Apache is the machine's interface to the world. Configure
this poorly, and your beautiful new server with its oceans of RAM
and your T3 link won't be worth anything.
To configure it well, first, get rid of all unnecessary Apache
modules and add any custom ones you do want. You can do this by
editing the httpd.conf file and looking for lines similar
to:
LoadModule info_module modules/mod_info.so
This tells Apache to load the module info_module into memory
when it starts. Go through each LoadModule line of the default installation
and disable all the modules you'll never need. (This is done
by putting a # sign at the front of the line.) Typical modules rarely
used on production Web servers are mod_autoindex (creates automatic
indexes for directories) and libproxy.so (proxy caching module).
You can find the complete module description for each standard module
in the Apache documentation. This is done to minimize the memory Apache
uses because fewer loaded modules mean less memory allocated to module
code. Sometimes, disabling modules can also lead to server speed increases.
If you're using Perl as a scripting language for the server,
consider loading mod_perl. This eliminates starting a new
Perl interpreter every time a script runs, which means better
response time for the server.
Next, I always modify the lines StartServers, MinSpareServers,
and MaxSpareServers. These lines go together. To understand
this, remember that creating a process on most operating systems
is quite expensive in terms of time, and in Apache, each process
is known as a server. Hence, you'll want to start a reasonable
number of processes when you start up the program (StartServers
line), while keeping a reasonable number of idle processes
free to serve further incoming requests before any new ones must
be created (MinSpareServers line). Conversely, you
don't want to waste memory with spare processes lying about
after they've finished doing their work (MaxSpareServers
line). I find the values 8, 4, and 10 work well for most setups.
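In httpd.conf, those values look like this:
StartServers 8
MinSpareServers 4
MaxSpareServers 10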
The next lines to modify are the lines MaxClients and MaxRequestsPerChild.
MaxClients relates to the maximum number of clients that
can connect to the server simultaneously. A larger number means
more concurrent connections, but worse performance; a smaller number
means the opposite. A good compromise is a value of 200. The line
MaxRequestsPerChild relates to the number of requests each
process can handle before it is forced to die. This prevents errant
processes (e.g., one that leaks memory) from hogging system resources.
If you're confident everything works well, you can set this
value to zero to provide that little extra boost in performance.
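Again in httpd.conf, using the compromise value discussed above and
a conservative request cap while you're still shaking the server
down (the 500 is just an illustrative figure):
MaxClients 200
MaxRequestsPerChild 500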
As a trick, you can use the above parameters to provide a limited
service while you perform maintenance or migration work. When I
migrated my data, I restarted Apache on the source machine with
a single server and a MaxClients of 50. This allowed users
to still get (some) service while I had a more usable machine.
You can also turn off hostname lookups (the HostNameLookups
line). This stops Apache from looking up and logging the DNS
name of each connecting client rather than just its IP address.
Finally, you should avoid providing server-side includes (.shtml
files) because they force Apache to parse each page it sends and
make those pages uncacheable.
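The lookup setting is a one-line change in httpd.conf:
HostNameLookups off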
Regarding your system, there are two things that significantly
affect performance. The first is the maximum number of open files you
can have at any one time. In Linux, you can change this parameter
through the file /proc/sys/fs/file-max. The command
echo 16384 > /proc/sys/fs/file-max will increase the maximum
number of open files to 16384. For ease of administration, you can put this command
in one of your startup scripts.
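On Red Hat-style systems, /etc/rc.d/rc.local is the usual place for it:
echo 16384 > /proc/sys/fs/file-max
(appended to /etc/rc.d/rc.local so it runs at every boot)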
The second thing you can do is to rebuild the kernel, cutting
out all the unnecessary drivers. This frees memory and makes the
kernel leaner and, therefore, faster. However, if you're working
off a remote link, be certain the kernel works before deployment.
Having an exact replica locally, in terms of software and hardware,
may help here.
Data Migration
Data migration is relatively painless if carried out properly.
The first thing to do is to make sure all your user accounts have
been duplicated with the correct logins and passwords. If you're
migrating between homogeneous machines, just copy the relevant password
and shadow files across. Otherwise, you may have to manually migrate
accounts, but this is dependent on source and destination systems.
Most shadow systems now have interchangeable password files, but
check first. I was lucky that all the account information for
virtual hosting had been assigned by my company (including the
passwords), so I made a text file with the user information in it
and used the Linux newusers command to create all the new users.
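newusers reads one account per line in the same colon-separated
format as /etc/passwd; the names, IDs, and passwords below are
made up for illustration:
web1:secret1:1001:1001:Customer One:/home/web1:/bin/false
web2:secret2:1002:1001:Customer Two:/home/web2:/bin/false
(saved as userlist.txt and fed to the command: newusers userlist.txt)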
Remember when migrating user accounts to make sure that all the
accounts on the new server have the same group identifier (GID)
and user identifier (UID) as on the old system. This prevents permission
and ownership problems when you copy the data across. It may also
be beneficial to create separate groups for different account classes.
For example, I have separate groups for the Web sites that connect
to databases, and for those that don't.
After the users are set up on the system, you need to create or
migrate the Apache configuration for each virtual host. An easy
way of keeping your virtual host configuration separate is to put
it in its own file; for example, the directive Include conf/vhosts.conf
in your httpd.conf pulls the virtual host definitions in from
vhosts.conf. This makes a migration easy -- just modify the
file for your new configuration and include it in your new setup.
You must ensure that all virtual hosts have their own transfer
and error log files. This is handy for the customers, because it
allows them to maintain and analyze their own log information. It's
handy for you, because it frees you of the same task.
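A sketch of a single entry in vhosts.conf, with made-up names and
paths, showing the per-host logs (name-based hosting also needs a
matching NameVirtualHost line for the address):
<VirtualHost 10.0.0.5>
ServerName www.customer1.com
DocumentRoot /home/customer1/public_html
TransferLog /home/customer1/logs/access_log
ErrorLog /home/customer1/logs/error_log
</VirtualHost>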
After this, it's smooth sailing. All that's left is
archiving the data off the old server and restoring it on the new
one. There are several ways to do this. My favorite method requires
both machines to have OpenSSH installed. Then use the following
command, carried out in the data directory of the source server:
tar -cf - * | ssh -l username destination.host.com tar -xvpf -
This archives all the data on the source server and unarchives it at
the destination, all in one command. Nevertheless, if you feel so
inclined, go through the tar, copy, untar cycle. I recommend keeping
the Web data directories in the same locations when you move the
files, in case any paths are hard-coded.
Databases should also be migrated at this point. With MySQL, the
process is easy. The basic steps are dumping the data to a text
file, copying the file to the new server, creating the database
on the new server, and importing the data into the database. A quick,
typical example is:
oldserver$ mysqldump dbname > outfile
(dump the database dbname to the file outfile)
oldserver$ scp outfile newserver:
(copy outfile to the new server using secure copy)
newserver$ mysqladmin create dbname
(create the database dbname on the new server)
newserver$ mysql dbname < outfile
(import the data from the file outfile)
Testing
After migration, you're almost there -- but do things
work? The last thing you want is to change your DNS entries to point
to the new server, or create new DNS entries only to realize that
things don't work. However, you can't check to see whether
things work unless you move the DNS entries!
Fortunately, there are a number of solutions to this problem.
The most comprehensive one I use is the following:
1. Create a DNS server on an extra machine with fake records that
indicate the new server is the Web server for all the virtual hosts
you are hosting on it.
2. Find a set of machines on the same network as the fake DNS
server that will be used for testing. Point their primary DNS server
to this server.
3. Surf the virtual host sites to see whether they work.
This works because the fake DNS server claims to be authoritative
for the virtual hosts' domains and, hence, hands the substitute
records to the client machines. The client
machines then contact the new server and request data from it. The
advantage of this is that a number of machines can simultaneously
be used to do testing.
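As a sketch, with BIND and entirely made-up names and addresses,
the fake server would carry one master zone per hosted domain:
zone "customer1.com" { type master; file "fake/customer1.com"; };
(in named.conf, repeated for each domain)
$TTL 3600
@ IN SOA ns.fake.test. root.fake.test. (1 3600 900 604800 3600)
  IN NS ns.fake.test.
  IN A 10.0.0.5
www IN A 10.0.0.5
(fake/customer1.com -- every record points at the new server)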
A less elaborate but easier option is to configure one machine on
the network with no DNS server and a modified hosts file (/etc/hosts
on Linux) whose entries point the virtual host names at the new server.
If your server machine needs to do some sort of host lookup, this
option is very useful, because it allows you to check that the whole
system works before migrating records.
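For example, with the new server at the made-up address 10.0.0.5,
the test machine's /etc/hosts might contain:
10.0.0.5 www.customer1.com customer1.com
10.0.0.5 www.customer2.com customer2.com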
Once you're satisfied that everything works, change the DNS
settings to reflect the new server. The job is now done. Keep the
old server active for some time after the switchover, because DNS
propagation takes time (usually a few days, but sometimes up to
a month). You can then gradually recycle or destroy it.
Security
You can never be too safe on a production server. A few things
to check are:
- Don't give users shell access to the system if they don't
require it. Make their shells /bin/false (see the sketch after this list).
- If you allow CGI scripts, vet them to ensure they aren't
malicious.
- If you use a custom Web-based administration tool, make sure
it's secure.
- If the main access to your system is via FTP, consider
using ProFTPD. This is an excellent FTP server with lots
of security features, including the ability to lock users into
their home directory, apply quotas, allow logins without a valid
shell, and control the maximum number of concurrent user logins.
- As much as possible, use OpenSSH to access your system. Disable
telnet and any other services you don't need.
- Periodically check and update your software for any discovered
security vulnerabilities.
- Use TCP Wrappers to control access to services.
- Periodically monitor your log files for any suspicious activity.
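As promised above, locking an existing account out of shell access
is a one-liner (the username is illustrative):
usermod -s /bin/false web1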
Conclusion
As you know, Web servers are integral but sensitive parts of today's
Internet. I hope this article has given you an insight into how important
it is to plan ahead when preparing or migrating a Web server. I
also tried to show how simple setup or migration can be, with little
or no downtime, if the planning is done well.
Links
Analog Web log analyzer -- http://www.analog.cx
SSMTP (Send only Sendmail emulator) -- http://rpmfind.net/linux/RPM/contrib/libc6/i386////ssmtp-2.38-1.i386.html
Netcraft International -- http://www.netcraft.com
ProFTPD (Professional FTP Daemon) -- http://www.proftpd.net
OpenSSH -- http://www.openssh.com/
Ripduman Sohan is currently finishing off a degree in Software
Engineering at City University, London. He is originally from, and
still based in, Kenya and has been using and promoting *nix-based
systems since he was 14 years old.