Secure in reading email at our workstations and happy
with our ability
to squirt Internet mail through our IPX-to-IP gateway,
our organization
didn't put up a Web server for quite some time. But
time marched on, the
Olympic Games drew near, and the powers-that-be wanted
to put
descriptions of our Olympic training facilities on the
Web - Savannah
hosted the 1996 Olympic Yachting events, you know.
At first, we seriously considered Windows NT. It seemed easy to manage and robust, and it had the requisite "freeware" http (Hypertext Transfer Protocol) service available (included with the NT resource
kit and also
available on the Internet). However, at the time, it
seemed that Perl
support for NT was not all there. Perl tends to be the
language of
choice for writing Common Gateway Interface (CGI) scripts,
so this was
of major concern. We also were not thrilled with NT's
inclination to ask
for a reboot every time you change a networking parameter.
We were already running production-level Linux, and
we didn't have any
production NT. Still, it was a toss-up between Linux
and NT until we saw
our budget for implementing this project: $0. Naturally,
Linux once
again provided our solution.
We implemented our Linux-based Web server starting with
a pilot server,
then moved the operation to a "cloned" Linux
box when we were satisfied
with the results. We have now been Webmeisters for more
than 6 months
and are very happy with how it has worked both in pilot
and in
production. Not only does this system serve the Internet
at large, but
it also serves "home base" documents for our
public library system's
"public Internet" terminals.
So what does the term "Web server" mean? From a Unix/Linux standpoint, a Web server is a daemon that:

* Listens on a negotiated socket (usually TCP port 80) for incoming http requests.

* Forks and fulfills each request, either by spewing forth a binary or text stream on the negotiated socket (Figure 1), or by running an external program with supplied parameters.
So, "Web" servers are really http servers.
They listen for http
requests, then dish out the requested hypertext on the
negotiated socket
to the connected client. The hypertext usually includes HTML (Hypertext Markup Language) documents, GIF pictures, and so on. The server itself is not particular about what it serves; you are limited only by what the client can handle. You could serve compressed cpio
dumps if you
wanted to - but the browser client would need some sort
of helper
application to read/list/extract those dumps. Otherwise,
the browser
would not know how to handle the data, and the user's
only choice would
be to download the dump.
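To make this concrete, here is roughly what an http conversation looks like on the wire; you can stage one by hand by telnetting to port 80. (This is a trimmed HTTP/1.0 sketch, not a verbatim capture; your server's headers will vary.)

$ telnet localhost 80
GET /index.html HTTP/1.0

HTTP/1.0 200 OK
Server: NCSA/1.5
Content-type: text/html

<HTML> ... the document body follows ...

Note the blank line after the GET request; that's what tells the server the request is complete.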
Note that the http world doesn't use Unix-style "magic"
numbers for data
typing. Instead, it relies on the perhaps more explicit
DOS-world
convention of using a dotted file extension to denote
data type. For
example, HTML files are usually file.html or file.htm,
and predictably,
GIF style pictures are file.gif.
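The extension-to-type mapping lives in the server's MIME configuration; the NCSA distribution ships a conf/mime.types file with entries along these lines (a representative excerpt, not the whole file):

text/html       html
image/gif       gif
image/jpeg      jpeg jpg jpe
text/plain      txt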
There is one wrinkle in this mostly simple service:
most, if not all,
http servers or daemons support the Common Gateway Interface.
Basically,
CGI, in conjunction with the http stream (usually via
HTML forms), will
allow the end-user to run a program on your box (scary,
huh?) in a
captive, noninteractive mode.
Specifically, CGI parameters are supplied from the browser,
via http, to
your daemon. The parameters are then either placed in an environment variable (QUERY_STRING, for GET requests) or provided on standard input (for POST requests) to the CGI script in question,
which is run as a child process of the httpd (again,
see Figure 1). The
program name is also provided by the browser (which
it can get from an
HTML form). The http daemon ensures that the program
exists in its CGI
directory and then runs it. Standard output of the CGI
program will be
piped back to the browser client, again, via your http
server and the
negotiated socket. Usually, the output of a CGI program is HTML, though this is not mandatory.
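As a minimal sketch of those mechanics, here is a tiny Perl CGI program (the script name is invented for illustration) that echoes a GET-style query back as HTML:

#!/usr/bin/perl
# hello.cgi - a hypothetical, minimal CGI script.
# For GET requests, the httpd hands us the parameters in the
# QUERY_STRING environment variable.
$query = $ENV{'QUERY_STRING'};
# Everything we print goes back through the httpd to the
# browser: headers first, then a blank line, then the body.
print "Content-type: text/html\n\n";
print "<HTML><BODY><H1>Hello from CGI</H1>\n";
print "You sent: $query\n";
print "</BODY></HTML>\n";

Drop it in ./cgi-bin, make it executable, and a browser can reach it as http://yourhost/cgi-bin/hello.cgi.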
Clearly, you want your CGI programs to behave somewhat securely. Running a shell against a user-provided string is, for example, a bad idea; a short illustration follows this paragraph.
Allowing users to arbitrarily populate your CGI directory
with random
programs is probably not a good idea either. But no
matter how many
scare stories you may hear (and you will), you will
probably find
yourself CGI-ing at one point or another. If you end
up programming CGI
scripts, you probably will want to check out David Ray's
very good CGI
security article in WEBsmith (#2, March 1996).
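To illustrate the shell hazard mentioned above, consider a Perl fragment that mails something to a form-supplied address (the variable and file names here are invented):

# BAD: system() hands the whole string to a shell, so an
# "address" like "me@foo.com; rm -rf /" runs a second command.
system("/usr/lib/sendmail $address < thanks.txt");

# SAFER: reject anything beyond the characters an address
# actually needs before the string goes anywhere near a shell.
unless ($address =~ /^[\w.\@-]+$/) {
    print "Content-type: text/plain\n\nBad address.\n";
    exit;
}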
Implementation
Our first task was to choose an http daemon to use,
and I admit, at
first our choice was somewhat random. At the time, we
could have picked
CERN's daemon or NCSA's. We chose NCSA's simply because
we had no
trouble connecting to their server and found a Linux
binary distribution
of the NCSA daemon there. In those days, we figured
that there was
enough adventure and excitement in configuring the daemon
without having
to also port and compile it. We got the Slackware-style
httpd
distribution from:
ftp://ftp.cc.gatech.edu/pub/Linux/distributions/Slackware/contrib/httpd.tgz
The Slackware Linux distribution includes the pkgtool
utility. This
simplifies management of .tgz (tarred and gzipped) archives
by treating
them as packages - addable and removable as discrete
units. Using the
pkgtool, we added the httpd.tgz package to our system.
After
installation, we reviewed the logfile in /usr/adm/packages/httpd,
and
saw that the root path for the httpd was /var/httpd; the configuration files were in the ./conf subdirectory. Other distributions use /usr/local/etc/httpd as the root path; the subdirectories are the same: ./conf, ./logs, and ./cgi-bin.
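For reference, the installation and a first look around go roughly like this (installpkg is the command-line face of pkgtool; your directory listing may differ slightly):

# installpkg httpd.tgz
# ls /var/httpd
cgi-bin/  conf/  htdocs/  icons/  logs/  support/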
The documentation included with the distribution was
sparse.
Fortunately, you can find all sorts of terrific and
up-to-date
documentation online at:
http://hoohoo.ncsa.uiuc.edu
If you're going to be doing a lot of text documentation
browsing from
your Linux box (and you will), it is a really good idea
to get the Lynx
text-only browser. It is fast, convenient, and not at
all fattening (to
your filesystem, that is). Lynx helped us a lot while
we were testing
our own server and its pages, too! You can get a description
and a
chance to download it for free from:
http://www.ukans.edu
There are two ways to configure how the http server
listens for
requests: standalone (running all of the time) or slave,
under the
control of inetd. The documentation definitely favors
the standalone
method, and we never like to argue with the folks who
write the
software, so we left it running standalone. So, no modification
of the
inetd.conf file was necessary. (Figure 1 is an example
of a standalone
configuration.)
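The choice lives in httpd.conf as the ServerType directive; the standalone setting is one line:

ServerType standalone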
The httpd ships with configuration files renamed from filename.conf to filename.conf-dist; you must rename each back to filename.conf before the server will read it.
This ensures that you can't simply unpack the distribution
and shoot
yourself in the foot without at least taking the trouble
to rename the
files. There are three config files in ./conf of immediate
interest:
httpd.conf (the main configuration file), srm.conf (server
resource
management), and access.conf (security). The syntax
in all these files
is:
ITEM VALUE [VALUE]
There are a lot of ITEMs to configure, and you will
definitely want to
spend a little bit of time browsing the abovementioned
hypertext
documentation.
For our purposes, we were able to leave most items alone.
We configured
site-specific items, such as the ServerAdmin item in
the httpd.conf.
(See Figure 2 for pertinent excerpts.) Again, you may
want to check out
http://hoohoo.ncsa.uiuc.edu to gain further understanding
of the many
specialized configuration options.
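As a sketch of the sort of thing Figure 2 shows (the values below are placeholders, not our production settings), the site-specific portion of httpd.conf boils down to a handful of lines:

Port 80
ServerAdmin webmaster@your.domain
ServerRoot /var/httpd
ServerName www.your.domain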
For example, there are options that allow your server
to offer several
different sets of "home" (i.e., index.html)
pages depending on which
network interface and "hostname" the request
comes from. In other words,
you could serve a different initial default page for
your Ethernet
interface (http://jon.dog.com) versus your Token Ring
or even "dummy"
interface (http://leo.meow.com). See the VirtualHost
option if you're
interested in doing this. You can also allow users to
create their own
pages, and so forth. But if your purpose, like ours,
is to simply serve
one set of documents to the world at large, NCSA has
done most of the
configuration for you!
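A virtual host stanza, very roughly, pairs an address with its own settings; the general shape is something like the following, though check the hoohoo documentation for the exact syntax your httpd version expects (the document root shown is invented):

<VirtualHost leo.meow.com>
ServerName leo.meow.com
DocumentRoot /var/httpd/htdocs-leo
</VirtualHost>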
Since we had now configured the httpd to be a standalone
daemon, we had
to invoke it manually. We tested it by typing:
# /usr/sbin/httpd
Oops! It complained that it didn't know about group
ID -1, which it runs
as for security reasons. We fixed that by editing /etc/group
to include
a group with ID -1:
nogroup::-1:
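(Where did -1 come from? The uid and gid the daemon runs under are set by the User and Group directives in httpd.conf; the shipped defaults are along these lines:)

User nobody
Group #-1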
We ran the httpd again, and this time it didn't complain.
We used the
Lynx browser to browse:
http://localhost
and found that, yes indeed, it worked! Since we had
no index.html file
in the htdocs directory, all we saw was a directory
listing of an empty
directory - not very exciting. But, after we populated
the htdocs
directory with a simple index.html file, we were on
our way to some
serious Web-slinging!
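That first index.html need not be fancy; something on the order of this (contents invented) is enough to prove the plumbing:

<HTML>
<HEAD><TITLE>Test Page</TITLE></HEAD>
<BODY>
<H1>It works!</H1>
Served fresh from our pilot Linux Web server.
</BODY>
</HTML>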
Meanwhile, our public librarians had been busy writing
up local
reference pages in HTML for our public access terminals.
The librarians
ftp'd the pages over - and we promptly hit a snag. Their
DOS-based
systems had only allowed them to use a .HTM extension.
When the httpd
did not see a .HTML extension, it served the page as
plain text. Ugly!
We had two choices (because renaming all of those files
and editing the
links was not attractive): change the default data type
to be HTML or
add a new MIME type with a file extension of .HTM. We
chose to be
explicit rather than broad, and added a new MIME type
in the srm.conf
file (Figure 1), with the line:
AddType text/html htm
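Had we gone the broad route instead, the change would also have been a single srm.conf line, making HTML the fallback for any unrecognized extension:

DefaultType text/html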
Now that things were working, we ensured that the daemon
would autostart
on bootup, and we edited /etc/rc.d/rc.inet2 to include
the httpd:
IN_SERV="lpd httpd"
We ran for a while, being careful to advertise the server
as www, a DNS
alias to the outside world that we had set up, instead
of using the
machine's canonical name. After our test period, we
rescued a PC from
the salvage box, put 8 Mb of RAM into it with a 200
Mb hard drive, and
cloned the first machine to it over the network. (See
"Using Linux as a
Router," Sys Admin, January 1996 for details on
cloning Linux over a
network.) We initiated the new machine as our permanent
production
server. All we did was change the DNS alias to point
to the new box, and
the outside world never noticed the difference.
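In BIND zone-file terms, that alias is one CNAME record, and cutting over to the clone meant changing only its target (hostnames invented for illustration):

www     IN      CNAME   newbox.co.chatham.ga.us.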
Security Rears Its ELFin Head
And at this point, our story should have ended, but
a CERT (Computer
Emergency Response Team) advisory, CA-95:04, alerted
us to a serious
security problem with the version 1.3 httpd that we
were running. So, it
was time to upgrade to version 1.5a (the latest and
greatest at the time
of this writing). We looked to the same site that had
provided us with
the Slackware httpd package in the first place - it
was still at 1.3!
Fortunately, NCSA itself provides Linux binaries at:
ftp://ftp.ncsa.uiuc.edu
Not so fortunately, NCSA also links its binaries using
the ELF, not
a.out, format. This meant that we couldn't use them
on our systems until
we upgraded our kernel and shared libraries. For the
short term, this
wasn't acceptable. It was time to bite the bullet and
compile the
daemon.
We downloaded the sources from the above NCSA ftp site.
The compile was
a very tame adventure - all we had to do was to unpack
the tar file,
change into the resulting directory, and type:
make linux
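In full, the session looked something like this (the archive and directory names here are stand-ins for whatever the 1.5a source tarball actually unpacks to):

# tar xzvf httpd_1.5a_source.tar.gz
# cd httpd_1.5a-export
# make linux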
The sources compiled cleanly on our system, which was
a relief. If you
don't feel like compiling your own a.out sources, you
can get our
resulting a.out binary distribution from:
ftp://ftp.co.chatham.ga.us/pub
We replaced the old httpd binary with the new, then
monitored the
./logs/error_log carefully for a while. So far, everything
looks peachy.
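The swap itself is quick; something like the following does it, with the pid file location following the default PidFile setting (logs/httpd.pid under the server root):

# kill `cat /var/httpd/logs/httpd.pid`
# cp httpd /usr/sbin/httpd
# /usr/sbin/httpd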
One of the neater things about this project was that, as a busy MIS department, we were able to turn the content of the Web pages over
to our very
technically competent Public Library system. After all,
they are the
public information specialists! It has been a very rewarding
symbiotic
relationship. We get to work with operating systems
and other
techno-geek stuff, and they have done an incredible
job organizing a set
of local pages that reflects local interest, and guides
the beginning
user to the Web in a very easy and friendly way!
About the Author
Jonathan Feldman works with Unix and NetWare at the
Chatham County
Government in Savannah, Georgia. He likes to keep things
simple so that
even he can understand them. His son Leo has just convinced
him that
"Go, Dog, Go!" is a way of life, and that
with copious quantities of
Mommy and chocolate milk, anything is possible. He
is reachable via
email at jonathan@co.chatham.ga.us.