The Common Gateway Interface (CGI) is the simplest and by far the
most common way of providing Web pages with dynamic content.
Essentially, CGI is a way for the Web server to invoke a program to generate
HTML that gets sent back to the Web browser, rather than simply
serving up a static HTML file. Without CGI and other similar
dynamic content schemes, many things would be impossible on the
Web -- stock trading and booking of vacations, for example,
and just about anything requiring input from users. The Web would
still be simply a mechanism for downloading static documents. Figure
1 shows how CGI scripts fit into the picture.
These programs invoked by the Web server are called CGI scripts.
The name of the program is sent by the Web browser in the URL, followed
by arguments to the CGI script. The Web server sets up the CGI script's
environment so that it can access the arguments, then starts the
CGI script. The CGI script then runs, does whatever the programmer
coded, and writes its output to stdout. The Web server redirects
stdout back to the Web browser that sent the request.
With static HTML, the Web server simply sends the requested HTML
file back to the user's Web browser, which then interprets
the HTML, formats it, and displays it. Take this URL for example:
A CGI script can be written in almost any programming language
that can read environment variables, read from stdin, and write
to stdout.
The majority are written in Perl, but they can also be written in
C or the shell scripting language of your choice.
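As a minimal sketch of what such a program looks like in C, the fragment below writes a header, a blank line, and a body. The function names write_cgi_response and handle_request are made up for illustration, and the response simply echoes the query string as plain text.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write a minimal CGI response to `out`: a header, a blank line, then
 * the body. `query` is the raw query string and may be NULL. */
int write_cgi_response(FILE *out, const char *query)
{
    if (query == NULL)
        query = "";

    /* The blank line after the header marks the end of the headers. */
    if (fprintf(out, "Content-type: text/plain\n\n") < 0)
        return -1;
    if (fprintf(out, "Query string was: %s\n", query) < 0)
        return -1;
    return 0;
}

/* A CGI program's main() would call this: the Web server has already
 * placed the arguments in QUERY_STRING and connected stdout. */
int handle_request(void)
{
    return write_cgi_response(stdout, getenv("QUERY_STRING"));
}
```

Because the body is sent as text/plain, the echoed query string is not interpreted as HTML by the browser.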
A CGI script can, intentionally or otherwise, do anything that
the user it runs as can do. Typically, CGI scripts run as the same
user as the Web server. On most UNIX systems, the Web server is
Apache, which by default runs as user "nobody".
By convention, "nobody" is a user for unprivileged operations.
Some may think that something running as nobody could not do much
to compromise a Web server, but there are many ways security can
be compromised.
There are many files sprinkled around the typical UNIX system,
which are readable by all users, but which you probably don't
want in the public domain. A prime example is /etc/passwd.
This file contains a list of all users on your system, and if you
are not using a shadow password file, also contains the encrypted
forms of all of your users' passwords. If a hostile party can
manage to get a copy of /etc/passwd, you are wide open to
a password guessing attack, even if you use a shadow password file.
If you don't use a shadow password file, you will be easy prey
for a dictionary attack, whereby a program encrypts a long list
of words and compares them against encrypted passwords.
It is easy to write a CGI script that is vulnerable to malicious
query string contents, which can make a CGI script do things it
was never intended to do (e.g., sending a file on the server back
to a hacker's Web browser). A classic CGI security problem
occurs when a CGI script starts a shell and passes it data from
the query string without carefully checking the query string contents.
Listing 1 contains a CGI script that appears relatively innocuous
-- it provides a way for somebody to run a whois query
from his or her Web browser. For example, this HTTP request will
return information about IP address 207.46.130.45:
Of course, there are plenty of other things that this outwardly
innocent-looking CGI script could be coaxed into doing -- like
emailing the password file or any other file readable by the Web
server to a mailing list.
CGI scripts should not start a subshell if there is a way around
it. In Perl, a subshell is started in any of the following ways:
1. Using the backtick operator, as demonstrated above
2. Opening up a pipe to a program. For example, the code fragment
below would start the process /bin/lpr (a program that submits
print requests), and would cause anything that the Perl script writes
to SPOOLER to be redirected into the standard input of /bin/lpr:
In C, a subshell is started in either of these ways:
1. Using system ()
2. Using popen ()
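One way to avoid the subshell in C is to start the program directly with fork and execv, so that no shell ever parses the arguments. The sketch below assumes a POSIX system; the function name run_without_shell is hypothetical, and it captures only as much output as fits in the caller's buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at absolute path `path` with argument vector `argv`
 * (argv[0] is the program name, the last element is NULL). No shell is
 * involved, so characters such as ';' in an argument have no special
 * meaning. Up to outlen-1 bytes of the child's stdout are stored in
 * `out`. Returns the child's exit status, or -1 on error (including a
 * child killed because its output did not fit). */
int run_without_shell(const char *path, char *const argv[],
                      char *out, size_t outlen)
{
    int fds[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    int status;

    if (outlen == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: send stdout down the pipe, then become the program. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(127);
        close(fds[1]);
        execv(path, argv);
        _exit(127);             /* only reached if execv failed */
    }

    /* Parent: collect the child's output. */
    close(fds[1]);
    while (used < outlen - 1 &&
           (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called with an argument such as "myfile;ls", the program receives the semicolon as ordinary data. The input should still be validated, because the program being started may interpret its arguments in its own way.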
1. Parse all input. Determine which set of characters is valid
for the particular type of input token you are expecting, and allow
ONLY those characters. Either remove or escape other characters.
It is not as simple as scanning a piece of data for shell metacharacters
and rejecting anything that contains a metacharacter, for two reasons:
Some characters that are shell metacharacters may be valid
in some positions in the input. For example, ";"
is a valid character to appear in a file name. But ";"
is also a command separator. Suppose a CGI script is displaying
the contents of a file on behalf of a user, with the file name
coming from the user. The script has a line of code like this:
`/bin/cat $filename`
$filename is the name that came from the user. As we have
previously seen, a hacker could insert a ";" into
the file name with a command after it, thereby causing the shell to
execute the command. The hacker could send in a "file name"
that looks like this:
myfile;ls
This would cause the file myfile to be displayed, followed
by a directory listing. Suppose that you must allow ";"
to appear in the file name, which is a valid character in the file
name. Before it is passed to the shell, the script should change $filename
to this:
myfile\;ls
This would cause the file myfile;ls, if there is such a file,
to be displayed. Be careful -- the hacker may know that you are
escaping metacharacters, and he may have already sent in a file name
that looks like this:
myfile\;ls
If the CGI script simply sticks another "\" before
the ";", then what gets passed to the shell is:
myfile\\;ls
This would cause the file myfile\ (if it exists) to be displayed,
followed by the directory listing that you didn't want the hacker
to see.
As you can see, there are all sorts of games that can be played
with metacharacters, and having anything other than a blanket ban
on metacharacters in input can be tricky and hard to test. It is
a matter of balancing flexibility versus security. Even when a CGI
script is checking its input, it is important to remember that all
programs have bugs and the more complex a program is, the more bugs
it is likely to have. Complex logic to check input is more likely
to have bugs than simple logic to check input, and exploiting bugs
is what hackers do.
This is a list of metacharacters for various shells:
;<>*|`&$!#()[]{}:'"/^\n\r
If any of these are in data passed to subshells, make sure that they
are properly escaped and make sure that all possibilities are tested.
2. Specify the absolute filename of any commands, so that the
PATH environment variable will not be used to find the command.
Also ensure that the PATH environment variable is set to a known
value. It should contain only directories that are writable solely
by the owner of the directory. The reason it is important to set
PATH to a known, good value, even if your CGI script does not use
PATH to find commands, is that your script might start a command
that relies on PATH.
The risk in relying on PATH to find commands is that a hacker
could have modified PATH to include a particular directory. That
directory could contain a malicious script placed there by the hacker.
If a command that your script starts relies on PATH, the danger
is mitigated by allowing only directories that are solely writable
by their owners. That will reduce the risk of executing a malicious
script.
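The two precautions above might be sketched in C as follows. The function names is_valid_token and set_known_path are hypothetical, the allowed character set (letters, digits, "." and "-") is only an example suited to host names and IP addresses, and the PATH value is an assumption that should be adjusted for the local system.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `s` is non-empty, no longer than `maxlen`, starts with a
 * letter or digit, and contains only ASCII letters, digits, '.' and '-'.
 * Otherwise return 0. Everything not explicitly allowed is rejected. */
int is_valid_token(const char *s, size_t maxlen)
{
    size_t i, len;

    if (s == NULL)
        return 0;
    len = strlen(s);
    if (len == 0 || len > maxlen)
        return 0;
    /* A leading '-' could be mistaken for a command-line option. */
    if (!isalnum((unsigned char)s[0]))
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!(isalnum(c) || c == '.' || c == '-'))
            return 0;
    }
    return 1;
}

/* Replace the PATH inherited from the Web server with a fixed value.
 * The directory list here is an assumption, not a recommendation. */
int set_known_path(void)
{
    return setenv("PATH", "/bin:/usr/bin", 1);
}
```

A simple allow-list like this is easy to test exhaustively, which is the point made earlier about simple checks having fewer bugs than complex ones.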
If CGI scripts are coded in Perl, a further measure that can (and
should) be taken is to turn on "taint" checking. Taint
checking is a feature of Perl that forces a program to check untrusted
input and environment variables. To turn on taint checking in Perl
5, change the CGI script to add the -T option to the invocation
of Perl, as shown below:
#!/usr/bin/perl -T
If only this change and no other changes are made to the CGI script
shown in Listing 1, the script shown dies with the following message:
Insecure dependency in `` while running with -T switch.
This is because the scalar $parm is considered "tainted".
The principle of taint checking is that all data from outside
the program, or derived from data outside the program, must be "laundered"
before the data is used in such a way that it could affect something
outside your program. Until data is laundered, it is considered
tainted. Attempting to use tainted data in any command that invokes
a shell, or in any command that modifies files, directories, or
processes, will cause the program to die.
To launder tainted data, the program must perform a regular expression
match. It must then derive the new value of the data from subpattern
variables set by the regular expression match. Let's attempt
to untaint the data in our example CGI script. After splitting $key
and $val, insert the code from Listing 2, which will guarantee
that $val has only certain characters.
When a hacker attempts to add an additional shell command onto
the end of the whois request, the script will detect it and
terminate. Remember that although taint checking makes you check
your input, it doesn't enforce the quality of the checking.
The quality of the checking is up to you.
Now let's attempt to run this script with good input and
see what happens. This time, it dies with another error:
Insecure $ENV{'PATH'} while running with -T switch
The problem this time is that our script is still running with the
PATH environment variable that it inherited from the Web server, which
has an unknown value and is considered tainted. As previously mentioned,
PATH should always be set to a known value that contains only directories
writable solely by their owners. Listing
3 shows the fixed script. This version of the script does not pass
extra commands to the shell, and will only execute the intended programs.
Imagine what could have happened if the Web server had been running
as root. The CGI script we started with could have been hijacked
to do anything, such as emailing the shadow password file to somebody,
or trashing an entire file system. Never run the Web server as root.
It is usually necessary to start the Web server as root so that
it can open the HTTP port, but it should be configured to change
to another user, such as nobody, after it is finished with initialization.
In fact, it is not a bad idea to set up a user specifically for
running the Web server, because there are often other services that
run as nobody.
Other Problems with Bad Input Data
Suppose you have a form with three radio buttons on it. At the
bottom of the form there is a "submit" button that, when
clicked, causes a CGI script to run. These radio buttons select
a text file on the server, which the CGI script will write to the
user's browser.
The three possible files that can be selected by this set of radio
buttons are "file1", "file2", and "file3".
Here is a possible implementation of the CGI code:
# Since this CGI script is outputting plain text, not HTML, tell
# the browser to expect plain text.
print "Content-type: text/plain\n\n";
# Write the contents of $radiobutton to stdout. Value should be
# file1, file2 or file3 since input could only have come from our
# form. There's no need to check it -- since all our users are
# nice people :-)
open (FD, $radiobutton);
print <FD>;
The problem here is that perhaps the form was not used to send the
form input. It is trivial for a hacker to display the source HTML
for a form and determine what variables the CGI script is expecting
in the form data. From there it is not too much more work for the
hacker to manually generate an HTTP request using telnet or
a simple HTTP client of the hacker's own creation to send a request
to the server containing bad form data. A hacker might have guessed
(or known, if you are using a public domain CGI script) that the script
you are running to handle the form input has this vulnerability. Instead,
this piece of code should do something like this:
if ($radiobutton =~ m/^(file1|file2|file3)$/)
{
    # Tell the browser to expect plain text.
    print "Content-type: text/plain\n\n";
    # $1 can only be file1, file2, or file3 at this point.
    open (FD, '<', $1) or die "Cannot open $1: $!";
    print <FD>;
    close (FD);
}
else
{
    # Either set $radiobutton to some default value and process or
    # die with error.
}
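The same check could be written in C along the following lines. This is only a sketch; the function name lookup_allowed_file is hypothetical, and the file names are the three from the example form.

```c
#include <stddef.h>
#include <string.h>

/* The only files the form is allowed to select. */
static const char *const allowed_files[] = { "file1", "file2", "file3" };

/* Return the matching entry from the fixed list, or NULL if `choice`
 * is not one of the expected values. The caller opens the returned
 * string, never the string supplied by the user. */
const char *lookup_allowed_file(const char *choice)
{
    size_t i;

    if (choice == NULL)
        return NULL;
    for (i = 0; i < sizeof allowed_files / sizeof allowed_files[0]; i++) {
        if (strcmp(choice, allowed_files[i]) == 0)
            return allowed_files[i];
    }
    return NULL;
}
```

Returning the entry from the table, rather than the input itself, means that nothing derived from the request is ever passed to fopen.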
CGI scripts also need to take precautions with plain text input. Consider
a system in which users can enter plain text data into a form. Suppose
there is a CGI script that handles the form input and saves it in
a database verbatim, and another CGI script that retrieves this "plain
text" from the database and displays it. The retrieval CGI script
might have a section of code that looks like this:
print "<html><title>Here is the text you entered</title><body>";
print "$userdata\n";
print "</body></html>";
If $userdata is something like "Hi Fred", then there
is no problem. But suppose that when the form data was saved in the
database, it contained something like:
<!--#include file="/etc/passwd" -->
If server side includes were turned on in the server, it would display
the contents of the password file.
There are all kinds of nasty variations on this theme. Something
like the following could have been inserted to execute a command
to attempt to delete all files on the server:
<!--#exec cmd="cd /; rm -rf" -->
A hacker could even insert HTML designed to blend into the Web site
being attacked, complete with a link to a rogue Web site where users
might be prompted to enter credit card data for the hacker to steal.
To fix this, the CGI script that handles the input should check
for "<" and ">" characters
in text input that could be used in HTML documents and change those
characters to something else, such as &lt; for <,
and &gt; for >. Additionally, if server-side
includes are enabled, it may be worth turning them off if not necessary.
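A possible C version of this substitution is sketched below. The function name html_escape is hypothetical. Besides "<" and ">", it also replaces "&" and the double quote, which goes slightly beyond what the text above requires but is a common precaution.

```c
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, replacing <, >, & and " with HTML entities.
 * Returns 0 on success, or -1 if the escaped text plus its terminating
 * null does not fit in `outlen` bytes. */
int html_escape(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    for (; *in != '\0'; in++) {
        const char *rep;
        char single[2];
        size_t len;

        switch (*in) {
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '&': rep = "&amp;";  break;
        case '"': rep = "&quot;"; break;
        default:
            single[0] = *in;
            single[1] = '\0';
            rep = single;
            break;
        }
        len = strlen(rep);
        /* Always leave room for the terminating null. */
        if (used + len + 1 > outlen)
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    if (outlen == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With this in place, a stored server-side include directive is displayed as literal text instead of being interpreted.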
Buffer Overflows
A major source of vulnerabilities in C and other compiled languages
has been incorrect assumptions about the size of input to the program.
Here is an example in C showing how a buffer overflow could occur:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char query_string_copy [256];
int main (int argc, char *argv [])
{
    char *qs;
    qs = getenv ("QUERY_STRING");
    /* BUG: nothing checks that qs fits in query_string_copy. */
    strcpy (query_string_copy, qs);
    return 0;
}
This piece of code gets a pointer to the query string in the environment
and makes its own copy of it. However, the buffer that is to receive
the copy is only 256 bytes long. If the query string (including the
null terminator) is more than 256 bytes long, strcpy will blindly
do what it is told and scribble all over whatever comes after query_string_copy
in memory.
The CGI program may merely crash in a situation like this. However,
CGI scripts that are open source and that have bugs like these become
easy for dedicated hackers to exploit. A classic form of exploitation
of buffer overflows is for the hacker to discover a place in a CGI
script where input is not properly length-checked. Then the hacker
can design an input string that is intended to overflow the buffer
and overlay something specific, such as a return address to the
calling function. Once this has happened, the hacker has effectively
hijacked the CGI script. The hacker could make the CGI script pass
control to some code supplied by the hacker, which could then do
just about anything (e.g., deleting files, opening up an xterm on
the hacker's host, etc.).
Fixing this Vulnerability
When writing CGI code in C, always check the size of all input
data and ensure that buffers are never overrun. Avoid the use of
the following C library functions, which copy into a destination
buffer and do not take a destination length argument or, on some
systems, are themselves vulnerable to overflowing of internal buffers:
gets (), strcpy (), strcat (), sprintf (),
fscanf (), scanf (), sscanf (), vsprintf (),
realpath (), getopt (), getpass (), streadd (),
strecpy (), strtrns ()
If you use an ANSI C compiler, use function prototypes to ensure that
the types of the arguments passed to functions match what the functions
expect. If you don't use prototypes, it's very easy to have
a type mismatch and never know it.
Besides this, it is a matter of fixing compiler warnings, careful
inspection, testing, and debugging.
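One way to apply this advice to the query string example is to measure the input before copying and to refuse anything that does not fit. The sketch below is illustrative only, and the function name copy_checked is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy `src` into `dst` (capacity `dstlen` bytes) only if the whole
 * string, including its terminating null, fits. Returns 0 on success,
 * or -1 if `src` is NULL or too long. On failure `dst` is left as an
 * empty string rather than a truncated copy. */
int copy_checked(char *dst, size_t dstlen, const char *src)
{
    size_t len;

    if (dst == NULL || dstlen == 0)
        return -1;
    dst[0] = '\0';
    if (src == NULL)
        return -1;
    len = strlen(src);
    if (len >= dstlen)
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

Rejecting oversized input outright, instead of silently truncating it, keeps the script from acting on a value the user never sent. It also handles the case in which getenv returns NULL because QUERY_STRING is not set.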
Other Security Gotchas
Sometimes programs such as shells and interpreters that are designed
to run other programs are located in places where they can be invoked
by a request to the Web server. For example, in Windows environments,
the Perl interpreter (PERL.EXE) may be located in the cgi-bin
directory. This is extremely dangerous, because it allows anyone
to run arbitrary commands on the server. Do not do it! No program
that you don't want the whole world to be able to invoke should
be in any directory that is defined to the Web server as a CGI directory.
Be careful with temporary files because they could disclose information
about the CGI script, the configuration of the server, or confidential
information about users. If a CGI script has to create temporary
files, those files should be created with the most restrictive permissions
possible. If no other users need to read or write to the file, don't
give them permission to. If there is no need to have the file stay
around after the CGI script is no longer running, make sure it gets
deleted before the script terminates. If possible, create temporary
files in directories that are readable and writable only by the
user that the CGI script runs as.
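In C, these points might be handled as in the sketch below, which assumes a POSIX system. The function name open_private_tempfile is hypothetical. The file is unlinked as soon as it is created, so it vanishes when the descriptor is closed or the script exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file that only the current user can read or write.
 * `path_template` must be a writable buffer ending in "XXXXXX"; the
 * caller chooses the directory. Returns an open descriptor, or -1 on
 * error. */
int open_private_tempfile(char *path_template)
{
    mode_t old_mask;
    int fd;

    /* Deny all access to group and others while the file is created. */
    old_mask = umask(077);
    fd = mkstemp(path_template);
    umask(old_mask);
    if (fd == -1)
        return -1;

    /* Remove the name now; the data stays reachable through fd only. */
    if (unlink(path_template) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The template passed in should point into a directory that only the CGI user can read and write, as recommended above.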
Also, beware of temporary files that text editors and other development
tools might leave in a CGI directory. A temporary file created by
an editor and left in a CGI directory could enable hackers to run
old versions of CGI scripts or get the source code.
Likewise, core files can also disclose information that could
be useful to somebody trying to compromise a system. Maybe a hacker
has found a way to make a CGI script core dump, and the hacker knows
that the CGI script has some confidential information in variables.
The hacker could feed the CGI script input to make it core dump,
and then get a copy of the dump. If a CGI script is written in C,
then when it is in production, use the setrlimit () system
call to limit the size of the core file to 0.
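A minimal sketch of that call follows. The function name disable_core_dumps is hypothetical; it would be called early in main, before any confidential data is read.

```c
#include <sys/resource.h>

/* Set both the soft and hard core file size limits to 0 so that this
 * process cannot leave a core file behind. Returns 0 on success and
 * -1 on error. */
int disable_core_dumps(void)
{
    struct rlimit limit;

    limit.rlim_cur = 0;
    limit.rlim_max = 0;
    return setrlimit(RLIMIT_CORE, &limit);
}
```

Lowering the hard limit as well means the script cannot raise the limit again later unless it is running with superuser privileges.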
SUID and SGID CGI Scripts
In UNIX systems, there is a bit in the file permissions called
SUID. When the SUID bit is set in a command's file permissions,
the program runs with the permissions of the owner of the file,
rather than the permissions of the user that started it. Likewise,
there is an SGID bit in the file permissions that causes the program
to run with the permissions of the group associated with the file.
Typically, SUID is used when the script or program needs to be superuser
(i.e., root). A well-behaved SUID program gives up its extra privileges
as soon as possible.
It can be dangerous to have SUID or SGID CGI scripts, so their
use should be avoided if at all possible. If it is necessary for
a CGI script to do something with more privilege than the Web server,
take these steps to limit the possible security exposure:
1. Do not just make it SUID root. Is there, or could there be,
another account that has sufficient privileges but is not superuser?
It is better not to run as root if not absolutely necessary.
2. Do not write a SUID CGI script in a shell scripting language
(csh, ksh, etc.). There are too many possible security
problems.
3. Make sure that the CGI script gives up its extra privileges
except when it needs them, by setting its effective user ID to the
real user ID.
Note that if Perl 5 is used, taint checking is automatically turned
on when the script is SUID or SGID.
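For a SUID or SGID program written in C, step 3 might look like the sketch below, which assumes a POSIX system. The function name drop_extra_privileges is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Set the effective group and user IDs to the real IDs of whoever
 * started the program, then confirm that the change took effect.
 * Returns 0 on success and -1 on error. */
int drop_extra_privileges(void)
{
    if (setegid(getgid()) == -1)
        return -1;
    if (seteuid(getuid()) == -1)
        return -1;

    /* Never assume the calls worked; check the result. */
    if (geteuid() != getuid() || getegid() != getgid())
        return -1;
    return 0;
}
```

Because seteuid leaves the saved set-user-ID in place, the program can switch back when it needs the extra privileges again. A program that never needs them again should give them up permanently instead, for example with setuid.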
Putting a CGI Script in Its Own Sandbox
In an environment in which there are multiple authors of CGI scripts
(e.g., a server that is hosting multiple Web sites), it is sometimes
advantageous to run CGI scripts as the user who is responsible for
the CGI script, not as the Web server. This is done with a piece
of software called a CGI wrapper.
A commonly used CGI wrapper is called CGIWrap, and it is available
from http://www.umr.edu/~cgiwrap. CGIWrap is a SUID CGI script
that executes other CGI scripts as the user who owns the file, rather
than the Web server. It will run under just about any UNIX-based
Web server. Typically, the Webmaster develops a policy that all
users' CGI scripts must run under CGIWrap. The user puts CGI
scripts in a directory under their home directories, and CGIWrap
executes the users' CGI scripts from there.
As an example of how CGIWrap might be used, suppose that a server
runs two Web sites, one owned by user Bob and one owned by user
Joe. Bob wants to have some CGI scripts, so he creates a directory
called public_html/cgi-bin under his home directory /home/Bob.
Joe puts the CGI scripts for his Web site in /home/Joe/public_html/cgi-bin.
The executable for CGIWrap goes in the Web server's main cgi-bin
directory and is SUID as root. CGIWrap runs all user scripts.
CGIWrap causes Bob's CGI scripts to run under the permissions
of user Bob and Joe's scripts to run as user Joe. Bob's
CGI scripts can, if carelessly coded, trash anything writable by
Bob, just as Joe's CGI scripts can trash Joe's data. However,
unless Bob has given Joe write permission to his files, Joe's
CGI scripts cannot trash Bob's data.
There are other CGI wrappers besides CGIWrap. Another commonly
used one is suEXEC, which comes with the Apache Web server. suEXEC
operates on the same general principles as other CGI wrappers, but
it is designed to take advantage of Apache's implementation
and can only be used with Apache.
In UNIX systems, there is a facility called chroot, which is a
way of giving a program its own root file system, outside of which
it cannot reach. For example, if chroot was used to change a program's
root file system to /home/Joe, and that program tried to open
/etc/hosts, then it would actually open /home/Joe/etc/hosts.
Once chroot is done, it cannot be undone. Any programs started by
a program running in a chroot environment inherit the parent's
chroot environment. In effect, the program is locked in a cage that
it cannot break out of. This is a very good way of further restricting
the potential damage that untrusted CGI scripts can do. There is
another CGI wrapper called sbox, from http://stein.cshl.org/software/sbox
that makes use of chroot to restrict the environment in which CGI
scripts run. If Joe's CGI scripts are always started in a chroot
environment with /home/Joe as the root file system for Joe's
CGI scripts, then it is impossible for Joe's CGI scripts to
even attempt to access anything outside of /home/Joe. However,
using chroot to restrict CGI scripts can involve a lot of work.
All files that are needed in order for the CGI scripts to run (e.g.,
shared libraries, the Perl interpreter, and various configuration
files) must exist within the restricted area. This means that
directories such as /usr, /tmp, /dev, /etc,
and others, will have to be created within the chroot environment.
These directories will have to be populated with the subsets of
the real directories' files, which are needed in order to support
the programs that run under the chroot environment.
The use of CGI wrappers makes users accountable for the actions
of their individual scripts, rather than having an amorphous mass
of scripts that various users have responsibility for, all running
as "nobody". CGI wrappers are not a security panacea, however.
All of the nasty things that CGI scripts running as "nobody"
can do can also be done by CGI scripts running as any other user.
There are a lot of world readable files on the typical UNIX system
that you don't want anybody with a Web browser to access.
Developing a CGI Security Strategy
There are obviously many security issues that a Webmaster must
consider, and high on the list should be the security issues associated
with CGI scripts.
Taking Responsibility
The Webmaster must ensure that all CGI scripts placed on any Web
server have been through a process to find and fix security holes.
Some of the items that should be on the CGI script security checklist
include:
- Is all input parsed to ensure that the input is not going to
make the CGI script do something unexpected? Is the CGI script
eliminating or escaping shell metacharacters if the data is going
to be passed to a subshell? Is all form input being checked to
ensure that all values are legal? Is text input being examined
for malicious HTML tags?
- Is the CGI script starting subshells? If so, why? Is there
a way to accomplish the same thing without starting a subshell?
- Is the CGI script relying on possibly insecure environment
variables such as PATH?
- If the CGI script is written in C, or another language that
doesn't support safe string and array handling, is there
any case in which input could cause the CGI script to store off
the end of a buffer or array?
- If the CGI script is written in Perl, is taint checking being
used?
- Is the CGI script SUID or SGID? If so, does it really need
to be? If it is running as the superuser, does it really need
that much privilege? Could a less privileged user be set up? Does
the CGI script give up its extra privileges when no longer needed?
- Are there any programs or files in CGI directories that don't
need to be there or should not be there, such as shells and interpreters?
Language Considerations
One very important thing to consider is what programming languages
will be allowed for CGI scripts. Perl has the best security features,
but it is an interpreted language and therefore inherently slower
than a compiled language like C. If the job can be done quickly
enough by a Perl CGI script, it's probably better to go with
Perl; otherwise, use C. Never allow CGI scripts to be written in
shell scripting languages. There are too many potential security
problems.
Using Other People's Code
Using code that you have downloaded from the Internet is fine.
In fact, it has its advantages. A CGI script that has been used
previously by lots of other people probably has fewer bugs than
one that somebody has just cooked up. However, you need to be cautious.
When you get CGI scripts off the Internet, make sure you check on
what bug fixes might be available. Use the most current, stable
version. Read through the fix history to make sure you have the
latest applicable security bug fixes. Downloaded code should be checked as rigorously
as any script written in house. If it is written in a compiled language
like C, do not just download a binary and install it, even
if you've seen the source code. How do you know that the binary
matches the source?
Conclusion
CGI security is a difficult and complex subject to tackle. There
are many variables, involving the CGI script itself, its environment,
the Web server, the operating system, and whatever input all the
millions of users might throw at a CGI script. However, it is still
extremely important to come to grips with CGI security. Not doing
so could be disastrous.
Charles Walker is a computer consultant specializing in IP-based
protocols. Originally from the U.S., he currently lives in
London. He can be contacted at: chw@trionetworks.com.
Larry Bennett is a networking consultant specializing in security
and performance. He is based near London and can be contacted at:
larry.bennett@trionetworks.com.