Access Control in Sensitive WWW Environments

Luca Salvadori

Publishing on the Web is rapidly becoming the fastest and most efficient way for companies to distribute information and data to customers and commercial partners. Furthermore, Internet technology is spreading rapidly inside companies to form so-called intranets, where information and services are offered to employees and management. While the results are ever more encouraging, system administrators are under increasing pressure, because much of the published data is highly sensitive and not all users are allowed to access it. Therefore, a thorough and comprehensive effort is needed to secure data and validate users without exposing data to unauthorized eyes.

Several incarnations of httpd services, even the oldest ones, include features allowing such protection. However, since the HTTP protocol was created for an open world, some tricks and tips are necessary for sys admins to avoid surprises. This article will present some of those tricks and explore a set of tools I developed to help ease management and control tasks in a secured, controlled-access Web environment.

httpd Security Concepts

As any Web surfer has experienced, some sites are defined as "secure" because they allow cryptography over incoming and outgoing links. This is useful to prevent eavesdroppers from sniffing sensitive information, such as credit card numbers and passwords. Many techniques have been developed over the years to keep Web content secure, but this paper will focus on whether or not to allow a specific user, computer, or network access to a particular set of data, encrypted or not.

It is useful, therefore, to explain some basic concepts about httpd security. Information is stored in directory trees, and basic file and directory protections (as allowed by the filesystem in use) apply. In addition, since the delivery of file contents is performed by a TCP/IP-based process (httpd), the network is highly involved. Therefore, it's easy to conclude that both file protection and network access must be carefully managed to allow proper security configuration. Because file access is a matter of local username privileges, a survey must be performed to avoid unnecessary access. Since network access is a matter of IP addresses, domain names, and protocols, careful selection and planning of allowed systems and access modes are needed.

Finally, httpd can only restrict access to directories, not to single files. Thus, care must be taken by Web publishers to properly structure and organize their security-sensitive contents in dedicated directories that can be protected as needed.

Local System Considerations

Whenever sensitive information must be offered on line, it is good practice to use a dedicated server for the purpose. This server should be secured such that it can only be accessed the way it is intended (i.e., no generic usernames, no ordinary users, only system- or content-maintainers preferably accessing the system through ftp, no NFS exports, no anonymous ftp, and strict password control with all available tools). Basically, secure Web servers must be accessed only through httpd.

Since the system load generated by http service is usually low, an http server can be easily implemented on small systems, even an old, otherwise minimally useful 386 PC. Such a system can be reconfigured with Linux and NCSA httpd or Apache, to provide reasonable http service at a minimal cost, assuming sufficient disk space is available for hosting the Web contents.

Network Considerations

After the Web server has been internally secured and can be accessed only through httpd, some reasoning is necessary about network access. httpd allows network access controls in its basic configuration files. For example, one could restrict access to the whole server or part of it by declaring allowed or forbidden IP addresses or domain names.

Therefore, it is necessary to identify the IP addresses (single addresses, groups, or whole classes) from which access is to be allowed, while forbidding all others. This assumes that local DNS (Domain Name System) servers work properly. Other security considerations, such as firewall protection, still apply but are beyond the scope of this article.

Content Considerations

As stated above, proper structure must be implemented in Web content to allow protections to be effective. This requires a strict coordination between authors (who are usually not skilled security managers) and sys admins, at least during the initial phase of deployment. Authors must be informed of basic security concepts related to their Web pages, and strong emphasis must be placed on a directory-level protection scheme. This may require a heavily branched directory tree, potentially leading to confusion; thus sys admins are encouraged to tutor authors and discuss document structure with them. Since security problems and related damage are usually considered a result of the sys admin's mismanagement, this is the best way to clarify that Web content access is primarily driven by authors themselves.

Configuration Files and Syntax

After discussions with your Web content authors, you are ready to set basic protections on the local system and http daemon. By carefully editing the access configuration file (usually ~httpd/conf/access.conf), it is possible to restrict access to the base directory, CGI script directories, and other relevant parts of the system. A directory protection fragment in access.conf has the following form:

<Directory /mydir>
AuthType Basic
AuthName Restricted
AuthUserFile /etc/htusers
AuthGroupFile /etc/htgroup
<Limit GET>
order deny,allow
deny from all
allow from 100.101.102
allow from 200.201.202
allow from mydomain.com
require user user1 user2
require group group1 group2
</Limit>
</Directory>

The syntax is straightforward: the <Directory> and </Directory> tags enclose directives for the protected directory, given by its full path; in this setup, content lives under the base httpd document directory, usually ~httpd/htdocs, so /mydir here stands for a directory such as ~httpd/htdocs/mydir. The AuthType and AuthName directives declare basic username-password authentication and the realm name shown in the browser's password prompt. The <Limit> and </Limit> tags restrict the enclosed rules to the http GET method.

In the above example, we defined that access is denied by default to everyone, then selectively allowed in accordance with the defined rules. The example rules state that access is allowed to networks 100.101.102 and 200.201.202 and to domain mydomain.com (provided a working DNS can resolve the client address to that domain whenever a request is received). Other addresses and domains are rejected. Finally, user authentication is defined: users user1 and user2 are allowed in, as well as those included in group1 and group2. Both usernames and groups are defined in the files listed by the AuthUserFile and AuthGroupFile directives. Conditions are additive; therefore, all conditions must be met for the requested document to be sent through the channel. Otherwise an error page is displayed on the requesting user's browser.

The same syntax applies to files residing in the Web content directories themselves, which regulate access to those directories and lower branches. These files can be managed by authors or by sys admins; however, I strongly suggest the latter. These local security files, scanned by httpd whenever directory access is required, are named as declared by the AccessFileName directive in the ~httpd/conf/srm.conf file (the default name is .htaccess). Thus, while protection schemes are inherited by directories below the protected one, more restrictive conditions can be applied to subdirectories and their branches. Remember, restrictions are additive. To gain access to a directory, the user must be able to access higher directories.
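
For instance, a subdirectory can be narrowed down further by dropping a local security file into it. The fragment below is only a sketch (the group name reports-staff is invented for illustration); note that no <Directory> tags are needed, since the file already lives in the directory it protects:

AuthType Basic
AuthName Reports
AuthUserFile /etc/htusers
AuthGroupFile /etc/htgroup
<Limit GET>
require group reports-staff
</Limit>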

User Authentication

httpd implements a rudimentary, albeit effective, way to identify users through the usual username-password mechanism. Relevant data is stored in the files declared by the AuthUserFile and AuthGroupFile directives in ~httpd/conf/access.conf or in the local security files. AuthUserFile and AuthGroupFile files can be different for each directory or tree, and can be directly managed by authors. I suggest nevertheless that sys admins retain control of this task.

The user file format is quite simple: a record includes a username and an encrypted password, separated by a colon. A group file record contains a groupname, followed by a colon and a list of member usernames separated by blanks. Whenever a record length reaches about 200 characters, the groupname should be repeated on a new line to include additional users. Otherwise, usernames beyond the 200-character limit will be disregarded.
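
For illustration (the names and the encrypted string are invented), a user file record and a group file record look like this:

user1:aTx9dKq2mZ3Lc
group1: user1 user2 user3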

Passwords can be created through httpd utilities included in the distribution files (htpasswd, for instance) or home-developed tools. If your httpd host will have many users, you will likely want to write maintenance tools in Web-friendly languages, such as Perl, to assist in the administrative task of creating usernames, encrypting passwords, and inserting the records into the appropriate files. Authors of special content sections must inform you of the user lists and privileges associated with the content of their Web pages.
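
As an example (the file name is arbitrary), the stock htpasswd utility creates the user file and adds or updates a record, prompting interactively for the password:

htpasswd -c /etc/htusers user1
htpasswd /etc/htusers user2

The -c flag creates a new file; without it, the record for the given username is added or replaced in the existing file.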

Tools and Tricks

My company wanted to implement a management reporting system on the intranet to allow all levels of management staff to access budget, project, and corporate data. The primary goal of the project was to save the labor associated with previous paper-based systems. The project was obviously security sensitive. We developed a set of tools to enable proper Web security management. These tools cover the following administrative areas:

  • Password generation and dissemination

  • Password policy enforcement

  • Directory protection

  • Security audit

  • Access logging

  • Web content distribution

Since the tools were to be portable to various systems, we decided to use Perl, bash, and other standard UNIX facilities. All scripts allow proper parameterization and options.

Password Generation and Dissemination

The first task we faced was defining usernames and assigning passwords. Since the first intranet implementation involved about 50 users and we expected the pool of users to increase to about 150, we wanted an automated tool for the task. A fairly simple Perl script generated a random set of passwords of the desired length, while retaining some readability by using vowel-consonant pairs. Such passwords were related to usernames defined in the basic AuthUserFile, encrypted with a random salt, and substituted in the user record. They were also sent by mail to the users. Since users are local to the network, unencrypted password mailing was considered only a minor risk, and therefore a more complicated effort was avoided. Password generation and (optional) AuthUserFile update are done by chtpwd, while password dissemination is done by sendhtpwd. All listings for this article are available from ftp.mfi.com in /pub/sysadmin.
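
The core idea can be sketched in a few lines of Perl (this is not the actual chtpwd listing; the character sets, the fixed length, and the output format here are simplifications):

#!/usr/bin/perl
# Sketch: build a pronounceable random password from consonant-vowel
# pairs, then encrypt it with crypt() and a random two-character salt.
srand(time ^ $$);
@cons = split(//, "bcdfghjklmnpqrstvwxz");
@vow  = split(//, "aeiou");
$len  = 8;                                   # desired password length
$pw   = "";
while (length($pw) < $len) {
    $pw .= $cons[int(rand(@cons))] . $vow[int(rand(@vow))];
}
$pw = substr($pw, 0, $len);
@saltset = ('a'..'z', 'A'..'Z', '0'..'9', '.', '/');
$salt = $saltset[int(rand(@saltset))] . $saltset[int(rand(@saltset))];
print "clear: $pw   encrypted: ", crypt($pw, $salt), "\n";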

Password Policy Enforcement

Since password aging is not implemented in the NCSA httpd I use, it was only a matter of putting chtpwd and sendhtpwd into crontab, to be run every month to force a password change. Further improvement may lead to stricter policies, which at the moment have not been considered essential.
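
The crontab entry is along these lines (the installation path and the time of day are arbitrary examples):

0 6 1 * * /usr/local/bin/chtpwd; /usr/local/bin/sendhtpwd

This regenerates the passwords and mails them out on the first day of every month.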

Directory Protection

This was the most difficult task. Creating a .htaccess file for every directory is simple, but as soon as directories number in the hundreds, it becomes a nightmare. Therefore, I decided to store access information in a plain text file, where each line holds a full directory name, the allowed usernames and groups, and the allowed and denied networks and domains, in the following format (one record per directory, wrapped here to fit the page):

/my/dir/name USER=user1,user2 GROUP=group1,group2
ALLOW=100.101.102,my.domain.com
DENY=300.301.302,yet.another.domain.com

A script (mkhtaccess) parses this kind of file and, with various options, creates .htaccess files in all the directories listed, based on a command-line supplied template. Single directory protection is possible through command-line options. Another script, called gethtaccess, does the reverse (i.e., for every directory in a tree, it gets the protection information stored in the .htaccess file, if any, and stores it in a file for editing and later use by mkhtaccess).
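
The parsing step boils down to something like the following Perl sketch (the file name htaccess.list is invented here, and the real mkhtaccess goes on to write .htaccess files from the supplied template instead of just printing):

#!/usr/bin/perl
# Sketch: read the access-information file, one record per directory,
# and break each record into a directory name and KEY=value,value fields.
open(ACC, "htaccess.list") || die "cannot open htaccess.list: $!\n";
while (<ACC>) {
    chomp;
    next if /^\s*$/ || /^#/;                  # skip blank and comment lines
    ($dir, @fields) = split;                  # whitespace-separated fields
    %info = ();
    foreach $f (@fields) {
        ($key, $val) = split(/=/, $f, 2);
        $info{$key} = [ split(/,/, $val) ];   # e.g. USER => (user1, user2)
    }
    foreach $key (sort keys %info) {
        print "$dir: $key = @{$info{$key}}\n";
    }
}
close(ACC);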

Security Audit

Checking that applied protections are what users expect is a key issue; therefore, a tool was developed to check this. The chkhtaccess script scans a directory tree for .htaccess files and warns about directories without protection (i.e., where no .htaccess file is found). This script should be run at least daily to warn security managers about possible breaches or errors. A useful and easy improvement to chkhtaccess is to copy a default .htaccess file to unprotected directories to avoid unauthorized access until administrators take proper action.
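
The check itself can be sketched as a small shell loop (the content root below uses the ~httpd shorthand and is only an example):

for d in `find ~httpd/htdocs -type d`; do
    [ -f "$d/.htaccess" ] || echo "unprotected: $d"
done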

Access Logging

Any Web access is logged by httpd in a logfile (usually ~httpd/logs/access_log). This is a plain text file; therefore, it can be parsed to extract information. A simple one-line script (cat | grep | mail, etc.) may be used to report unauthorized access attempts, password mismatches, and other security-related issues. Such a one-liner may be run periodically to warn about intrusions.
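
For example (the log path and mail address are site specific), failed authorization attempts appear with HTTP status 401 in the log, so a one-liner such as the following, run periodically from cron, does the job:

grep ' 401 ' ~httpd/logs/access_log | mail -s "Web access failures" webmaster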

Web Content Distribution

Since direct access to the Web server is inherently dangerous, I restricted authors to ftp-only logins. This is not an unreasonable limitation, since most of them work on Windows-based PCs and use common Web-authoring tools to create their pages. Thus, full exposure to UNIX would be confusing for them.

I instructed users to create zip files of their work and transfer them by ftp to a directory on the server to which they have exclusive access. A script (userunzip), invoked every 15 minutes by cron, expands the zip files, moves them to their final location, and triggers other ancillary tasks (file protection checks, symbolic link creation, etc.). Every author has his or her own home directory and public_html subtree. Authors are therefore able to develop and check their work "at home" and move it to the production server autonomously, without any sys admin assistance. Sys admins sleep well, since tools are in place to check for author mistakes and warn of any required action.
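
As a sketch (the installation path, the author's name, and the directory layout are examples only), the cron entry and the core of the expansion step look roughly like this:

# crontab: run userunzip every 15 minutes
0,15,30,45 * * * * /usr/local/bin/userunzip

# core of the expansion step for one author's drop directory
for z in /home/author1/incoming/*.zip; do
    [ -f "$z" ] || continue
    unzip -o -d /home/author1/public_html "$z" && rm -f "$z"
done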

Code Description

As explained before, maximum care has been exercised to ensure portability. Therefore, scripts are implemented in sh (bash on Linux, our reference platform for Web applications) or Perl, and no special tools are needed beyond the standard operating system and Perl.

Most scripts share a common core to parse line input and parameters and to properly initialize processing as required by the selected options. Then the specific tasks are performed. The rest of the code is self-explanatory. Just be careful about passwords: the scripts working on them store or read information from plain text files, where passwords appear both in clear text and in encrypted form. Thus, it is necessary to keep these files in restricted directories.

Conclusions

Managing access rights to a complex, highly branched Web tree is simple in concept but difficult in practice. Only rudimentary access control is implemented in http servers; therefore, some work is needed to keep the situation under control. This toolset allows easy management of directory protection, as well as usernames and passwords, allowing me to sleep well.

Listings

The following is a list of all scripts and their basic functions. For further details, just read them or run them with the -H option. (All listings are available from www.samag.com or ftp.mfi.com in /pub/sysadmin.)

  • chkhtaccess (Listing 1) - Bash script. Checks directory trees for .htaccess files.

  • mkhtaccess (Listing 2) - Creates .htaccess files for a directory or tree as supplied by input file or keyboard.

  • gethtaccess (Listing 3) - Scans a directory tree for .htaccess files and stores access information in a text file (to be used by mkhtaccess).

  • newhtuser (Listing 4) - Creates a new username, assigning it a random password. Data are stored in a plain file to be used by sendhtpwd.

  • sendhtpwd (Listing 5) - Gets data from newhtuser-produced file and sends mail messages to relevant users.

  • chtpwd (Listing 6) - Creates httpd passwords for the relevant usernames, optionally updating the actual htpasswd file.

  • userunzip (Listing 7) - Expands authors' files, moves them to their final destination, and optionally invokes post-processing utilities.

About the Author

Luca Salvadori is head of the Information Systems Dept. at LABEN S.p.A., a leading European space electronics supplier. His experience spans from PDP-11s to Linux, Windows, PCs, and networking. He manages the company's Web site (http://www.laben.it/) and (in his free time, with a bunch of friends) another Linux box (http://aeroweb.lucia.it/) dedicated to aviation. As a hobby, he flies sport aerobatics. Luca can be reached at: lsalvadori@batman.laben.it.