Article

A Web-Based Revision Control System

Richard S. Smith

In an environment where many people frequently update a common set of documents, it is not uncommon for a systems administrator to be asked to implement some method for managing the process. In this article, I will describe a package called "Web/RCS", which I developed for situations like this. Basically Web/RCS is a Perl/CGI-based front-end to the RCS revision control system. RCS is freely available and is often included as part of the base distribution of many UNIX versions (Linux too). Although primarily designed for use by programmers in need of easy-to-use source code control, Web/RCS is also useful in any small or medium-sized environment where the management of file revisions is necessary.

Web/RCS was designed to run on any UNIX system that supports an http server and has the ability to run CGI scripts. My implementation was developed on a Linux 2.0.30 system running Apache 1.2.6 and Perl 5.004. The Web/RCS Perl script must be run using a compiled "wrapper" written in C and set-uid to "root" (the implicit security issues are addressed later in the article). Both Netscape and Microsoft browsers are supported.

I was motivated to write Web/RCS by the demands of my job. I work at a relatively small company, where as few as two or three people, or as many as ten or fifteen, may be working on a given project at any point in time. Since my company, like many others, is interested in leveraging the power of the Web, it seemed logical to implement a Web-based source code control system.

Advantages

This solution offers several enticing benefits. For one thing, the browser interface frees the system from dependence on the UNIX command line. This makes Web/RCS practical to implement in an integrated UNIX/NT environment. There is also a "psychological" advantage in that, even in a UNIX-dominated office, there will always be some people who remain resistant to a system that uses a command-line interface.

During my initial research into the field of Web-based revision control, I found that I was not the first person to have this idea. As far as I can tell, the first system of this type was CVSWeb, implemented by Craig Fenner for use with the FreeBSD project. More recently, Netscape's Mozilla project has also made use of Web-based revision control. However, in both of these cases, the focus has been on leveraging the power of the Internet to allow access by a large number of people scattered over a wide area (the entire world, it would seem). The main purpose of these systems is to allow browsing of a directory hierarchy, reviewing the revision history of files, and viewing and downloading these files. Since the wide-open Internet is an inherently low-security environment, the ability to control check-outs and check-ins of files is severely limited.

Implementation and Security Issues

For my purposes, the need to validate individual users and allow check-in and check-out on a per-user basis was essential. Since I was building on top of RCS, which is hosted on UNIX platforms, it seemed logical to design the system to perform validation based upon the user's UNIX login ID. However, I knew from my prior experience with CGI applications that this was not a trivial matter. If I wanted the system to give each user the same privileges they would have if they were logged in, I would have to write my CGI script to use set-uid. The "normal" way to handle a situation like this is to start as root and switch over to the user's ID as soon as possible after performing a small but essential set of validations. Needless to say, Perl's built-in security mechanisms (such as data tainting) should be used to the fullest extent possible.

Even with these precautions, the use of set-uid CGI scripts is the subject of no small amount of controversy among professional Web administrators. The most obvious solution - challenging the user to enter his UNIX login ID and password, and then validating against the system password file - is considered a gaping security hole in any environment that's accessible to the Internet in any way. It is not very difficult for someone with malicious intentions to write a robot or "bot" in Perl to open up one or more Web connections using the http protocol and perform repeated password attacks, unfettered by even the minimal time-delay protections afforded by the telnet protocol. Once someone obtains a valid userid/password combination in this manner, he can compromise a company's internal network.

However, if the system is to be used in a purely "Intranet" environment, it is possible (in fact, rather easy) to allow users the convenience of logging into Web/RCS with the same user/password combination they would normally use to log into UNIX. This is done via the "mod_auth_sys" module, which is available from any Apache mirror site. A somewhat safer alternative is to use the normal http access methods that rely on customized password files, which are independent from the UNIX password files. However it is still necessary in this case to use logins that correspond to the UNIX user IDs.

Thus, when implementing Web/RCS or any other system of this type, there is an implicit trade-off between depth of functionality on one hand, and security on the other. The key to making this trade-off decision is the level of trust that the sysadmin has in his/her user base. In general, you should assume that Web/RCS has the same vulnerabilities as telnet, and if the security policy at your site prohibits or restricts the use of telnet from the Internet, you should take a comparable stance with Web/RCS.

In summary, Web/RCS makes the following assumptions about its environment:

1. The Apache server, the RCS command-line software, and the RCS directory tree are all on the same system.
2. The RCS directory tree is accessible by CGI applications. If CGI applications can only access files within a certain directory hierarchy, the RCS tree must be in that hierarchy.
3. If users wish to perform direct check-ins and check-outs from their home "RCS" directories, then user home directories must also be accessible from CGI applications.
4. Each user who has access to the RCS directory tree has a UNIX user ID and normal UNIX login privileges on the system.
5. There is a sufficient "trust level" present in the environment. It is assumed that each Web/RCS user can be granted the same privileges they would have if they were using RCS from the UNIX command line.

Using the System

The basic user interface consists of a display of the RCS directory hierarchy that the user can navigate by pointing and clicking, in the same way one would browse through an ftp site. The directory view, as shown in Figure 1, starts with the RCS top-level directory and moves down from there. If the user works primarily in a specific directory, it is possible to access it directly by including the directory path in the URL, and then bookmarking the location for easy access.

When looking at a directory, special icons are used to indicate whether a file is checked in, checked out, or not managed by RCS at all (this last option is not recommended, but Web/RCS will let you get away with it).

Viewing Files

If you click on a file, Web/RCS will provide you with all the information it has on that file, including its revision history and current check-in/check-out status (see Figure 2). You can view the file at its current or any prior revision level. Web/RCS checks to see if the file is ASCII text or some binary format. If it's text, you have the option to view the file using your browser or go directly to a file download. If it's binary, all you can do is download. If you are an RCS command-line user, you also have the option of having Web/RCS copy the file to an RCS tree under your home directory.

Checking Out Files

If a file is checked in and available, you have the option of checking it out (Figure 3). Checking out a file is actually a two-step process. It is assumed that you have already obtained a copy of the file by downloading it through your browser or copying it to your local RCS directory. If you click on the "check-out" button in the Web/RCS file information screen, Web/RCS will then "lock" the file under your user ID. Although RCS supports the concept of locking under multiple revision levels, Web/RCS will only lock the current revision.

Checking In Files

If a file is already checked out, the Web/RCS file information screen will indicate which user has locked the file. If that user is also the current user, there will be a "check-in" button that the user can press to begin the check-in procedure. There are two check-in methods (Figure 4). The user can check in a file that is in his local RCS directory, or he/she can upload the file through the browser. The ability to upload files is a relatively little-used feature of HTML and the http protocol, but it's quite handy in situations like this. Web/RCS will receive the file using the upload protocol, and then perform the RCS check-in procedure.

At this point, a few notes on revision management are in order. RCS works by storing the current version of the file, along with "diff" files to get back to prior versions. In a heterogeneous UNIX/Windows environment, it is quite possible (commonplace, actually) that Windows and NT binary files will be checked out and checked in (by "binary" I am referring to any non-text format, in addition to executables). RCS can handle "binary diffs", but it is very important to make sure that the "diff" program on the UNIX host system can handle them as well. Web/RCS was written to know how to deal with binary files, but it is highly recommended that the user do some testing as well.

Conclusion

If a sys admin is given the task of implementing revision control, there are several challenges he or she must overcome. Over and above the technical issues, it is critical to get "buy-in" from the people who will actually be using the system. A buggy, restrictive, or overly complex revision-control system that tempts people to circumvent it is useless. By leveraging the convenience, familiarity, and cross-platform nature of the Web, the Web/RCS package can be an important part of achieving the overall goal of simple, straightforward document management in a distributed multi-platform environment. I learned a lot from developing this program, and I'd be happy to hear from anyone who has suggestions, ideas, and enhancement requests. To access the Web/RCS package itself, go to: http://www.provista.com or the Sys Admin Web site: www.samag.com.

About the Author

Richard S. Smith is a consultant specializing in projects relating to CGI programming and Web development. He can be reached at: richard@provista.com.