The Linux Kernel: A Case Study for CVS
The Linux kernel provided much of the inspiration for the Open Source model of software development, and the kernel, which is the core of all commercial and non-commercial Linux distributions, remains the most dramatic example of a successful Open Source development project. The source code of kernel version 2.2 comprises more than 1.5 million lines of C and assembly language code, and occupies more than 54 MB of disk space when uncompressed. The Linux developers consist of Linus Torvalds, who has the final say over what code becomes part of the "production" kernels, about a dozen core developers who maintain sections of the source tree, and anyone else who wishes to contribute. Tools tend to emerge to meet new needs, and a new way of using the Concurrent Versions System (CVS) has begun to appear on the Internet to meet the needs of programmers who work on Open Source projects. That method is anonymous CVS.
Managing the Source Code
The spiralling number of contributions by programmers has ensured that bugs get fixed rapidly and that new ideas and code get distributed, accepted, or rejected in hours and days instead of weeks and months. This makes kernel development a real-world case study for the Open Source model of software development.
Originally, kernel developers set up an anonymous CVS server at vger.rutgers.edu to provide access to all or part of the kernel source to anyone wishing to modify the source code. That server is no longer in operation, but the same system hosts the Majordomo mailing list server, which handles the developers' messages.
To distribute the complete kernel source to Linux users, a worldwide system of anonymous ftp sites archives the complete sources, as well as patches for intermediate versions. When a file is added to the main site at ftp.kernel.org, all mirror sites are updated soon after. To reach one of the kernel mirror archives, use the two-letter Internet domain abbreviation of the country whose mirror you want to access. In Germany, the mirror site is ftp.de.kernel.org; in Canada, ftp.ca.kernel.org; and in the U.S., ftp.us.kernel.org.
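The naming scheme for the mirrors can be sketched as a small shell function. This is only an illustration of the pattern described above: mirror_host is a hypothetical helper name, and the function does not check that a mirror actually exists for the given country.

```shell
# Build the host name of a kernel.org mirror from a two-letter country code.
# mirror_host is a hypothetical helper, not part of any kernel.org tooling.
mirror_host() {
    # Accept upper- or lower-case codes.
    cc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$cc" in
        [a-z][a-z]) printf 'ftp.%s.kernel.org\n' "$cc" ;;
        *) echo "mirror_host: expected a two-letter country code" >&2
           return 1 ;;
    esac
}

mirror_host de
mirror_host CA
```

The two calls print the German and Canadian host names given in the text.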
An anonymous CVS server at oldhades.think.de helps maintain the source tree of the kernel code's ISDN subsection, and individual developers often maintain local archives for kernel patches in testing.
The goals of CVS are similar to those of the Revision Control System (RCS) and Source Code Control System (SCCS), but CVS uses a client-server model to provide a single code repository to an arbitrary number of developers. CVS uses a modified form of RCS archives that allows for better handling of conflicts between revisions.
A CVS client on a local machine can log into a server using the following methods:
pserver CVS interactive login server
gserver Authentication using GSSAPI server
kserver Direct connection to Kerberos server
ext Perform server operations using external rsh program
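The access method is given as the first field of the repository location, which has the general form :method:user@host:/path. The sketch below splits such a string into its parts. The function name parse_cvsroot and the sample location are hypothetical, and the sketch does not handle local paths or locations that carry a port number.

```shell
# Split a CVSROOT string of the form :method:user@host:/path into its parts.
# parse_cvsroot and the sample value below are hypothetical.
parse_cvsroot() {
    root=$1
    case "$root" in
        :*:*:/*) ;;
        *) echo "parse_cvsroot: unsupported CVSROOT: $root" >&2
           return 1 ;;
    esac
    rest=${root#:}        # drop the leading colon
    method=${rest%%:*}    # text up to the next colon, e.g. pserver
    rest=${rest#*:}       # user@host:/path
    login=${rest%%:*}     # user@host
    path=${rest#*:}       # /path
    printf 'method=%s login=%s path=%s\n' "$method" "$login" "$path"
}

parse_cvsroot ':pserver:anonymous@cvs.example.org:/cvsroot'
```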
Instead of separate executables, as in the RCS suite of programs, CVS takes its operations as keyword arguments to the cvs command. Some of the keywords that the CVS server recognizes are shown in Figure 1; several have close analogs in RCS:
cvs login # Log into a CVS server that uses the pserver method.
cvs checkout # Check out specified files or directories.
cvs watch add # Add user name to notification list of work
# done on source files.
The watch mechanism lets users track the work being done on files, so that they can coordinate their modifications.
After logging into the CVS server, a developer can check out the complete source tree, or any part of it. A portion of the terminal output of the CVS checkout operation is shown below.
$ cvs -d :pserver:firstname.lastname@example.org:/i4ldev co isdn4k-utils
cvs server: Updating isdn4k-utils
cvs server: Updating isdn4k-utils/FAQ
In addition to the source files, each directory has a ./CVS subdirectory, which contains the status information of all of the files in that directory. A CVS export command is similar to a checkout command, but it only transfers the source files, and not the CVS subdirectories.
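One practical consequence is that a script can tell a checked-out tree from an exported one by looking for the CVS subdirectory. The following is a minimal sketch; tree_kind is a hypothetical helper name, and the two directories are created only for the demonstration.

```shell
# Report whether a directory looks like a CVS checkout (it has a CVS
# subdirectory) or a plain tree such as one produced by "cvs export".
# tree_kind is a hypothetical helper name.
tree_kind() {
    if [ -d "$1/CVS" ]; then
        echo "working copy"
    else
        echo "plain tree"
    fi
}

# Demonstration on two throwaway directories.
demo=$(mktemp -d)
mkdir -p "$demo/checkout/CVS" "$demo/export"
tree_kind "$demo/checkout"
tree_kind "$demo/export"
rm -rf "$demo"
```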
Automatic Queueing and Conferencing
With multiple developers, the watch mechanism allows users to see the modifications being made to the file. Users declare their intention to modify the file with the CVS edit command. When an archive maintainer specifies, with the cvs watch on command, that files are to be watched for modifications, working copies of the archive files are checked out read-only, to remind users to use the CVS edit and unedit commands to notify other users of their intentions.
This mechanism is more flexible than RCS's locking mechanism. Multiple users can coordinate their modifications, and successive modifications can be made in an orderly fashion, which reduces the chance of revision conflicts.
A CVS edit command places a watch on the file or files being edited, similar to the watch command. A CVS commit or unedit command removes the watch. In the ISDN source code archive, outside users do not have permission to issue commands that modify the archive, so the commit command is not used.
When a watch is set on a file, its permissions are read only. To set a watch on all of the files in the local repository, specify the top-level directory:
cvs -d ./isdn4k-utils watch on isdn4k-utils
Transactions in the source repository become permanent only when an edited, working file is checked back into the repository with the commit command. Until that time, a watch command only notifies other users that the working file is checked out for editing. Performing an edit command on a file tells users of your intent to check out the file for editing.
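Because watched files are checked out read-only, a script can treat a missing write permission as a hint that the edit command should be run before the file is changed. The sketch below assumes that convention; needs_cvs_edit is a hypothetical helper name, and a file can of course be read-only for reasons that have nothing to do with CVS.

```shell
# Return success if the owner write bit is missing, which is how a watched
# file appears in a working copy until "cvs edit" is run on it.
# needs_cvs_edit is a hypothetical helper name.
needs_cvs_edit() {
    case "$(ls -ld -- "$1")" in
        ??w*) return 1 ;;   # owner may write: the file is already editable
        *)    return 0 ;;   # read-only: run "cvs edit" before changing it
    esac
}

# Demonstration on a throwaway file standing in for a watched source file.
f=$(mktemp)
chmod 444 "$f"
if needs_cvs_edit "$f"; then echo "read-only: run cvs edit first"; fi
chmod 644 "$f"
if needs_cvs_edit "$f"; then echo "still read-only"; else echo "editable"; fi
rm -f "$f"
```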
The Root and Repository files in the CVS subdirectory provide information about the repository and its files. If Root contains:
and Repository contains
then CVS considers the directory listed in Repository as the local working directory. The CVS/Entries file contains timestamp information for each file and subdirectory, and Notify contains the notification that a file is being modified.
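The administrative files are plain text, so they can be inspected with ordinary shell tools. The following sketch prints the contents of Root and Repository for a working directory. The function name show_cvs_admin is hypothetical, and the sample contents written by the demonstration are invented for illustration; they are not taken from any real archive.

```shell
# Print the details recorded in a working directory's CVS subdirectory.
# show_cvs_admin and the sample file contents below are hypothetical.
show_cvs_admin() {
    dir=${1:-.}
    [ -f "$dir/CVS/Root" ] && [ -f "$dir/CVS/Repository" ] || {
        echo "show_cvs_admin: $dir is not a CVS working directory" >&2
        return 1
    }
    printf 'root: %s\n' "$(cat "$dir/CVS/Root")"
    printf 'repository: %s\n' "$(cat "$dir/CVS/Repository")"
}

# Demonstration on a throwaway directory with made-up contents.
work=$(mktemp -d)
mkdir "$work/CVS"
echo ':pserver:anonymous@cvs.example.org:/cvsroot' > "$work/CVS/Root"
echo 'project/src' > "$work/CVS/Repository"
show_cvs_admin "$work"
rm -rf "$work"
```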
The "modules" file in the CVSROOT subdirectory contains administrator-supplied symbolic names for working subdirectories and files in the repository, below the $CVSROOT directory. Part of a modules file might look like this:
Revision numbers in a CVS archive follow the conventions of RCS revision files. The first revision is version 1.1, the second revision is 1.2, and so on. The difference is that revision numbers are treated as internal to the CVS archive, and may have nothing to do with the actual release number of the software.
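The numbering rule can be sketched in a few lines of shell: each commit increments the last component of the revision number. The helper name next_revision is hypothetical, and the sketch only models the arithmetic; it does not query a repository.

```shell
# Compute the revision number that follows a given one by incrementing
# its last component. next_revision is a hypothetical helper name.
next_revision() {
    rev=$1
    case "$rev" in
        *[!0-9.]*|.*|*.|*..*) echo "next_revision: bad revision: $rev" >&2
                              return 1 ;;
        *.*) ;;
        *) echo "next_revision: bad revision: $rev" >&2
           return 1 ;;
    esac
    prefix=${rev%.*}     # everything before the last dot
    last=${rev##*.}      # the last component
    printf '%s.%s\n' "$prefix" "$((last + 1))"
}

next_revision 1.1
next_revision 1.9
```

Note that the components are integers, not decimal fractions, so the revision after 1.9 is 1.10.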
The differences become important, however, when parallel versions of a source file are being maintained and a release consists of more than one file. CVS allows maintainers to assign symbolic tags to a group of files, so that they may be checked in or out, branched, or merged, regardless of the actual revision numbers. If a user places a tag on a set of working files in the current directory, for example:
cvs tag release-1-0 .
he or she can create a branch based on that symbolic label, regardless of the revision numbers of the release's files in the archive. The rtag command works on the repository directly and takes the name of a module, here isdn4k-utils, as its last argument:
cvs rtag -b -r release-1-0 release-1-0-bugfix isdn4k-utils
At that point, if a programmer already has working copies of the files checked out, he or she can update them from the archive:
cvs update -r release-1-0-bugfix
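CVS restricts the characters that may appear in a tag name: a tag must begin with a letter and may contain only letters, digits, hyphens, and underscores, so a period is not allowed. A proposed name can be checked before it is passed to the tag command, as in the sketch below. The helper name valid_tag is hypothetical, and the check covers only the character rule, not reserved names.

```shell
# Check a proposed CVS tag name: it must start with a letter and contain
# only letters, digits, hyphens, and underscores.
# valid_tag is a hypothetical helper name.
valid_tag() {
    case "$1" in
        [A-Za-z]*) ;;
        *) return 1 ;;                    # empty, or does not start with a letter
    esac
    case "$1" in
        *[!A-Za-z0-9_-]*) return 1 ;;     # contains a disallowed character
    esac
    return 0
}

for tag in release-1-0 release-1.0 1-0-final; do
    if valid_tag "$tag"; then
        echo "$tag: ok"
    else
        echo "$tag: rejected"
    fi
done
```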
A Network of Mirrors
To meet the demand of developers who need the most recent version of the kernel sources (which are approximately 13 MB when compressed), the Linux Kernel Archive Mirror System comprises 58 ftp and World Wide Web sites, each with a 1 Mb/s or faster connection to the Internet, that provide local archive services.
The mirror sites are updated at least daily using rdist or ftp, and each site has a virtual domain name that includes the two-letter Internet domain of the country where it is located. Each site also provides an archive of Linux software, and may also provide local WWW hosting services. Internally, each mirror is given the virtual domain name ftp1, ftp2, ftp3, and so on. This allows the archive maintainers to re-assign virtual domain names as new sites are added and old ones are dropped. The mirror site administrators use their own electronic mailing list, email@example.com, to coordinate the mirror sites.
Because daily updates are required, each mirror site carries the latest distribution and testing versions of the Linux kernels. Many of the sites mirror the main ftp.kernel.org site more frequently, and updates are available to a majority of developers within an hour or two.
Most developers who join Linux kernel development first subscribe to the "linux-kernel" mailing list, download the recent kernel source tree archive, and begin experimenting with the code. After a few weeks of reading the messages of the developers at work, you'll begin to see what issues are of concern at the moment, who is working on them, and what you can do to help. Submitting patches and receiving criticism from one of the code maintainers is a helpful, if sometimes painful, step, because even experienced programmers admit that getting everyone's code to work together can sometimes take considerable effort. Everyone is welcome to contribute, and the programmers new to the project can point out areas that seasoned hackers might have missed. These same programmers provide thorough peer review and testing of each improvement or modification. Overall, however, CVS is an indispensable tool for maintaining order in this chaotic environment.
Concurrent Versions System (CVS): http://www.cyclic.com/
Linux Kernel Home Page: http://www.kernel.org/
Linux Kernel Mailing List FAQ: http://www.kernel.org/lkml/
About the Author
Robert Kiesling is the editor of Linux: The Complete Reference, 6th Ed., and a contributor to Linux Installation and Getting Started, version 3.2. He is also maintainer of the Linux Frequently Asked Questions with Answers (FAQ) list. Comments should be directed to firstname.lastname@example.org.