
Hiding UNIX Applications in Utility Wrappers

Larry Reznick

On some HP 9000 systems I administered not long ago, a quality assurance group ran software that tested chip production quality. Each installation of this software was configured to run a special battery of tests, but the configuration was built into the software itself, not into a set of data files. Thus, each software installation on a system testing a different chip was entirely different from the installation on another system. Data files produced by this software uniquely identified characteristics of the chip and testing and had to remain with the software to make sense. This software was never designed to run in a multiple-machine, networked, mass-testing environment, as this company used it.

The testing software did a great job for this company's needs but the unmodular design required 50 to 100 megabytes for each installation. As one testing system might test several chips in a few weeks, the disk space crunch was extraordinary. Frequently, the QA manager would shuffle test directory trees from place to place to make room for the latest tests. Sometime later, a test engineer would need to review specific tests and run some more, which would require restoring the entire directory tree, thus forcing other trees out. With half a dozen test machines already online and several more installations expected within months, the crunch was impacting the network's disk space and threatening the QA group's ability to keep up with their production schedules.

Nothing could be done about the software they used. A solution must keep the program files containing the special test configurations with the data files produced by those programs. Once testing was finished, the entire directory tree could be moved off the test system but had to be kept in a place where the QA manager could find it. Once found, it had to be installable on any other system available for examination or running more tests. Most of all, the disk space used by this gigantic combination of programs and data had to be reduced.

Keep It Simple

Once presented with the entire problem, after considering a variety of other solutions each shot down by the inflexibility of the testing software, I fell back on a tried-and-true method: archive the test directory tree and compress the archive. Inelegant, but no other method maintained each unique directory tree exactly without creating other problems for the inflexible software.

Because many of these trees existed and the QA manager might need any one at some later time, I set up a central location to store them. But the QA manager wanted users to be able to make changes to the software configuration for future testing without interfering with information contained within previous tests. Thus, the central location had to keep a history of all previous tests with the latest version.

While most of the time QA engineers would use the latest version, occasionally someone might need to go back to an earlier version for review or retesting. Engineers who didn't know much about the UNIX system or the network had to be able to get the latest version easily. The project leaders who change the software for special tests could not collide with each other's use of the archive and could not interfere with engineers who might be testing with the latest version. The QA manager had to be able to easily find any particular archive no matter how far back in history.

The solution presenting itself combined archive building and compressing with revision control. Scripts to control the archive are simple, but using revision control worried me. The QA manager was more familiar with UNIX tools and operations than the project leaders and engineers. Project leaders were typically familiar with the way the network worked and with some of UNIX, but not much. Most engineers didn't know anything about UNIX or the network, but needed to extract the latest archive version set up for them by a project leader and run the software. Generally, the requirement could be stated as: Give these people a simple set of commands to set themselves up and then let them get back to their real work: running the software and evaluating the results. If I gave them a bunch of complex UNIX commands with a kitchen sink of options, they would never use the solution. I also needed to give the QA manager flexible options but keep the operation simple for everybody else. By writing a wrapper application for the RCS commands, I could keep the application simple for the project leaders and engineers, yet still make RCS's flexible options available to the QA manager.

Constructing the Application

Archiving is the easy part. tar files aren't reliably portable across different systems -- the directory trees were used on HP systems but the archives would be kept on the Sun network. One system's tar files may not be readable by another system's tar. tar produces binary headers in the archive. cpio has a -c (for compatibility) option that produces text headers, eliminating tar's cross-system problem.

Delivering cpio's output to compress produced wonderful savings. One sample directory tree produced a cpio file 49,219,072 bytes long. compress turned that into 4,673,073 bytes -- a 90.5 percent reduction. While other compression software might do better, compress was already on the system. We could always change the script later to use other compression software if something else produced a worthwhile improvement.

Another Problem

Revision control kicked in an extra problem. Although a few binary data files and binary executable files were in the directory trees, most of the files were text, making extreme compression possible. But compression produces binary files, and revision control programs such as SCCS and RCS don't handle binary data well. Falling back on another old file transfer technique, I uuencoded the archive so that RCS could handle it.

uuencode produces plain text files at the cost of expanding the compressed file a little. Running the 4,673,073-byte compressed archive through uuencode produced a file 6,438,497 bytes long. That's 37.8 percent larger, but the result is still 86.9 percent smaller than the original file.
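The size figures quoted so far can be rechecked with a few lines of awk. This is only a sketch for verifying the arithmetic; the function names percent_smaller and percent_larger are invented for this illustration and do not come from any of the listings.

```sh
#!/bin/sh
# Hypothetical helpers for rechecking the article's size figures.

# percent_smaller OLD NEW: how much smaller NEW is than OLD, in percent.
percent_smaller() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (1 - new / old) * 100 }'
}

# percent_larger OLD NEW: how much larger NEW is than OLD, in percent.
percent_larger() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

percent_smaller 49219072 4673073   # cpio archive -> compressed
percent_larger  4673073  6438497   # compressed   -> uuencoded
percent_smaller 49219072 6438497   # cpio archive -> uuencoded
```

The third figure is the one that matters for disk space: the uuencoded file is what RCS actually stores.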

Archiving

Listing 1 shows the mkarc archiving script. The first argument is the directory tree's name, and the second is the archive name. Resulting archives had .Z.uu suffixes, representing that they were compressed and uuencoded. In case someone types the .Z.uu suffix, the basename program strips it from the archive argument before assigning it to the ARC variable.

A subshell pipeline constructs the archive using relative pathnames, so that when someone extracts from the archive the files will go to the current directory. The archive is piped to compress and then to uuencode, which adds the .Z suffix to the archive's filename.

Yet Another Problem

At this point, the finished archive would have been written to a file, adding the .uu suffix to the archive's name. But uuencode's "begin" record, placed in the first line to identify where uudecode should start its reconstruction, contains the original file's permissions. When uuencode reads the file directly, it takes those permissions from the file's inode. Coming from a pipe, uuencode has no meaningful inode to read, so it assigns no permissions: 000! That means uudecode will give no permissions to the archive file, so nobody could extract data from the archive file.

Adding a simple awk script to the end of mkarc rewrites the "begin" record using 664 permissions. All the other input lines that awk sees will print without change. awk's result goes to the archive file with a .Z.uu suffix added.
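The pipeline and the "begin" record fix described above might be sketched as follows. This is not Listing 1: the function names mkarc_sketch and fix_begin_mode are hypothetical, and the exact cpio options are an assumption (only -c is named in the text). The sketch also assumes cpio, compress, and uuencode are installed.

```sh
#!/bin/sh
# Hypothetical sketch of the archiving flow; not the article's Listing 1.

# fix_begin_mode: rewrite the mode on uuencode's "begin" line to 664.
# Every other line passes through unchanged.
fix_begin_mode() {
    awk 'NR == 1 && $1 == "begin" { $2 = "664" } { print }'
}

# mkarc_sketch TREE ARCHIVE
# Archives TREE with relative pathnames, compresses, uuencodes, and
# writes ARCHIVE.Z.uu in the current directory.
mkarc_sketch() {
    tree=$1
    arc=$(basename "$2" .Z.uu)      # tolerate a typed .Z.uu suffix
    ( cd "$tree" && find . -print | cpio -oc ) |
        compress |
        uuencode "$arc.Z" |
        fix_begin_mode > "$arc.Z.uu"
}
```

Restricting the rewrite to the first line keeps awk from touching any encoded data line.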

Listing 2, the dearc script, offers archive extraction by naming the archive in the first argument and the destination directory in the second argument. dearc also accepts a -t option, to show the archive's table of contents. After figuring out which way to run, dearc strips the .uu suffix from the archive filename to factor out the ZFILE name, simplifying command syntax later in the script.

Once the file is uudecoded, a trap makes sure the uudecoded file won't remain on the system after dearc is finished with it. If only a table of contents is needed, zcat delivers the uncompressed archive to cpio, showing the table, then the ZFILE is removed. The original .Z.uu file stays around.

If the command line requests a directory other than the current (.) directory, mkdir's -p option creates the full path needed, throwing away any error messages, including a message indicating that the directory already exists. If mkdir couldn't create the directory, the script announces that and quits. Otherwise, it extracts the ZFILE in the named directory and quits, cleaning up the ZFILE on the way out.
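A minimal sketch of the extraction flow just described follows. It is not Listing 2. The names dearc_sketch and zfile_for are hypothetical, the cpio options other than -c are assumptions, and the sketch assumes uudecode recreates a file whose name matches the archive name minus .uu.

```sh
#!/bin/sh
# Hypothetical sketch of the extraction flow; not the article's Listing 2.

# zfile_for ARCHIVE: name of the compressed file uudecode is expected
# to recreate (the archive name without its directory or .uu suffix).
zfile_for() {
    basename "$1" .uu
}

# dearc_sketch [-t] ARCHIVE [DESTDIR]
# The body is a subshell so the trap and cd stay local to this call.
dearc_sketch() (
    toc=no
    if [ "$1" = "-t" ]; then
        toc=yes
        shift
    fi
    [ -n "$1" ] || { echo "usage: dearc [-t] archive [dir]" >&2; exit 2; }
    arc=$1
    dest=${2:-.}
    zfile=$(zfile_for "$arc")

    # Remove the uudecoded file however the subshell ends.
    trap 'rm -f "$zfile"' 0
    trap 'exit 1' 1 2 15
    uudecode "$arc" || exit 1

    if [ "$toc" = yes ]; then
        zcat "$zfile" | cpio -ict       # table of contents only
        exit
    fi

    if [ "$dest" != "." ]; then
        mkdir -p "$dest" 2>/dev/null    # "already exists" is not an error
        [ -d "$dest" ] || { echo "dearc: cannot create $dest" >&2; exit 1; }
    fi
    zcat "$zfile" | ( cd "$dest" && cpio -icdm )
)
```

Testing for the directory after mkdir, rather than testing mkdir's exit status, is one way to treat an existing directory as success while still catching a real failure.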

Wrapping Up RCS

Listing 3 shows qarcs, the wrapper around RCS for the QA group. This wrapper script contains 10 smaller scripts for management of the archives. By using links to the master script, each link named after the smaller scripts, maintenance focuses on one script. When you vary existing commands, you simply change one script and all commands will get the changes. Adding a new command is a matter of adding the code to the master script and adding a link with the new command's name.

The script figures out what to do by looking at its name. This technique eliminates command-line options in favor of more descriptive command names. Give the engineers one or two simple command names to use and they'll be happy. More sophisticated project leaders and the QA manager might take advantage of more sophisticated options, so other commands provide those facilities.

All commands begin with qar, for QA Revision. The rest of the name identifies the command's action. This naming scheme avoids conflict with other commands that might be named the same. Commands include:

co -- Check out for modification

get -- Check out for looking only

toc -- Table of contents viewing only

ci -- Check in after modification

log -- Show the revision control log

which -- List the archives checked in

cs -- Control revision system attributes

merge -- Combine two archives

install -- Create the script's links

help -- Print the script's usage message
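The name-based dispatch behind the command list above can be sketched in a few lines. This is an illustration under assumptions, not an excerpt from Listing 3; the function name action_from_name is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: decide what to do from the name the script
# was invoked under.

# action_from_name PROGNAME: print the part after "qar", or fail
# if PROGNAME is not one of the ten known command names.
action_from_name() {
    name=$(basename "$1")
    case $name in
        qarco|qarget|qartoc|qarci|qarlog|qarwhich|qarcs|qarmerge|qarinstall|qarhelp)
            echo "${name#qar}" ;;
        *)
            echo "$name: not a qar command" >&2
            return 1 ;;
    esac
}

# A linked script would typically begin with something like:
#   action=$(action_from_name "$0") || exit 1
```

Because every link runs the same file, `$0` is the only thing that differs between invocations.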

On different systems, RCS was installed in different directories. An RCSPATH variable tries to make the RCS files' locations uniform. Once GNU RCS was installed in a regular location, RCSPATH was simply /usr/local/bin. To find the RCS programs in the meantime, a more complex expression constructed RCSPATH by finding where the shell thought rcs was. A type command finds rcs on the path and delivers the information to awk, which extracts the pathname from type's output and hands that full pathname to dirname. dirname strips rcs's name from the full pathname, leaving only the directory name.
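The type, awk, and dirname chain might look like the sketch below. It assumes type prints a line such as "rcs is /usr/local/bin/rcs" with the pathname as the last field; the output format of type varies between shells, so treat that as an assumption. The function name dir_from_type is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of deriving RCSPATH from the shell's idea of
# where rcs lives.

# dir_from_type: read `type` output on stdin, print the directory
# of the pathname found in the last field.
dir_from_type() {
    path=$(awk '{ print $NF }')
    dirname "$path"
}

# Only try the lookup when rcs is actually on the path.
if type rcs >/dev/null 2>&1; then
    RCSPATH=$(type rcs | dir_from_type)
fi
```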

QARCSDIR holds the name of the central directory. A disk array filesystem holding dozens of gigabytes mounted for archiving simplified the QA group's needs. All their files would go in a directory named RCS, distinct from any other archiving already allocated for that drive.

The script doesn't show a help screen until after common options are initialized, even though after showing the help screen, the script quits. Initializing those options lets the help screen show the default values for those options. If command-line options change any of the values, the help screen shows their latest settings.

qarcs supports several options of its own. All options are uppercase to distinguish them from regular RCS options, which are all lowercase. A -P option identifies a new path for the archives, in case someone wants to keep a special archive out of the standard location. qarcs assumes the current directory is the place to work with for creating an archive from a tree or extracting a tree from an archive. A -D option lets the user define a different destination directory that holds the tree. Using -X turns on shell debugging, in case anyone wants to see that mess, and -H gets help from any qarcs command, acting like qarhelp. When -H is found, the help screen is shown and the program immediately quits. If -H comes last in the command-line option set, other options will have already set their special values, so the help screen can show some of those other options' effects. All other hyphenated command-line options are collected in an OPT variable. OPT gets passed with the RCS command so that regular RCS options can appear on qarcs's command line.
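The option handling described above can be sketched as a simple loop. This is not the code from Listing 3. It assumes -P and -D take their directory as a separate argument, uses a made-up default archive directory, and returns from a function where the real script would quit; the names parse_opts and ARCHIVES are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of qarcs-style option parsing.

parse_opts() {
    QARCSDIR=/archive/RCS    # made-up default archive directory
    DESTDIR=.                # tree is built or extracted here
    OPT=                     # lowercase options passed on to RCS
    ARCHIVES=                # non-option arguments
    while [ $# -gt 0 ]; do
        case $1 in
            -P|-D)
                [ $# -ge 2 ] || { echo "$1 needs a directory" >&2; return 2; }
                if [ "$1" = "-P" ]; then QARCSDIR=$2; else DESTDIR=$2; fi
                shift ;;
            -X) set -x ;;    # shell debugging
            -H) # Help reflects whatever options came before it.
                echo "archives=$QARCSDIR tree=$DESTDIR"
                return 0 ;;
            -*) OPT=${OPT:+$OPT }$1 ;;
            *)  ARCHIVES=${ARCHIVES:+$ARCHIVES }$1 ;;
        esac
        shift
    done
}
```

Handling -H inside the loop is what makes its position matter: options to its left have already changed the values it reports.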

Special Commands

All but two qarcs commands set up prefix commands, suffix commands, and special options, then let a general command handler do all the work. The two exceptions are qarinstall, which creates the script links and the archive directory, and qarwhich, which lists the names currently in the archive directory.

qarinstall loops through all the command names except "qarinstall" to create the symbolic links to the master script. Thus, qarinstall must be the name of the initial script file to install the rest of the script names. Links are created in the current directory, allowing installation in special directories possibly different from a generally accessible directory on everybody's path. When the QARCSDIR is created, it is given the appropriate permissions and the qarinstall script quits.
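A rough sketch of the installation step follows. The function names are hypothetical, and the 775 mode is an assumption, since the text says only that the directory gets "appropriate permissions."

```sh
#!/bin/sh
# Hypothetical sketch of what qarinstall does.

# install_links MASTER: create a symbolic link to MASTER in the
# current directory for every command name except qarinstall.
install_links() {
    master=$1
    for action in co get toc ci log which cs merge help; do
        ln -s "$master" "qar$action" || return 1
    done
}

# make_archive_dir DIR: create the central archive directory.
make_archive_dir() {
    mkdir -p "$1" && chmod 775 "$1"    # 775 is an assumed mode
}
```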

qarwhich checks that the QARCSDIR is present. If so, it takes a simple ASCII-sorted listing, selects only the RCS files containing a ",v" in the filename, but then strips the ",v" to list them. Users shouldn't have to type the ",v" when referring to archive filenames, so the list doesn't show it to them, thus preventing confusion.
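The listing step can be approximated with ls and sed. This sketch is an assumption about the mechanics, not the code from Listing 3; it matches a trailing ",v" rather than one anywhere in the name, and the function name list_archives is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of qarwhich's listing.

# list_archives DIR: print the archive names in DIR, ASCII-sorted,
# with the RCS ",v" suffix removed. Files without ",v" are skipped.
list_archives() {
    dir=$1
    [ -d "$dir" ] || { echo "no archive directory: $dir" >&2; return 1; }
    LC_ALL=C ls "$dir" | sed -n 's/,v$//p'
}
```

The sed expression both selects and strips: only lines where the substitution succeeds are printed.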

General Commands

A simple command loop factors out the common operation of all the scripts:

1. Precede the RCS command with some special preparation.

2. Execute the RCS command.

3. Follow the RCS command with some special action or cleanup work.

Program name evaluation figures out which script was invoked, sets appropriate option variables and commands, then executes the script.
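The three steps above can be sketched as one small runner. How the real script stores and runs its prefix and suffix commands is not shown here, so the use of eval on command strings, the decision to stop when a step fails, and the name run_steps are all assumptions.

```sh
#!/bin/sh
# Hypothetical sketch of the general command handler.

# run_steps PRE CMD POST: run an optional preparation command, the
# RCS command, then an optional follow-up. An empty PRE or POST is
# skipped; a failing PRE or CMD stops the sequence.
run_steps() {
    pre=$1 cmd=$2 post=$3
    if [ -n "$pre" ]; then
        eval "$pre" || return 1
    fi
    eval "$cmd" || return 1
    if [ -n "$post" ]; then
        eval "$post"
    fi
}

# Stand-in commands show the order of execution:
run_steps 'echo prepare' 'echo rcs-command' 'echo cleanup'
```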

Several commands use a funny-looking syntax:

`basename $f .Z.uu`.Z.uu

This covers the case where someone types the .Z.uu suffix in an archive name. qarcs archives always have that suffix, but users aren't required to type it. If they don't type it, qarcs commands add it, but if it is already there, this basename command strips it out. Once the filename is regular, the .Z.uu suffix can be safely attached.
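The effect of that idiom is easy to see in isolation. The function name with_suffix below is hypothetical; note that basename also drops any leading directory from the name.

```sh
#!/bin/sh
# Hypothetical helper wrapping the basename idiom.

# with_suffix NAME: print NAME with exactly one .Z.uu suffix,
# whether or not the caller typed it.
with_suffix() {
    echo "$(basename "$1" .Z.uu).Z.uu"
}

with_suffix chipA          # suffix omitted by the user
with_suffix chipA.Z.uu     # suffix already typed
```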

Check Out

RCS's co command checks out an archive for modification by the QA manager or one of the project leaders. However, engineers should be able to check out an unmodifiable copy of an archive to do their work. Project leaders occasionally need to examine an unmodifiable copy while another leader or the manager is making a new version from an existing archive.

The co command handles checking out of both modifiable and unmodifiable archives by using a -l (lock) option to create modifiable checkouts. Once the lock is set, nobody else can check out another modifiable copy, but anyone may still check out an unmodifiable copy, whether or not a read-write copy is checked out. So, the only difference between the qarco and qarget commands is the mandatory -l option included with qarco. Strictly speaking, the lock prevents anyone other than the person who checked out the locked copy from checking the file in.

When RCS's co command finishes checking out the archive, the script executes dearc to extract the compressed, uuencoded archive and rebuild the directory tree. When finished, the archive extracted from RCS's control is destroyed. Destroying a file checked out from revision control is not usually a good idea, but later on the entire archive will be reconstructed by mkarc, so it isn't a real loss.

Table of Contents

Because dearc has its own table of contents facility, qartoc simply checks out an unmodifiable copy of the archive. The suffix command simply uses dearc's -t option, then removes the checked out copy. Temporarily, the compressed, uuencoded archive takes up space, but that doesn't last very long.

Check In

Checking in requires that mkarc execute to create the archive before executing RCS's ci command. No special options accompany ci, but the user could deliver RCS options on the script's command line to get special actions.

Log Printing

Generally, RCS commands execute quietly because most people using these scripts won't know or care about what RCS tells them. A QUIET variable is set to -q for all qarcs scripts, but qarlog, executing RCS's rlog on the HP system's version of RCS, didn't like that option. Canceling it for qarlog was simpler than trying to detect which system RCS was running on.
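The check-out, table-of-contents, check-in, and log commands described above differ only in their preparation, RCS command, and cleanup. The sketch below prints those three pieces for each action. It is not taken from Listing 3: plan_for is a hypothetical name, and the sketch leaves out the archive directory path and any user-supplied RCS options.

```sh
#!/bin/sh
# Hypothetical sketch: what each qar command would run.

# plan_for ACTION ARCHIVE: print the prefix command, the RCS
# command, and the suffix command for ACTION.
plan_for() {
    action=$1
    arc=$(basename "$2" .Z.uu).Z.uu
    quiet=-q
    pre=
    post=
    case $action in
        co)  cmd="co -l $quiet $arc"
             post="dearc $arc && rm -f $arc" ;;
        get) cmd="co $quiet $arc"
             post="dearc $arc && rm -f $arc" ;;
        toc) cmd="co $quiet $arc"
             post="dearc -t $arc; rm -f $arc" ;;
        ci)  pre="mkarc . $arc"
             cmd="ci $quiet $arc" ;;
        log) cmd="rlog $arc" ;;      # -q dropped for rlog
        *)   return 1 ;;
    esac
    printf 'pre: %s\ncmd: %s\npost: %s\n' "$pre" "$cmd" "$post"
}

plan_for co chipA
```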

Controlling and Merging

Both rcs and rcsmerge commands are simple because they use other RCS options. I expected the QA manager would use qarcs to do occasional housekeeping work, such as unlocking a revision when someone decided not to continue working with a modifiable checkout archive. Because of the unusual nature of the archive files -- containing entire directory trees, which is in conflict with the original purpose of revision control software -- I didn't expect anyone would ever want to merge archive modifications. I put in the qarmerge command just in case I was wrong.

Conclusion

qarcs made everybody happy. The system support people were happy because the archives were between 10 and 20 percent of their original sizes. Disk space crunches were postponed and manageable. The QA manager was happy because he could keep as many as 10 archives in the space previously used by only one, so he had more historical information than he'd ever had before. Without this historical information online, he had had to go back to old tapes when special revisions were needed and reconstruct his group's work. Using RCS's revision comment storage and qartoc's table of contents, he could know exactly what he needed quickly. He and the project leaders were happy because they could make changes without worrying about who might clobber their work. Engineers were happy because they only needed to remember one command, qarget, and the name of their archive. They named archives based on the chip name and revision numbers, but if ever they weren't sure of an archive name, they needed only to remember one other command: qarwhich. Command wrappers are your friends.

About the Author

Larry Reznick has been programming professionally since 1978. He is currently working on systems programming in UNIX, MS-DOS, and OS/2. He teaches C, C++, and UNIX language courses at American River College and at the University of California, Davis extension.