Hiding UNIX Applications in Utility Wrappers
Larry Reznick
On some HP 9000 systems I administered not long ago,
a quality assurance
group ran software that tested chip production quality.
Each installation
of this software was configured to run a special battery
of tests,
but configuration was built into the software itself,
not into a set
of data files. Thus, each software installation on a
system testing
a different chip was entirely different from the installation
on another
system. Data files produced by this software uniquely
identified characteristics
of the chip and testing and had to remain with the software
to make
sense. This software was never designed to run in a
multiple machine,
networked, mass testing environment, as this company
used it.
The testing software did a great job for this company's needs, but
its unmodular design required 50 to 100 megabytes for each
installation.
As one testing system might test several chips in a
few weeks, the
disk space crunch was extraordinary. Frequently, the
QA manager would
shuffle test directory trees from place to place to
make room for
the latest tests. Sometime later, a test engineer would
need to review
specific tests and run some more, which would require
restoring the
entire directory tree, thus forcing other trees out.
With half a dozen
test machines already online and several more installations
expected
within months, the crunch was impacting the network's
disk space and
threatening the QA group's ability to keep up with their
production
schedules.
Nothing could be done about the software they used. A solution had
to keep the program files containing the special test configurations
with the data files produced by those programs. Once
testing was finished,
the entire directory tree could be moved off the test
system but had
to be kept in a place where the QA manager could find
it. Once found,
it had to be installable on any other system available
for examination
or running more tests. Most of all, the disk space
used by this gigantic
combination of programs and data had to be reduced.
Keep It Simple
Once presented with the entire problem, and after considering a
variety of other solutions, each shot down by the inflexibility of
the testing software, I fell back on a tried-and-true method: archive
the test
directory tree and compress the archive. Inelegant,
but no other method
maintained each unique directory tree exactly without
creating other
problems for the inflexible software.
Because many of these trees existed and the QA manager
might need
any one at some later time, I set up a central location
to store them.
But the QA manager wanted users to be able to make changes
to the
software configuration for future testing without interfering
with
information contained within previous tests. Thus, the
central location
had to keep a history of all previous tests along with the
latest version.
While most of the time QA engineers would use the latest
version,
occasionally someone might need to go back to an earlier
version for
review or retesting. Engineers who didn't know much
about the UNIX
system or the network had to be able to get the latest
version easily.
The project leaders who changed the software for special
tests could
not collide with each other's use of the archive and
could not interfere
with engineers who might be testing with the latest
version. The QA
manager had to be able to easily find any particular
archive no matter
how far back in history.
The solution that presented itself combined archive building
and compressing
with revision control. Scripts to control the archive
are simple,
but using revision control worried me. The QA manager
was more familiar
with UNIX tools and operations than the project leaders
and engineers.
Project leaders were typically familiar with the way
the network worked
and with some of UNIX, but not much. Most engineers
didn't know anything
about UNIX or the network, but needed to extract the
latest archive
version set up for them by a project leader and run
the software.
Generally, the requirement could be stated as: Give
these people a
simple set of commands to set themselves up and then
let them get
back to their real work: running the software and evaluating
the results.
If I gave them a bunch of complex UNIX commands with
a kitchen sink
of options, they would never use the solution. I also
needed to give
the QA manager flexible options but keep the operation
simple for
everybody else. By writing a wrapper application for
the RCS commands,
I could keep the application simple for the project
leaders and engineers,
yet still make RCS's flexible options available to the
QA manager.
Constructing the Application
Archiving is the easy part, but tar files aren't reliably portable
across different systems -- the directory trees were used on HP
systems, while the archives would be kept on the Sun network, and
one system's tar files may not be readable by another system's tar
because tar produces binary headers in the archive. cpio has a
-c (for compatibility) option that produces text headers,
eliminating tar's cross-system problem.
Delivering cpio's output to compress produced
wonderful savings. One sample directory tree produced
a cpio
file 49,219,072 bytes long. compress turned that into
4,673,073
bytes -- a 90.5 percent reduction. While other compression
software
might do better, compress was already on the system.
We could
always change the script later to use other compression
software if
something else produced a worthwhile improvement.
Another Problem
Revision control introduced an extra problem. Although
a few binary
data files and binary executable files were in the directory
trees,
most of the files were text, making extreme compression
possible.
But compression produces binary files. Revision control
programs,
SCCS and RCS, don't handle binary data well. Again using an old
file-transfer technique, I could make RCS handle the archive by
uuencoding it.
uuencode produces plain text files at the cost of
expanding the compressed file a little. Running the
4,673,073-byte
compressed archive through uuencode produced a file
6,438,497
bytes long. That's 37.8 percent larger, but the result
is still 86.9
percent smaller than the original file.
Archiving
Listing 1 shows the mkarc archiving script. The first
argument
is the directory tree's name, and the second is the
archive name.
Resulting archives had .Z.uu suffixes, representing
that they
were compressed and uuencoded. In case someone types
the .Z.uu
suffix, the basename program strips it from the archive
argument
before assigning it to the ARC variable.
A subshell pipeline constructs the archive using relative
pathnames,
so that when someone extracts from the archive the files
will go to
the current directory. The archive is piped to compress
and
then to uuencode, which adds the .Z suffix to the archive's
filename.
Yet Another Problem
At this point, the finished archive would have been
written to a file,
adding the .uu suffix to the archive's name. But uuencode's
"begin" record, placed in the first line to
identify where
uudecode should start its reconstruction, contains the
original
file's permissions. When uuencode reads the file directly,
it takes those permissions from the file's inode. Coming
from
a pipe, uuencode has no meaningful inode to read, so
it assigns
no permissions: 000! That means uudecode will give no
permissions
to the archive file, so nobody could extract data from
the archive
file.
Adding a simple awk script to the end of mkarc
rewrites the "begin" record using 664 permissions.
All the
other input lines that awk sees will print without change.
awk's result goes to the archive file with a .Z.uu
suffix added.
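The rewrite itself is a one-line awk rule. The sample input below is fabricated to imitate a uuencode header produced from a pipe, where the mode field comes out as 000; the awk program shows the technique the article describes.

```shell
# Rewrite uuencode's "begin" record to grant 664 permissions;
# every other line passes through unchanged.
printf 'begin 000 tree.cpio.Z\nM9&%T80``\n`\nend\n' |
awk '/^begin/ { print "begin 664", $3; next } { print }'
```

Only the mode field changes; the embedded filename ($3) and the encoded body survive intact, so uudecode reconstructs a readable archive.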
Listing 2, the dearc script, offers archive extraction
by
naming the archive in the first argument and the destination
directory
in the second argument. dearc also accepts a -t option,
to show the archive's table of contents. After figuring
out which
way to run, dearc strips the .uu suffix from the archive
filename to factor out the ZFILE name, simplifying command
syntax later in the script.
Once the file is uudecoded, a trap makes sure the uudecoded
file won't
remain on the system after dearc is finished with it.
If only
a table of contents is needed, zcat delivers the uncompressed
archive to cpio, showing the table, then the ZFILE
is removed. The original file stays around.
If the command line requests a directory other than
the current
(.) directory, mkdir's -p option creates the
full path needed, throwing away any error messages,
including a message
indicating that the directory already exists. If mkdir
couldn't
create the directory, the script announces that and
quits. Otherwise,
it extracts the ZFILE in the named directory and quits,
cleaning
up the ZFILE on the way out.
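The trap discipline can be shown in isolation. This is a sketch of the idiom, not dearc itself; the filename is assumed, and a subshell stands in for the script so the cleanup can be observed.

```shell
# The EXIT (0), HUP, INT, and TERM traps guarantee the temporary
# decoded file disappears however the "script" (here, a subshell) exits.
ZFILE=/tmp/dearc_demo.Z
(
    touch $ZFILE
    trap 'rm -f $ZFILE' 0 1 2 15
    echo "extracting from $ZFILE"
)
ls $ZFILE 2>/dev/null || echo "ZFILE cleaned up"
```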
Wrapping Up RCS
Listing 3 shows qarcs, the wrapper around RCS for the
QA group.
This wrapper script contains 10 smaller scripts for
management of
the archives. By using links to the master script, each
link named
after the smaller scripts, maintenance focuses on one
script. When
you vary existing commands, you simply change one script
and all commands
will get the changes. Adding a new command is a matter
of adding the
code to the master script and adding a link with the
new command's
name.
The script figures out what to do by looking at its
name. This technique
eliminates command-line options in favor of more descriptive
command
names. Give the engineers one or two simple command
names to use and
they'll be happy. More sophisticated project leaders
and the QA manager
might take advantage of more sophisticated options,
so other commands
provide those facilities.
All commands begin with qar, for QA Revision. The rest
of the name identifies the command's action. This naming
scheme avoids
conflict with other commands that might be named the
same. Commands
include:
co -- Check out for modification
get -- Check out for looking only
toc -- Table of contents viewing only
ci -- Check in after modification
log -- Show the revision control log
which -- List the archives checked in
cs -- Control revision system attributes
merge -- Combine two archives
install -- Create the script's links
help -- Print the script's usage message
On different systems, RCS was installed in different
directories.
An RCSPATH variable tries to make the RCS files' locations
uniform. Once GNU RCS was installed in a regular location,
RCSPATH
was simply /usr/local/bin. To find the RCS programs
in the
meantime, a more complex expression constructed RCSPATH
by
finding where the shell thought rcs was. A type
command finds rcs on the path and delivers the information
to awk, which extracts the pathname from type's
output and hands that full pathname to dirname. dirname
strips rcs's name from the full pathname, leaving
only
the directory name.
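The discovery pipeline looks roughly like this. It is a hedged reconstruction of the technique, demonstrated with awk as the probe program since rcs may not be installed; the gsub call is an added defensive touch for shells that parenthesize type's output.

```shell
# type reports "name is /full/path"; awk keeps the last field
# (stripping any parentheses), and dirname drops the program name.
PROG=awk
FULLPATH=`type $PROG | awk '{ gsub(/[()]/, ""); print $NF }'`
RCSPATH=`dirname $FULLPATH`
echo $RCSPATH
```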
QARCSDIR holds the name of the central directory. A disk-array
filesystem holding dozens of gigabytes, mounted for archiving,
simplified the QA group's needs. All their files would
go in a directory
named RCS, distinct from any other archiving already
allocated
for that drive.
The script doesn't show a help screen until after common options
are initialized, even though the script quits after showing the
help screen. Initializing those options lets the help screen
show the default
values for those options. If command-line options change
any of the
values, the help screen shows their latest settings.
qarcs supports several options of its own. All options
are
uppercase to distinguish them from regular RCS options,
which are
all lowercase. A -P option identifies a new path for
the archives,
in case someone wants to keep a special archive out
of the standard
location. qarcs assumes the current directory is the
place
to work with for creating an archive from a tree or
extracting a tree
from an archive. A -D option lets the user define a
different
destination directory that holds the tree. Using -X
turns
on shell debugging, in case anyone wants to see that
mess, and -H
gets help from any qarcs command, acting like qarhelp.
When -H is found, the help screen is shown and the
program
immediately quits. If -H comes last in the command-line
option
set, other options will have already set their special
values, so
the help screen can show some of those other options'
effects. All
other hyphenated command-line options are collected
in an OPT
variable. OPT gets passed with the RCS command so that
regular
RCS options can appear on qarcs's command line.
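A sketch of that option scan follows. The -P, -D, -X flags and the OPT variable are named in the article; the DEST variable, the default values, and the demonstration argument string are assumptions.

```shell
# Scan qarcs's own uppercase options; collect everything else in OPT
# for pass-through to the underlying RCS command.
QARCSDIR=/usr/local/qa/RCS      # assumed default archive location
DEST=.                          # assumed name for the -D destination
OPT=
set -- -P /tmp/alt -q -D /tmp/work -r1.2    # demonstration arguments
while [ $# -gt 0 ]
do
    case $1 in
        -P) QARCSDIR=$2; shift ;;   # alternate archive path
        -D) DEST=$2; shift ;;       # alternate destination tree
        -X) set -x ;;               # shell debugging
        -*) OPT="$OPT $1" ;;        # regular RCS option, pass through
    esac
    shift
done
echo "QARCSDIR=$QARCSDIR DEST=$DEST OPT=$OPT"
```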
Special Commands
All but two qarcs commands set up prefix commands, suffix
commands, and special options, then let a general command
handler
do all the work. The two exceptions are qarinstall,
which
creates the script links and the archive directory,
and qarwhich,
which lists the names currently in the archive directory.
qarinstall loops through all the command names except
"qarinstall"
to create the symbolic links to the master script. Thus,
qarinstall
must be the name of the initial script file to install
the rest
of the script names. Links are created in the current
directory, allowing
installation in special directories possibly different
from a generally
accessible directory on everybody's path. When the QARCSDIR
is created, it is given the appropriate permissions
and the qarinstall
script quits.
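The link loop reduces to a few lines. The command names come from the article; the demonstration directory and the trivial stand-in master script are assumptions for illustration.

```shell
# Every command name becomes a symbolic link to the master script.
DEMO=/tmp/qarinstall_demo
mkdir -p $DEMO
echo '#!/bin/sh' > $DEMO/qarinstall    # stand-in for the real master
chmod +x $DEMO/qarinstall
cd $DEMO
for name in qarco qarget qartoc qarci qarlog qarwhich qarcs qarmerge qarhelp
do
    ln -sf qarinstall $name
done
ls
```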
qarwhich checks that the QARCSDIR is present. If so,
it takes a simple ASCII-sorted listing, selects only the RCS files
whose names end in ",v", then strips the ",v" before listing
them. Users shouldn't have to type the ",v"
when referring
to archive filenames, so the list doesn't show it to
them, thus preventing
confusion.
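The listing amounts to an ls piped through sed. This sketch uses an assumed demonstration directory and sample archive names; sed's -n with the p flag does both the selecting and the stripping in one stroke.

```shell
# Show only the ,v files, with the ",v" stripped off.
QARCSDIR=/tmp/qarwhich_demo
mkdir -p $QARCSDIR
touch $QARCSDIR/chipA.Z.uu,v $QARCSDIR/chipB.Z.uu,v $QARCSDIR/README
ls $QARCSDIR | sed -n 's/,v$//p'
```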
General Commands
A simple command loop factors out the common operation
of all the
scripts:
1. Precede the RCS command with some special preparation.
2. Execute the RCS command.
3. Follow the RCS command with some special action or
cleanup
work.
Program name evaluation figures out which script was
invoked,
sets appropriate option variables and commands, then
executes the
script.
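The name-based dispatch rests on examining $0. This runnable sketch is not the original script: the demonstration directory, the here-document master, and the echoed actions are illustrative stand-ins for the real prefix/suffix setup.

```shell
# One master script, many names: each link dispatches on basename $0.
DEMO=/tmp/qardispatch_demo
mkdir -p $DEMO
cat > $DEMO/master <<'EOF'
#!/bin/sh
case `basename $0` in
    qarco)  echo "check out locked" ;;
    qarget) echo "check out read-only" ;;
    *)      echo "unknown command" ;;
esac
EOF
chmod +x $DEMO/master
ln -sf master $DEMO/qarco
ln -sf master $DEMO/qarget
$DEMO/qarco
$DEMO/qarget
```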
Several commands use a funny-looking syntax:
`basename $f .Z.uu`.Z.uu
This covers the case where someone types the .Z.uu suffix
in an archive name. qarcs archives always have that
suffix,
but users aren't required to type it. If they don't
type it, qarcs
commands add it, but if it is already there, this basename
command strips it out. Once the filename is regular,
the .Z.uu
suffix can be safely attached.
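The idiom is easy to verify in isolation: whether or not the user typed the suffix, the result carries exactly one copy of it.

```shell
# basename strips a trailing .Z.uu if present, then we reattach it,
# so both spellings normalize to the same archive name.
for f in mychip mychip.Z.uu
do
    echo `basename $f .Z.uu`.Z.uu
done
```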
Check Out
RCS's co command checks out an archive for modification
by the QA manager or one of the project leaders. However,
engineers
should be able to check out an unmodifiable copy of
an archive to
do their work. Project leaders occasionally need to
examine an unmodifiable
copy while another leader or the manager is making a
new version from
an existing archive.
The co command handles checking out both modifiable
and unmodifiable archives by using a -l (lock) option
to create
modifiable checkouts. Once the lock is set, nobody can
check out another
modifiable copy, but anyone may check out an unmodifiable
copy. So,
the only difference between the qarco and qarget
commands is the mandatory option, -l,
included with qarco.
Locking really prevents checking in a file by anyone
other than the
person who checked out the locked copy. Anyone may check
out a read-only
copy whether a read-write copy was checked out or not.
When RCS's co command finishes checking out the archive,
the
script executes dearc to extract the compressed, uuencoded
archive and rebuild the directory tree. When finished,
the archive
extracted from RCS's control is destroyed. Destroying
a file checked
out from revision control is not usually a good idea,
but later on
the entire archive will be reconstructed by mkarc, so
it isn't
a real loss.
Table of Contents
Because dearc has its own table of contents facility,
qartoc
simply checks out an unmodifiable copy of the archive.
The suffix
command simply uses dearc's -t option, then removes
the checked
out copy. Temporarily, the compressed, uuencoded archive
takes up
space, but that doesn't last very long.
Check In
Checking in requires that mkarc execute to create the
archive before executing RCS's ci command. No special
options
accompany ci, but the user could deliver RCS options
on the
script's command line to get special actions.
Log Printing
Generally, RCS commands execute quietly because most
people using
these scripts won't know or care about what RCS tells
them. A QUIET
variable is set to -q for all qarcs scripts, but qarlog,
executing RCS's rlog on the HP system's version of RCS,
didn't
like that option. Canceling it for qarlog was simpler
than
trying to detect which system RCS was running on.
Controlling and Merging
Both rcs and rcsmerge commands are simple because
they use other RCS options. I expected the QA manager
would use qarcs
to do occasional housekeeping work, such as unlocking
a revision when
someone decided not to continue working with a modifiable
checkout
archive. Because of the unusual nature of the archive
files --
containing entire directory trees, which is in conflict
with the original
purpose of revision control software -- I didn't expect
anyone
would ever want to merge archive modifications. I put
in the qarmerge
command just in case I was wrong.
Conclusion
qarcs made everybody happy. The system support people
were happy because the archives were between 10 and
20 percent of
their original sizes. Disk space crunches were postponed
and manageable.
The QA manager was happy because he could keep as many
as 10 archives
in the space previously used by only one, so he had
more historical
information than he'd ever had before. Without this
historical information
online, he had had to go back to old tapes when special
revisions
were needed and reconstruct his group's work. Using
RCS's revision
comment storage and qartoc's table of contents, he
could quickly find exactly what he needed. He and the
project leaders
were happy because they could make changes without worrying
about
who might clobber their work. Engineers were happy because
they only
needed to remember one command, qarget, and the name
of their archive. They named archives based on the chip
name and revision
numbers, but if ever they weren't sure of an archive
name, they needed
only to remember one other command: qarwhich. Command
wrappers are your friends.
About the Author
Larry Reznick has been programming professionally since
1978.
He is currently working on systems programming in UNIX,
MS-DOS, and
OS/2. He teaches C, C++, and UNIX language courses at
American River
College and at the University of California, Davis extension.