Copying, moving, and maintaining files are basic functions for every computer user, but they are particularly critical for system administrators. Common though these functions may be, they are not error-proof (how many times have you accidentally overwritten a file, then had to haul out your backup tapes?). This article describes a number of file management tools and techniques I've learned in the "school of hard knocks," techniques that help you deal with files in an organized way. Specifically, this article will address:
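new features in common tools
backups and version control
making multiple symbolic links
managing distributions
figuring out your paths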
New Features in Common Tools
The GNU fileutils package provides useful enhancements to a number of tools (especially mv, cp, and ln). The most current version is fileutils-3.12.tar.gz, available on any GNU archive (the main one is prep.ai.mit.edu, in /pub/gnu). In addition to the file moving utilities, the package has enhanced versions of chmod, chown, dd, df, du, mknod, touch, rm, and rmdir, as well as a highly enhanced and useful version of ls (with its plethora of options).
GNU cp and mv have uniform features, accessed via the following options:
backup -- Save the previous data under a unique name instead of clobbering it.
long -- Use the long-word form of an option. Most options have a simple one-letter form and a corresponding long word; they usually (but not always) occur in matched pairs. For example:
    -f, --force -- remove existing destinations, never prompt
    -i, --interactive -- prompt before overwrite
  The long options help shell scripts become self-documenting.
help -- Display a mini man page online (--help). This is much quicker than running man; the -f and -i descriptions above are an example of its output.
verbose -- Print the name of each file as it is processed.
interactive -- Ask for confirmation before executing destructive operations.
update -- Perform the operation only if the source is newer than the destination.
version -- Display the program's version (--version). This is useful when multiple versions are installed in various places on a system.
The update and backup facilities are new to these utilities (I've never seen them in other versions of UNIX), and they are very handy.
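As a rough sketch, assuming a current GNU coreutils cp (the successor to fileutils) and hypothetical scratch file names, the following first checks which cp is in use and then performs the same update copy in terse and in self-documenting form:

```shell
#!/bin/sh
# Which cp is this, and where does it live? Handy when several
# versions are installed on one system.
command -v cp
cp --version | head -n 1

# Scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
echo "draft" > report.txt

# Terse form, typical at the keyboard:
cp -uv report.txt copy1.txt
# The same operation, self-documenting for a script:
cp --update --verbose report.txt copy2.txt
```

Both copy commands behave identically; the long form simply says what it does to the next person who reads the script.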
Backups and Version Control
In these utilities, version control amounts to the ability to make copies before doing anything destructive. This means that once you are confident you've done the right thing, you can remove the backup files by hand, which is much easier than discovering you did something wrong and having to search through your backup tapes (and if you don't have recent enough backups, the file is lost). cp, ln, and mv offer version control via the following backup types:
numbered -- If the file is called foo, rename the old file to foo.~N~, where N is the next number in sequence. Examples are foo.~1~ and bar.~23~.
simple -- Don't bother with a number; just add a suffix. The default suffix is ~. If the backup file already exists, it is overwritten.
existing -- Make numbered backups of files that already have numbered backups; otherwise, make simple backups.
You select a type through the environment variable VERSION_CONTROL. However, you can override the environment setting by specifying the type on the command line (-V or --version-control). cp, ln, and mv also have a backup option (-b or --backup), which saves the old copy rather than overwriting it.
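A minimal sketch of these backup types follows. It assumes current GNU coreutils, which accepts the type as --backup=TYPE, and uses hypothetical scratch files named src and dst:

```shell
#!/bin/sh
# Scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
unset SIMPLE_BACKUP_SUFFIX   # keep the default ~ suffix predictable

echo 1 > src
echo 0 > dst

# numbered (taken from the environment): each overwrite of dst
# leaves dst.~1~, then dst.~2~, and so on.
VERSION_CONTROL=numbered cp -b src dst
VERSION_CONTROL=numbered cp -b src dst

# An explicit type on the command line overrides the environment:
# a single simple backup named with the .old suffix.
VERSION_CONTROL=numbered cp --backup=simple --suffix=.old src dst

ls
```

Afterward, dst.~1~ holds the original contents (0), dst.~2~ holds the result of the first copy, and dst.old comes from the last command, whose explicit --backup=simple wins over VERSION_CONTROL.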
The simple bash session below shows what backups can
do:
leisner@gnu$ ls
leisner@gnu$ >foo
leisner@gnu$ ls
foo
leisner@gnu$ cp -b foo bar; ls
bar foo
leisner@gnu$ cp -b foo bar; ls
bar bar.~1~ foo
leisner@gnu$ cp -b foo bar; ls
bar bar.~1~ bar.~2~ foo
Because numbered backups are in effect here (VERSION_CONTROL=numbered), each new backup is automatically given the next number. When you're ready, you can delete them (rm *~). This backup scheme works somewhat like Emacs backups (though Emacs appears to be more flexible).
You can also make simple (non-numbered) backups and choose the suffix with the -S option. For example, to use .old as the suffix:
leisner@gnu$ ls
bar foo
leisner@gnu$ cp -b -V simple -S .old foo bar
leisner@gnu$ ls
bar bar.old foo
As noted above, these strategies will be familiar to people who have used backups in Emacs.
Combined with backups, the update option lets you use cp in a new way, as in Figure 1. The update option won't copy foo over bar (since bar is newer than foo), but it can copy bar over foo. The copy operation in Figure 1 produces a numbered backup instead of destroying the old file (foo.~1~ was the old foo). When you use cp interactively, you typically won't use the long options; instead, you would abbreviate the commands in Figure 1 to:
cp -ubv -V t
(t is a synonym for numbered.)
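Figure 1 is not reproduced here, but the behavior it describes can be sketched with two scratch files whose timestamps are set explicitly. This assumes GNU cp and GNU touch -d date syntax:

```shell
#!/bin/sh
# Scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "older" > foo
echo "newer" > bar
# Make foo older than bar.
touch -d '2000-01-01' foo
touch -d '2010-01-01' bar

# bar is newer than foo, so --update skips this copy entirely
# (and no backup is made).
cp --update --backup=numbered --verbose foo bar

# foo is older than bar, so this copy happens, and the old
# contents of foo are preserved as foo.~1~.
cp --update --backup=numbered --verbose bar foo
```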
As mentioned above, instead of specifying the type of backup you want each time, you can set the type in the environment variable VERSION_CONTROL, for example:
export VERSION_CONTROL=numbered
and alias the commands in your shell (bash syntax shown):
alias cp='cp -b'
alias mv='mv -b'
alias ln='ln -b'
Setting the backup type in the environment and aliasing the commands means you must make a conscious decision not to produce backups. This is preferable to having to make a conscious decision to make them, because you normally discover you needed a backup when it's too late. In bash, I enter:
\cp foo /dev/null
to get the unaliased version (which doesn't make backups).
The environment variable VERSION_CONTROL is useful with other programs as well (when a program supports this feature and you don't know about it, you get a pleasant surprise). Two other tools I've stumbled across that support it are GNU patch and GNU indent.
You may also want to define in your environment:
CP='cp -b'
MV='mv -b'
LN='ln -b'
Many configuration scripts will pull these out of your environment instead of using a default. Then, when you automatically generate a makefile, it will make backups too. I have found this useful, since it means I don't have to think about creating backups. I only need to think about them when I want to eliminate file clutter or recover some space.
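As a sketch only, the settings above might be collected in a login file like this (sh/bash syntax assumed; whether a particular configure script honors CP, MV, and LN depends on that script). Exported variables are inherited by every program started from the session, but aliases are not inherited by child shells, which is why the aliases are kept out of this file:

```shell
# ~/.profile fragment (read once, by the login shell).
VERSION_CONTROL=numbered   # default backup type for cp, mv, ln, patch, indent
CP='cp -b'                 # for configure scripts that look for these
MV='mv -b'
LN='ln -b'
export VERSION_CONTROL CP MV LN

# Aliases are not inherited by child shells, so in bash they belong
# in ~/.bashrc (read by every interactive shell) rather than here:
#   alias cp='cp -b' mv='mv -b' ln='ln -b'
```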
I use cp in place of the install program. While install allows permissions and ownership to be changed and files to be set setuid/setgid, I'd rather execute those commands manually as root if I need to. Using cp lets me control what is happening to my system -- if something goes wrong, it's much easier to recover.
Using GNU cp, in most makefiles I write
INSTALL=cp -b -u
which updates and makes a backup (using VERSION_CONTROL in the environment for the default type). However, this approach will not work if you use install's options for changing ownership and permissions. (I've modified install to accept a -b option; it should be available in a future release.)
In normal mode, cp and mv are destructive -- if a destination file already exists, it will be blindly overwritten (unless the user cannot write to it). Using non-destructive backups means fewer mistakes you'll be sorry for. You can alias cp and mv to include the backup option.
Making Multiple Symbolic Links
The syntax of the ln command allows you to make only one symbolic link at a time. But what if you want to make lots of symbolic links? One way to do it in sh is:
cd <destination>
source=<path of source>
for i in <file1> <file2> <file3>
do
    ln -s "$source/$i" .
done
An easier way to do it, with GNU cp (in a shell with brace expansion, such as bash or csh), is:
cd <destination>
cp -s <source>/{file1,file2,file3} .
or some variation on this, using the -s (make symbolic links) option. Note that the source can be a relative or an absolute path, and can span file systems. If you want to link to all the files in $HOME/working, you can use:
cp -sv ~/working/* .
and see each link being made. The file arguments (the sources) must be either absolute path names or path names relative to the destination; GNU cp makes relative symbolic links only in the current directory.
I often find it easiest to just go to the directory I want to link from and enter:
cp -s `pwd`/* ~/new-place
Here pwd turns the sources into absolute path names, so the links work from ~/new-place.
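Both approaches can be tried side by side in scratch directories. The directory and file names below are hypothetical, and GNU cp is assumed for the second method:

```shell
#!/bin/sh
# A source directory with three files, plus an empty destination.
src=$(mktemp -d)     # mktemp returns an absolute path
dst=$(mktemp -d)
for f in file1 file2 file3; do echo "$f" > "$src/$f"; done

# Method 1: a plain sh loop with ln -s, run in the destination.
mkdir "$dst/loop"
cd "$dst/loop" || exit 1
for i in file1 file2 file3
do
    ln -s "$src/$i" .
done

# Method 2: GNU cp -s. Because $src is absolute, the destination
# need not be the current directory.
mkdir "$dst/cp"
cp -s "$src"/* "$dst/cp"
```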
Distributions often have lots of files and subdirectories, arranged in a tree. Instead of physically copying the tree, you may prefer to make links to it. There are good reasons for this approach:
provide more verbose output.
Listing 1 is a bash script (lndir.sh) that duplicates the functionality of lndir but adds flexibility. Also included here is the man page (Listing 2) for lndir.sh.
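Listing 1 is not reproduced here. As a much simpler illustration of the same idea, the hypothetical linktree function below mirrors a directory tree as real directories filled with symbolic links back to the original files. It is a sketch that assumes file names without embedded newlines:

```shell
#!/bin/sh
# linktree SRC DST: hypothetical helper, not the article's Listing 1.
# Recreates SRC's directories under DST and symlinks every other
# entry back to SRC, in the spirit of lndir.
linktree() {
    # Absolute source path, so the links resolve from anywhere.
    src=$(cd "$1" && pwd) || return 1
    dst=$2
    mkdir -p "$dst" || return 1
    # Directories are created for real...
    (cd "$src" && find . -type d) | while IFS= read -r d; do
        mkdir -p "$dst/$d"
    done
    # ...and everything else becomes a symbolic link.
    (cd "$src" && find . ! -type d) | while IFS= read -r f; do
        ln -s "$src/$f" "$dst/$f"
    done
}

# Demo on a tiny fake distribution in a scratch directory.
tree=$(mktemp -d)
mkdir -p "$tree/dist/src/lib"
echo 'main' > "$tree/dist/src/main.c"
echo 'util' > "$tree/dist/src/lib/util.c"
linktree "$tree/dist" "$tree/shadow"
```

A fuller tool would also need to cope with files added to the original tree later, which this sketch does not attempt.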
Dealing with Distributions
Many programs consist of large distributions that require placement of binaries, miscellaneous files, info files, and man pages in specific places in a file system hierarchy (the tree). A directory plan that accommodates most such requirements looks like this:
bin -- for binaries
lib -- for libraries and auxiliary files
man -- for manpages
info -- for Texinfo info pages
src -- for source code
html -- for hypertext files
dist -- for distribution files
etc -- for other auxiliary files (could be in lib)
Using a standard tree simplifies figuring out where to put files and makes the system easier to understand.
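Creating this plan under a new prefix takes one loop. PREFIX is a hypothetical variable that defaults to a scratch directory here, and the man/man1 subdirectory matches the man page layout used later in this article:

```shell
#!/bin/sh
# Create the standard tree under PREFIX (hypothetical; a scratch
# directory is used by default so the demo is harmless).
PREFIX=${PREFIX:-$(mktemp -d)}
for d in bin lib man/man1 info src html dist etc; do
    mkdir -p "$PREFIX/$d"
done
```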
When you install a new version of a software package, it is a good idea not to discard the old version immediately: what if the new version doesn't work? However, the normal procedure is to overwrite the old files with the new ones, which means that if something goes wrong and you need to restore the older version, you have to resort to your backup tapes.
One way around this is to bypass the standard install procedure and install by hand. I often leave several binaries of a commonly used program online, appending a version number to each (e.g., gdb-4.9, gdb-4.12, gdb-4.13). A symbolic link named gdb points to the commonly used version (e.g., gdb points to gdb-4.12). If you are adventurous, you might alias gdb to a newer version (e.g., alias gdb to gdb-4.13), or you might put a symbolic link in a personal bin directory pointing to the version you use (e.g., ~/bin/gdb to /usr/gnu/bin/gdb-4.13).
These are very useful strategies, since they help ensure that important tools don't disappear. I keep several versions of common tools available online. If the most current one doesn't work, I try another one.
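A sketch of the versioned-binary arrangement, using small stand-in scripts in a scratch directory rather than real gdb binaries:

```shell
#!/bin/sh
# Hypothetical bin directory holding several versions of one tool.
bin=$(mktemp -d)
for v in 4.9 4.12 4.13; do
    printf '#!/bin/sh\necho gdb-%s\n' "$v" > "$bin/gdb-$v"
    chmod +x "$bin/gdb-$v"
done

cd "$bin" || exit 1
# The plain name points at the version most people should use.
ln -s gdb-4.12 gdb
# Later, move everyone to the new version by repointing the link;
# -f replaces the existing link. Going back is the same command
# with the old version number.
ln -sf gdb-4.13 gdb
```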
I also like to leave my newest binaries online, with symbols and source code, so that I can easily debug them. However, I don't need to leave all binaries in this state, so I run strip on older binaries I don't care to debug, and I compress binaries that aren't used often. This is easier than restoring from backups: if something is online, it takes just a moment to uncompress it.
Another useful trick is to make a separate directory for each substantial package (it's up to you to define what "substantial" means). For my purposes, a commonly used package with a dozen programs and man pages counts as substantial. I put each package in its own directory with a version number. Thus, when I configured GNU fileutils-3.12, I did:
mkdir /usr/gnu/fileutils-3.12
./configure --prefix=/usr/gnu/fileutils-3.12
Then I ran
make; make install
to put the programs and documentation in /usr/gnu/fileutils-3.12. I have another copy of fileutils in /usr/gnu/fileutils-3.9.
Most users don't have to know about the version number, so /usr/gnu/fileutils is a symbolic link to the version being used. Going one step further, users don't even need to know about /usr/gnu/fileutils; in /usr/gnu/bin you can make symbolic links to all the programs:
leisner@gemini$ ls -l /usr/gnu/bin/{ls,ln,cp,mv}
lrwxrwxrwx 1 root daemon 19 Jun 5 1994 /usr/gnu/bin/cp -> ../fileutils/bin/cp*
lrwxrwxrwx 1 root daemon 19 Jun 5 1994 /usr/gnu/bin/ln -> ../fileutils/bin/ln*
lrwxrwxrwx 1 root daemon 19 Jun 5 1994 /usr/gnu/bin/ls -> ../fileutils/bin/ls*
lrwxrwxrwx 1 root daemon 19 Jun 5 1994 /usr/gnu/bin/mv -> ../fileutils/bin/mv*
and the man pages:
leisner@gemini$ ls -l /usr/gnu/man/man1/{ls,ln,cp,mv}.1
lrwxrwxrwx 1 leisner sdsp 29 Dec 9 12:39 /usr/gnu/man/man1/cp.1 -> ../../fileutils/man/man1/cp.1
lrwxrwxrwx 1 leisner sdsp 29 Dec 9 12:39 /usr/gnu/man/man1/ln.1 -> ../../fileutils/man/man1/ln.1
lrwxrwxrwx 1 leisner sdsp 29 Dec 9 12:39 /usr/gnu/man/man1/ls.1 -> ../../fileutils/man/man1/ls.1
lrwxrwxrwx 1 leisner sdsp 29 Dec 9 12:39 /usr/gnu/man/man1/mv.1 -> ../../fileutils/man/man1/mv.1
This strategy makes it easy to upgrade packages (since ../fileutils is itself a symbolic link, upgrading means repointing one link) and reduces the burden on users, who need to know about only single paths (/usr/gnu/bin, /usr/gnu/man). Only the system administrators need to know about the physical layout.
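The link farm can be scripted. The sketch below assumes a hypothetical ROOT standing in for /usr/gnu and fakes the result of make install with empty files; it then creates the version-free fileutils link and the relative links in bin and man/man1, in the same form as the listings above:

```shell
#!/bin/sh
# ROOT stands in for /usr/gnu; a scratch directory keeps this harmless.
ROOT=$(mktemp -d)
pkg=fileutils-3.12

# Pretend "make install" already populated the versioned tree.
mkdir -p "$ROOT/$pkg/bin" "$ROOT/$pkg/man/man1" "$ROOT/bin" "$ROOT/man/man1"
for p in cp ln ls mv; do
    : > "$ROOT/$pkg/bin/$p"
    : > "$ROOT/$pkg/man/man1/$p.1"
done

# Version-free name for the package. -n stops ln from descending
# into an existing link-to-directory when this is rerun to upgrade.
ln -sfn "$pkg" "$ROOT/fileutils"

# Relative links that go through the version-free name.
cd "$ROOT/bin" || exit 1
for f in ../fileutils/bin/*; do
    ln -sf "$f" .
done
cd "$ROOT/man/man1" || exit 1
for f in ../../fileutils/man/man1/*; do
    ln -sf "$f" .
done
```

Upgrading then comes down to ln -sfn fileutils-3.13 "$ROOT/fileutils" once that directory is installed; only programs new to the package need fresh links in bin and man/man1.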
Figuring Out Your Paths
Now, instead of a catchall directory for every program you install, you have much more order and control (and more files to keep track of). How do you manage your paths? A good way is to let your login shell figure them out for you. There are three paths that are important:
PATH -- for executables
MANPATH -- for man pages
INFOPATH -- for info files
Listing 3 and Listing 4 show examples for csh and bash (and probably other Bourne-like shells as well). The idea is to have an array of possible root directories. Each root may have a bin component (for PATH), a man component (for MANPATH), and/or an info component (for INFOPATH). The login script checks which components are present and adds them to the appropriate paths. This is far better than the "kitchen sink" approach (if I define everything imaginable, eventually I'll hit what I'm looking for). It also lets me use common login files on different domains -- my paths autoconfigure according to which components are present.
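Listings 3 and 4 are not reproduced here. The following is an independent sh sketch of the same idea: a hypothetical addpaths function takes candidate root directories, keeps only the bin, man, and info components that exist, and assigns each variable once at the end. Earlier roots take precedence in PATH:

```shell
#!/bin/sh
# addpaths ROOT...: hypothetical helper, not Listing 3 or 4.
addpaths() {
    newp= newm= newi=
    for root in "$@"; do
        # Keep only the components that actually exist.
        if [ -d "$root/bin" ]; then newp="${newp:+$newp:}$root/bin"; fi
        if [ -d "$root/man" ]; then newm="${newm:+$newm:}$root/man"; fi
        if [ -d "$root/info" ]; then newi="${newi:+$newi:}$root/info"; fi
    done
    # One assignment per variable, at the end. New bin directories
    # go in front of the inherited PATH so they take precedence.
    PATH="${newp:+$newp:}$PATH"
    MANPATH=$newm
    INFOPATH=$newi
    export PATH MANPATH INFOPATH
}

# Demo: two roots that exist (with different components), one that does not.
demo=$(mktemp -d)
mkdir -p "$demo/gnu/bin" "$demo/gnu/man" "$demo/gnu/info" "$demo/local/bin"
addpaths "$demo/gnu" "$demo/local" "$demo/missing"
echo "$PATH"
echo "$MANPATH"
```

In a login file this would be called once, for example as addpaths /usr/gnu /usr/local $HOME (an illustrative root list). Setting MANPATH this way typically replaces man's built-in search list, so include the system's own man directory among the roots if you still want it.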
Remember to define these environment variables only once, when you log in. If you define them for each invocation of the shell, you will slow the system down (path hashing is expensive) and create confusion (if you change an environment variable and spawn another shell, you'll get back the old value).
In the C shell example in Listing 3, notice that I set the PATH in one operation at the end. I discovered that each time you set the PATH, the shell searches (rehashes) the whole path, so you are better off doing this only once.
About the Author
Marty Leisner has been programming in C and UNIX for a dozen years. You can reach him at leisner@sdsp.mc.xerox.com.