Automating Symbolic Link Generation
Clive King and Adrian Rixon
Introduction
Users should need no more than minimal knowledge about
the structure
of the network they use. The layout of the filesystem
is only part
of a user's environment, but it is fundamental to their
perception
of the network. Within a heterogeneous network, the filesystem
should ideally appear the same to a user from every machine,
and the user should not need to understand the underlying
network structure that provides this referential transparency.
This article addresses the problem of maintaining referential
transparency (making a filesystem look the same from every
location on a network) in the face of system change. It suggests
a solution in which symbolic links are used in combination with
an automounter to give the user a consistent view of filesystems
across the network.
With the large numbers of symbolic links required, it
is clearly not
feasible to generate all required symbolic links by
hand. Our solution
is a generator for symbolic links that takes a high-level
description
of the links to be constructed.
Because reference through multiple levels of symbolic
links can have
a significant impact on performance, we also present
a program for
identifying potential problem areas. [Editor's note:
The code for
these tools is too large to be published here but is
available electronically.
See the "Online Source Code" entry in the
table of contents for a
pointer to availability information.]
The Virtues of a Consistent Filesystem Structure
A consistent filesystem structure over time helps the
administrator
avoid many problems. For example, users need to be able
to set an
appropriate search path for the shell so that they can
access the
software that they use. If the location of an item of
software changes,
not only will the shell's path require changing, but
also any shell
scripts that have the path set explicitly and any binaries
that invoke
commands via the shell using system or popen.
Applications, libraries, and initialization files must
be in the correct
locations, with the correct permissions, and must be
up to date in
relation to the software that uses them. If the location
of a library
is changed, then some applications may stop working
correctly for
no apparent reason, especially if they are used infrequently.
Recompiling
applications may not be possible as the sources or the
time may not
be available.
X applications, for example, can find their defaults
file either by
using the default search directory which is set at compile
time or
the environment variable XAPPLRESDIR. If the location
of an
application's defaults file is changed, then the default
search path
can be altered by setting XFILESEARCHPATH or XAPPLRESDIR
on a per-user basis. The alternative is to recompile the X
toolkit library, in which the default search path is set.
Both alternatives are inconvenient, and cause extra
work and confusion.
In the past, some applications have needed to be installed
in a specific
location. Although current versions of such software
appear to have
addressed this problem, older versions are still in
use and need access
to the resources they use. (Does this indicate that
the authors of
such software were never systems administrators?)
A Solution Strategy
At Aberystwyth, the amd automounter has formed a central
part
of our filesystem management strategy and is used in
place of the
Sun automounter. All amd maps are distributed via NIS,
which
allows centralized control of the maps and effective
distribution
across the network.
Referential transparency requires the implementation
of a naming scheme
that is independent of the physical location or type
of any item of
the hardware, the type or release of operating system,
and the role
of the machine. For established networks, however, it
is not acceptable
to change the whole structure of a filesystem overnight
to adhere
to a new naming scheme. While a new structure is being
put in place,
the safe option is to mimic the old filesystem. In our
case, a significant
body of applications still access libraries or initialization
files
from a filesystem naming scheme that started to be phased
out two
years ago. We addressed this problem by a combination
of symbolic
links and the link type of map entry under an automounter.
This approach allows a new naming scheme to be adopted, or the
use of individual disk partitions to be changed, by altering
only the automounter maps. We use a map entry with a mount type
of link to simulate a symbolic link for the old filesystem name,
in addition to a direct mount for the new name.
A large volume of non-system software is now used at many sites.
The management and updating of this body of software is a
significant task and is easier if the software is split into
distinct sections. At Aberystwyth we have chosen the following
division of non-system software:
TeX -- TeX, LaTeX, and associated software
gnu -- Free Software Foundation software
X11R5 -- X software
lang -- Language software such as Ada, POPLOG, and ML
msdos -- MS-DOS applications for mounting by PCs via PCNFS
misc -- Software that does not fit into other categories
Each software division has its own disk partition, except
TeX
and X11R5, which share a disk partition. The Computer
Science department has a separate class C network for
teaching and for each of two research groups.
TeX and X11R5 are considered basic software and so are
duplicated
on a server on each of the networks. This is done in
part for resiliency
and in part to reduce NFS traffic over routers between
the networks.
The networks each have different lang and misc partitions
that reflect the different requirements of the groups that
use the machines on each subnet. The msdos and gnu partitions
exist on only one server.
Every file within each of the subdirectories of the above
software categories has a symbolic link of the same name in
the equivalent directory under /usr/local.
This means that all non-system software is accessible
via /usr/local/bin.
All non-system libraries can be set up to be accessible
via /usr/local/lib
and all non-system manual pages can be found in a subdirectory
of
/usr/local/man. This policy allows users to cut down
the number
of items in their path and eliminates the need to change
their paths
when a new class of software is added; this responsibility
is transferred
to the administrator.
The form of the path name to a user's home directory is
historical and cannot be changed overnight. To provide a
shorter route to home directories, and in keeping with history,
a symbolic link of the same name is made in /home_link for
each user's home directory.
We deliberately made all solutions as simple as practical
so that
new staff would not face too great a learning curve
when confronted
with the site-specific aspects of the network setup.
Generating Symbolic Links
The large number of symbolic links required to provide
referential
transparency makes it infeasible to generate all the
links by hand.
For example, one of our servers had 1176 symbolic links
in just /home_link
and /usr/local/bin. With over 100 UNIX machines on our
network,
an automated method was clearly required.
To document and simplify the creation of symbolic links
within the
filesystem for local software and to provide a shorter
route to a
user's home directory, we built a program that takes
a description
of the form of symbolic links required and makes appropriate
symbolic
links according to the description given. Each link is made
between a source and a target. The source is the existing file
or directory that is to be linked. The target is the path at
which the new symbolic link is created, so the link at the
target points to the source. For example, every executable in
the source directory /usr/local/gnu/bin is linked from a target
of the same name in /usr/local/bin:
ls -l /usr/local/bin/gcc
lrwxrwxrwx 1 root 22 Nov 7 1992 \
/usr/local/bin/gcc -> /usr/local/gnu/bin/gcc
The syntax is of the form:
ROOT== root_path ;
search-depth== a value ;
type [ source -> target ];
ROOT is the root of the source paths, such as /usr/local/TeX
or /usr/local/X11R5. The link type selects one of the following
classes of symbolic link structure:
Single links -- Makes a one-to-one link directly
from the source file to the target file.
Multiple links (multi_link) -- Makes a link in the target
directory for every file whose parent directory is the source.
Sub multiple links (sub_multi_link) -- Makes links from all the
files in the subdirectories of the source to files in newly
created directories in the target directory, as for a man
directory. Subdirectories of the target are made as required.
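The difference between the multiple and sub multiple classes can be
sketched in a few lines of Python. This is an illustration of the
behavior described above, not the authors' C implementation: the
function names and the rule that an existing name is never overwritten
are assumptions, and the search-depth setting is not modelled.

```python
import os
from pathlib import Path


def multi_link(source: Path, target: Path) -> list:
    """Link every entry directly inside `source` into `target` under the
    same name. Returns the links that were created."""
    target.mkdir(parents=True, exist_ok=True)
    made = []
    for entry in sorted(source.iterdir()):
        link = target / entry.name
        if os.path.lexists(link):
            continue  # assumption: never clobber an existing name
        # Use an absolute destination so the link works from anywhere.
        link.symlink_to(os.path.abspath(entry))
        made.append(link)
    return made


def sub_multi_link(source: Path, target: Path) -> list:
    """For each subdirectory of `source`, create a real directory of the
    same name under `target` and link that subdirectory's entries into it."""
    made = []
    for sub in sorted(p for p in source.iterdir() if p.is_dir()):
        made.extend(multi_link(sub, target / sub.name))
    return made
```

With a source of /usr/local/gnu/man and a target of /usr/local/man,
sub_multi_link would create /usr/local/man/man1 as a real directory and
fill it with links, so that man pages from several software divisions
can share one directory.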
The depth to which the links will be followed can be
set by using
search-depth. search-depth only applies to the sub_multi_link
class and has a default value of 20. A depth of 0 will
make links
of only the files and immediate subdirectories of the
source directory.
A depth of 1 will make links of all files and directories
in the immediate
subdirectories. To make links to all directories and
files, regardless of the depth of the directory tree, set
search-depth to an arbitrarily large value such as 20. For
example:
ROOT==/usr/local/gnu;
search-depth==0;
multi_link[ bin -> /usr/local/bin ];
multi_link [ lib -> /usr/local/lib ];
sub_multi_link [ man -> /usr/local/man ];
ROOT==/home/support/part_a;
search-depth==0;
multi_link[ mnt -> /home_link ];
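To make the structure of such a specification concrete, here is a
small Python sketch that reads statements of this form. The real
generator uses Lex and Yacc; this regular-expression version is only an
approximation, and the LinkRule record, the parse_spec name, and the
assumption that ROOT and search-depth stay in force until they are set
again are ours rather than the authors'.

```python
import re
from dataclasses import dataclass


@dataclass
class LinkRule:
    kind: str    # link type, e.g. "multi_link"
    source: str  # source path with ROOT applied
    target: str  # directory in which links are made
    depth: int   # search-depth in force for this rule


SETTING = re.compile(r"^(ROOT|search-depth)\s*==\s*(\S+)$")
RULE = re.compile(r"^(\w+)\s*\[\s*(\S+)\s*->\s*(\S+)\s*\]$")


def parse_spec(text, default_depth=20):
    """Turn a specification into a list of LinkRule records."""
    root, depth, rules = "", default_depth, []
    for raw in text.split(";"):        # statements end with a semicolon
        stmt = raw.strip()
        if not stmt:
            continue
        m = SETTING.match(stmt)
        if m:
            if m.group(1) == "ROOT":
                root = m.group(2)
            else:
                depth = int(m.group(2))
            continue
        m = RULE.match(stmt)
        if not m:
            raise ValueError(f"cannot parse statement: {stmt!r}")
        kind, src, tgt = m.groups()
        # A relative source is taken to be relative to the current ROOT.
        source = src if src.startswith("/") else f"{root}/{src}"
        rules.append(LinkRule(kind, source, tgt, depth))
    return rules
```

Parsing the gnu statements above would yield one rule with source
/usr/local/gnu/bin and target /usr/local/bin, and similar rules for lib
and man, each carrying a depth of 0.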
The link generator takes the following command-line flags
in addition to the file describing the links to be made:
-l -- Makes a log of the links made in linker.log
-t -- Test mode; goes through the motions but does not make symbolic links
-a -- Requires an acknowledgment before it makes a link
-b -- Makes a symbolic link only if one does not already exist; if the link exists, carries on to the next
-I -- Ignores and overwrites old links without acknowledgment
-h -- Syntax help
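The effect of the test, skip, and overwrite flags on a single link can
be sketched as follows. This Python fragment is not taken from the
generator; make_link, the Existing enumeration, and the returned status
strings are hypothetical, and the refusal to replace anything that is
not itself a symbolic link is an added safety assumption.

```python
import os
from enum import Enum


class Existing(Enum):
    SKIP = "skip"            # behavior described for -b
    OVERWRITE = "overwrite"  # behavior described for -I


def make_link(source, target, on_existing=Existing.SKIP, dry_run=False):
    """Create a symbolic link at `target` pointing to `source`.

    Returns "made", "replaced", or "skipped"; in dry-run mode (as with
    -t) the status is prefixed with "dry-run:" and nothing is changed.
    """
    exists = os.path.lexists(target)
    if exists and on_existing is Existing.SKIP:
        return "skipped"
    if exists and not os.path.islink(target):
        # Assumption: only old links are replaced, never real files.
        raise FileExistsError(f"{target} exists and is not a symbolic link")
    action = "replaced" if exists else "made"
    if dry_run:
        return "dry-run:" + action
    if exists:
        os.unlink(target)
    os.symlink(source, target)
    return action
```

Run at regular intervals with the skip behavior, a routine like this
only adds links for newly installed software and leaves established
links untouched.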
The symbolic link generator is written in C and uses
Lex and
Yacc to interpret the specification files.
If you add new software to an established setup, you
must create further
links. You can do this either when installing the software
or by running
the link generator with the "-b" option at
regular intervals.
Performance Considerations
When designing the layout of the filesystem, you must make sure
that symbolic links are used only on the client to access the
file in question and that a mount is never attempted on a
symbolic link. Direct mounts should be used to mount all
filesystems, which avoids the excessive resolution of symbolic
links that is most inefficient. A symbolic link should therefore
be used only to access a file directly on the client and should
never be placed on the server side of a mount. Multiple levels
of symbolic links should not exist.
NFS traffic over routers should be reduced as far as
practical. This
can be achieved by providing all commonly used software
in a mirrored
partition from a server on each network.
Checking for Multiple Links
Within the link generator, the function resolve_link checks
for multiple levels of symbolic links. If the path being linked
to is itself reached through a series of symbolic links, it
attempts to make the link directly to the file itself. The
compile-time #define NO_LINK_RESOLVE can be used
to turn this feature off.
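The idea of collapsing a chain of links can be illustrated in Python.
This sketch is not the authors' resolve_link; chain_length and
direct_source are hypothetical names, and the chain counter follows
only the final path component, using a textual normalization that is an
approximation when ".." crosses a linked directory.

```python
import os


def chain_length(path):
    """Count how many symbolic links are followed, one after another,
    starting from `path` itself."""
    hops, seen = 0, set()
    while os.path.islink(path):
        if path in seen:
            raise OSError("symbolic link loop at " + path)
        seen.add(path)
        dest = os.readlink(path)
        # A relative destination is relative to the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), dest))
        hops += 1
    return hops


def direct_source(path):
    """Return the path with every symbolic link resolved, suitable as
    the destination of a single-level link."""
    return os.path.realpath(path)
```

A generator could call direct_source on each source before linking, so
that a new link points at the regular file or directory rather than at
another link.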
Also available is the program link_test, which can be
used
to analyze a filesystem in regard to the number of symbolic
links
and cross mounts. It resolves symbolic links and displays
the number
of links and mount points, where the links exist, and
if a link exists
on the wrong side of a mount point.
The "-f" option to link_test will analyze the path
to every file that is in the argument directory.
link_test ignores any mount on / and can illuminate problems
such as:
thor:cmk>link_test ~cmk
----------------------------------------------
Following path "/home/support/part_a/mnt/cmk"
"support" mounted on mount point "/home"
"mnt" is a symbolic link
"mnt" is a link on the wrong side of a mount
in path "/home/support/part_a/mnt"
"mnt" linked to directory "/a/athene/4.2/xy1e" -- now
following "/a/athene/4.2/xy1e"
"xy1e" mounted on mount point "/a/athene/4.2"
Path "/home/support/part_a/mnt/cmk" has :-
1 symbolic links
2 cross mounts
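A survey of this kind can be approximated in Python by walking a path
one component at a time. The sketch below is not link_test itself:
analyze_path and its return value are hypothetical, a link is treated
as being on the wrong side of a mount whenever it is found below a
mount point other than /, and the is_mount parameter exists only so the
logic can be exercised without real mounts.

```python
import os


def analyze_path(path, is_mount=os.path.ismount, max_links=40):
    """Follow `path` component by component and return a tuple
    (links, mounts, misplaced): the number of symbolic links followed,
    the number of distinct mount points crossed, and the links that
    were found below a mount point."""
    links, mounts, misplaced = 0, set(), []
    pending = [c for c in os.path.abspath(path).split(os.sep) if c]
    current, below_mount = os.sep, False
    while pending:
        current = os.path.join(current, pending.pop(0))
        if os.path.islink(current):
            links += 1
            if links > max_links:
                raise OSError("too many symbolic links in " + path)
            if below_mount:
                misplaced.append(current)
            dest = os.path.join(os.path.dirname(current),
                                os.readlink(current))
            # Restart from the root: the link's destination first, then
            # whatever remained of the original path.
            parts = [c for c in os.path.normpath(dest).split(os.sep) if c]
            pending = parts + pending
            current, below_mount = os.sep, False
        elif is_mount(current):
            mounts.add(current)  # a mount on / is never tested
            below_mount = True
    return links, len(mounts), misplaced
```

For a layout like the one in the listing, the function would report one
symbolic link, two mount points, and the link "mnt" as misplaced.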
Conclusions
The method of managing filesystem structure presented
here can be
used to provide referential transparency across a heterogeneous
network
of UNIX machines. It allows the administrator to make
the changes
necessary for effective administration, while minimizing
both the
change that the user has to cope with and the possibility
that an
application might stop working for reasons associated
with these changes.
The amd automounter with the automounter maps distributed
via NIS
provides a centralized mechanism for the management
of filesystem
change. The physical location of a disk partition is
transparent to
the user; therefore, the underlying structure of the
mounts can be
changed so long as the structure of the filesystem appears
the same
to the user.
This article has presented an automated approach to
the generation
of symbolic links and a simple language for the description
of the
links. The symbolic link generator greatly simplifies
the generation
of symbolic links on a network of machines.
Symbolic links can carry a significant performance cost;
the electronic
code distribution includes a short program to survey
the filesystem
and point out potential problems. In addition, the symbolic
link generator
attempts to reduce successive levels of symbolic links
in order to
make a link only to a regular file or directory.
References
Pendry, J. 1990. Amd Automounter Manual.
Sun Microsystems. December 1990. "Networks and
File Servers: A Performance Tuning Guide." Technical
Report.
The mailing list amd-workers@gov.lanl.acl
appears to be the best method of getting up to date
information on
the status of amd.
About the Authors
Clive King is a member of the Centre for Intelligent
Systems
within the Department of Computer Science in the University
of Wales,
Aberystwyth. His duties include teaching UNIX systems
programming,
supporting research, and administering a network of
UNIX workstations.
Prior to this, he was a Systems Administrator at the
Ordnance Survey,
the national mapping agency in the U.K. His research
interests are
in the area of distributed systems and concurrent programming.
Adrian Rixon is a Systems Administrator within the Department
of Computer Science in the University of Wales, Aberystwyth.
His research
interests are in the areas of information discovery
and retrieval
in Wide Area Networks and network security.