Automating Symbolic Link Generation
 
Clive King and Adrian Rixon 
Introduction  
Users should need no more than minimal knowledge about
the structure 
of the network they use. The layout of the filesystem
is only part 
of a user's environment, but it is fundamental to their
perception 
of the network. Within a heterogeneous network, the
filesystem should 
ideally appear the same to a user from all machines
on a network and 
they should not need to understand the underlying network
structure 
required to present referential transparency. 
This article addresses the problem of preserving referential
transparency (making a filesystem look the same from every location
on a network) as the system changes, and suggests a solution in which
symbolic links are used in combination with an automounter to give
the user a consistent view of filesystems across the network.
With the large numbers of symbolic links required, it
is clearly not 
feasible to generate all required symbolic links by
hand. Our solution 
is a generator for symbolic links that takes a high-level
description 
of the links to be constructed. 
Because reference through multiple levels of symbolic
links can have 
a significant impact on performance, we also present
a program for 
identifying potential problem areas. [Editor's note:
The code for 
these tools is too large to be published here but is
available electronically. 
See the "Online Source Code" entry in the
table of contents for a 
pointer to availability information.] 
The Virtues of a Consistent Filesystem Structure 
A filesystem structure that stays consistent over time helps the
administrator avoid many problems. For example, users need to be able
to set an appropriate search path for the shell so that they can
access the software that they use. If the location of an item of
software changes, not only must the shell's path be changed, but so
must any shell scripts that set the path explicitly and any binaries
that invoke commands via the shell using system() or popen().
Applications, libraries, and initialization files must
be in the correct 
locations, with the correct permissions, and must be
up to date in 
relation to the software that uses them. If the location
of a library 
is changed, then some applications may stop working
correctly for 
no apparent reason, especially if they are used infrequently.
Recompiling 
applications may not be possible as the sources or the
time may not 
be available. 
X applications, for example, find their defaults file either in the
default search directory, which is set at compile time, or through
the environment variable XAPPLRESDIR. If the location of an
application's defaults file is changed, then the search path can be
altered by setting XFILESEARCHPATH or XAPPLRESDIR on a per-user
basis. The alternative is recompilation of the X libraries, which
hold the compiled-in default. Both alternatives are inconvenient and
cause extra work and confusion.
In the past, some applications have needed to be installed
in a specific 
location. Although current versions of such software
appear to have 
addressed this problem, older versions are still in
use and need access 
to the resources they use. (Does this indicate that
the authors of 
such software were never systems administrators?) 
A Solution Strategy 
At Aberystwyth, the amd automounter has formed a central
part 
of our filesystem management strategy and is used in
place of the 
Sun automounter. All amd maps are distributed via NIS,
which 
allows centralized control of the maps and effective
distribution 
across the network. 
Referential transparency requires the implementation of a naming
scheme that is independent of the physical location or type of any
item of hardware, the type or release of operating system, and the
role of the machine. For established networks, however, it is not
acceptable to change the whole structure of a filesystem overnight so
as to adhere to a new naming scheme. While a new structure is being
put in place, the safe option is to mimic the old filesystem. In our
case, a significant body of applications still access libraries or
initialization files through a filesystem naming scheme that started
to be phased out two years ago. We addressed this problem with a
combination of symbolic links and the link type of map entry under an
automounter.
This approach allows a new naming scheme to be adopted, or the use of
individual disk partitions to be changed, by altering only the
automounter maps. We use a map entry of type link to simulate a
symbolic link under the old filesystem name, in addition to a direct
mount for the new name.
A large volume of non-system software is now used at many sites. The
management and updating of this body of software is a significant
task and is easier to manage if split into distinct sections. At
Aberystwyth we have chosen the following division of non-system
software:
TeX -- TeX, LaTeX, and associated software
gnu -- Free Software Foundation
X11R5 -- X software
lang -- Language software such as Ada, POPLOG, and ML
msdos -- MS-DOS applications for mounting by PCs via PCNFS
misc -- Software that does not fit into other categories
Each software division has its own disk partition, except TeX and
X11R5, which share one. The Computer Science department has three
class C networks: one for teaching and one for each of two research
groups. TeX and X11R5 are considered basic software and so are
duplicated on a server on each of the networks. This is done in part
for resiliency and in part to reduce NFS traffic over routers between
the networks. Each network has its own lang and misc partitions,
which reflect the different requirements of the groups that use the
machines on that subnet. The msdos and gnu partitions exist on only
one server.
Every file in the subdirectories of each of the above software
categories has a symbolic link of the same name in the equivalent
directory under /usr/local. This means that all non-system software
is accessible via /usr/local/bin.
All non-system libraries can be set up to be accessible
via /usr/local/lib 
and all non-system manual pages can be found in a subdirectory
of 
/usr/local/man. This policy allows users to cut down
the number 
of items in their path and eliminates the need to change
their paths 
when a new class of software is added; this responsibility
is transferred 
to the administrator. 
The form of a user's home directory path name is historical and
cannot be changed overnight. To provide a shorter route to home
directories, and in keeping with history, each user's home directory
has a symbolic link of the same name in /home_link.
We deliberately made all solutions as simple as practical
so that 
new staff would not face too great a learning curve
when confronted 
with the site-specific aspects of the network setup. 
Generating Symbolic Links 
The large number of symbolic links required to provide
referential 
transparency makes it infeasible to generate all the
links by hand. 
For example, one of our servers had 1176 symbolic links
in just /home_link 
and /usr/local/bin. With over 100 UNIX machines on our
network, 
an automated method was clearly required. 
To document and simplify the creation of symbolic links within the
filesystem for local software, and to provide a shorter route to a
user's home directory, we built a program that takes a description of
the symbolic links required and makes the links accordingly. Each
link is made between a source and a target. The source is the
existing file or directory that the link will point to. The target is
the name under which the new link is created. For example, every
executable in the source directory /usr/local/gnu/bin is linked from
a target of the same name in /usr/local/bin.
 
ls -l /usr/local/bin/gcc
lrwxrwxrwx 1 root 22 Nov 7 1992 \
/usr/local/bin/gcc -> /usr/local/gnu/bin/gcc  
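
The source and target relationship shown in the listing above can be
sketched in a few lines of Python. This is only an illustration of the
idea, not the authors' C implementation: the function name multi_link
is borrowed from the description language, while the dry_run
parameter and the rule of skipping names that already exist are
assumptions made for the example.

```python
import os

def multi_link(source_dir, target_dir, dry_run=False):
    """Link every entry of source_dir into target_dir under the same name.

    Returns the link paths that were (or, with dry_run, would be) created.
    Names that already exist in target_dir are left untouched (assumption).
    """
    made = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.join(source_dir, name)  # existing file the link points to
        target = os.path.join(target_dir, name)  # where the new link is created
        if os.path.lexists(target):
            continue                             # never overwrite an existing name
        if not dry_run:
            os.symlink(source, target)
        made.append(target)
    return made
```

Called as multi_link("/usr/local/gnu/bin", "/usr/local/bin"), the
sketch would create /usr/local/bin/gcc pointing at
/usr/local/gnu/bin/gcc, provided no entry named gcc already exists in
the target directory.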
 
The syntax is of the form: 
 
ROOT==   root_path ;
search-depth== a value ;
type [ source -> target  ]; 
 
ROOT is the source directory, such as /usr/local/TeX or
/usr/local/X11R5. The link type can be one of a number of classes of
symbolic link structure:
 Single links -- Makes one link at the target that points directly
to the source file.
 Multiple links -- Makes a link in the target directory for every
file whose parent directory is the source.
 Sub Multiple links -- Makes links from all the files in the
subdirectories of the source to files in newly created directories
in the target directory, e.g., a man directory. Subdirectories of
the target are made as required.
The depth to which the source directory tree is descended can be set
by using search-depth. search-depth applies only to the
sub_multi_link class and has a default value of 20. A depth of 0 will
make links of only the files and immediate subdirectories of the
source directory. A depth of 1 will make links of all files and
directories in the immediate subdirectories. To make links to all
directories and files, without consideration of the depth of the
directory tree, search-depth should be set to an arbitrarily large
value such as 20. For example:
 
ROOT==/usr/local/gnu;
search-depth==0;
multi_link[ bin -> /usr/local/bin ];
multi_link [ lib -> /usr/local/lib ];
sub_multi_link [ man -> /usr/local/man ];
ROOT==/home/support/part_a;
search-depth==0;
multi_link[ mnt -> /home_link ];  
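
The effect of search-depth on the sub_multi_link class can be
illustrated with a short Python sketch. It rests on one reading of
the description above, which is an assumption: directories shallower
than the depth limit are recreated as real directories in the target,
and everything found at or beyond the limit becomes a symbolic link.
The function below is hypothetical and is not taken from the
distributed code.

```python
import os

def sub_multi_link(source_dir, target_dir, depth):
    """Mirror source_dir into target_dir, descending at most `depth` levels.

    Directories above the depth limit are recreated as real directories in
    the target; every other entry becomes a symbolic link to the source.
    Returns the list of links created.
    """
    made = []
    os.makedirs(target_dir, exist_ok=True)   # target subdirectories made as required
    for name in sorted(os.listdir(source_dir)):
        source = os.path.join(source_dir, name)
        target = os.path.join(target_dir, name)
        is_real_dir = os.path.isdir(source) and not os.path.islink(source)
        if depth > 0 and is_real_dir:
            # still above the limit: recreate the directory and go one level down
            made += sub_multi_link(source, target, depth - 1)
        elif not os.path.lexists(target):
            os.symlink(source, target)
            made.append(target)
    return made
```

Under this reading, a depth of 0 links man1 itself into the target,
whereas a depth of 1 creates a real man1 directory in the target and
links each manual page inside it, which lets several software
divisions share one man1 directory.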
 
The link generator takes the following command-line flags in
addition to the file describing the links to be made:
-l -- Makes a log of the links made in linker.log
-t -- Test mode, does not make symbolic links, but goes through the
motions
-a -- Requires an acknowledgment before it makes a link
-b -- If the link exists, then carry on to the next, makes a
symbolic link only if a link does not already exist
-I -- Ignore and overwrite old links, no acknowledgment
-h -- syntax help
The symbolic link generator is written in C and uses Lex and Yacc to
interpret the specification files.
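
The generator itself uses Lex and Yacc; the Python sketch below uses
regular expressions instead, purely to show how a description in the
syntax above breaks down into link rules. The function parse_spec and
the tuple layout it returns are hypothetical. The sketch also assumes
that every statement ends with a semicolon, that a source name is
taken relative to the most recent ROOT, and that a search-depth
setting stays in force until it is set again.

```python
import re

# ROOT==path  or  search-depth==number
SETTING = re.compile(r"^(ROOT|search-depth)\s*==\s*(\S+)$")
# type [ source -> target ]
LINK = re.compile(r"^(\w+)\s*\[\s*(\S+)\s*->\s*(\S+)\s*\]$")

def parse_spec(text, default_depth=20):
    """Turn a link description into (type, source, target, depth) tuples."""
    root, depth, rules = None, default_depth, []
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        setting = SETTING.match(statement)
        link = LINK.match(statement)
        if setting and setting.group(1) == "ROOT":
            root = setting.group(2)
        elif setting:
            depth = int(setting.group(2))
        elif link and root is not None:
            kind, source, target = link.groups()
            rules.append((kind, root + "/" + source, target, depth))
        else:
            raise ValueError("cannot parse statement: %r" % statement)
    return rules
```

Each tuple returned carries everything a link-making routine would
need: the class of link, the full source path, the target directory,
and the depth in force at that point in the file.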
If you add new software to an established setup, you
must create further 
links. You can do this either when installing the software
or by running 
the link generator with the "-b" option at
regular intervals. 
Performance Considerations 
When designing the layout of the filesystem, you must make sure that
symbolic links are used only on the client to access the file in
question and that a mount is never attempted on a symbolic link.
Direct mounts should be used to mount all filesystems, so as to avoid
excessive resolution of symbolic links, which is particularly
inefficient. A symbolic link should therefore be used only to access
a file directly on the client and should never be placed on the
server side of a mount. Multiple levels of symbolic links should not
exist.
NFS traffic over routers should be reduced as far as
practical. This 
can be achieved by providing all commonly used software
in a mirrored 
partition from a server on each network. 
Checking for Multiple Links 
Within the link generator, the function resolve_link checks for
chains of symbolic links. If the path that a new link would point to
itself passes through a series of symbolic links, resolve_link
attempts to make the link point directly to the file itself. The
compile-time #define NO_LINK_RESOLVE can be used to turn this feature
off.
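
The idea of collapsing a chain of links can be sketched in Python as
follows. This is not the authors' resolve_link; both function names
are hypothetical, and the sketch simply relies on the standard
library's os.path.realpath to follow every link in the path.

```python
import os

def chain_length(path, limit=40):
    """Count how many successive symbolic links `path` passes through."""
    hops = 0
    while os.path.islink(path) and hops < limit:
        # a relative link is interpreted relative to the link's own directory
        path = os.path.join(os.path.dirname(path), os.readlink(path))
        hops += 1
    return hops

def make_direct_link(source, target):
    """Create `target` as a link to the real file behind `source`."""
    real = os.path.realpath(source)   # follows every link in the path
    os.symlink(real, target)
    return real
```

A link made this way is always one hop from the real file, however
many links the original source path went through.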
Also available is the program link_test, which can be used to analyze
a filesystem with regard to the number of symbolic links and cross
mounts. It resolves symbolic links and displays the number of links
and mount points, where the links exist, and whether a link exists on
the wrong side of a mount point.
The "-f" option to link_test will analyze the path to every file
that is in the argument directory.
link_test ignores any mount on / and can illuminate problems such as:
 
thor:cmk>link_test  ~cmk
----------------------------------------------
Following path "/home/support/part_a/mnt/cmk"
"support"  mounted on mount point "/home"
"mnt" is a symbolic link
"mnt" is a link on the wrong side of a mount
in path "/home/support/part_a/mnt"
"mnt" linked to directory "/a/athene/4.2/xy1e" -- now
following "/a/athene/4.2/xy1e"
"xy1e"  mounted on mount point "/a/athene/4.2"
Path "/home/support/part_a/mnt/cmk" has :-
1 symbolic links
2 cross mounts  
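
A survey of this kind can be approximated in Python by walking a path
one component at a time. The sketch below is not link_test: the
function name, the event tuples, and the rule that a link counts as
being on the wrong side of a mount once any mount point other than /
has been crossed are all assumptions made for illustration.

```python
import os

def survey(path, limit=40):
    """Walk `path` component by component, counting links and mount points.

    Returns (links, mounts, events). A mount on / is ignored.
    """
    links, seen_mounts, events = 0, set(), []
    pending = [p for p in os.path.abspath(path).split(os.sep) if p]
    current = os.sep
    while pending:
        current = os.path.join(current, pending.pop(0))
        if os.path.islink(current):
            links += 1
            if links > limit:
                raise OSError("too many levels of symbolic links: " + path)
            dest = os.path.join(os.path.dirname(current), os.readlink(current))
            events.append(("link", current, dest))
            if seen_mounts:
                events.append(("link beyond mount", current, dest))
            # carry on from the link's destination, then the remaining components
            pending = [p for p in os.path.abspath(dest).split(os.sep) if p] + pending
            current = os.sep
        elif os.path.ismount(current) and current not in seen_mounts:
            seen_mounts.add(current)
            events.append(("mount", current, None))
    return links, len(seen_mounts), events
```

The counts returned correspond to the symbolic link and cross mount
totals in the listing above, and the "link beyond mount" events mark
the places where a link sits behind a mount point.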
 
Conclusions 
The method of managing filesystem structure presented
here can be 
used to provide referential transparency across a heterogeneous
network 
of UNIX machines. It allows the administrator to make
the changes 
necessary for effective administration, while minimizing
both the 
change that the user has to cope with and the possibility
that an 
application might stop working for reasons associated
with these changes. 
The amd automounter, with its maps distributed via NIS, provides a
centralized mechanism for the management of filesystem change. The
physical location of a disk partition is transparent to the user;
therefore, the underlying structure of the mounts can be changed so
long as the structure of the filesystem appears the same to the user.
This article has presented an automated approach to
the generation 
of symbolic links and a simple language for the description
of the 
links. The symbolic link generator greatly simplifies
the generation 
of symbolic links on a network of machines. 
Symbolic links can carry a significant performance penalty; the
electronic code distribution includes a short program to survey the
filesystem and point out potential problems. In addition, the
symbolic link generator attempts to collapse successive levels of
symbolic links so that each link points only to a regular file or
directory.
References 
Pendry, J. 1990. Amd Automounter Manual.
Sun Microsystems. December 1990. "Networks and File Servers: A
Performance Tuning Guide." Technical Report.
The mailing list amd-workers@gov.lanl.acl appears to be the best
method of getting up-to-date information on the status of amd.
 
 About the Authors
 
Clive King is a member of the Centre for Intelligent
Systems 
within the Department of Computer Science in the University
of Wales, 
Aberystwyth. His duties include teaching UNIX systems
programming, 
supporting research, and administering a network of
UNIX workstations. 
Prior to this, he was a Systems Administrator at the
Ordnance Survey, 
the national Mapping agency in the U.K. His research
interests are 
in the area of distributed systems and concurrent programming. 
Adrian Rixon is a Systems Administrator within the Department
of Computer Science in the University of Wales, Aberystwyth.
His research 
interests are in the areas of information discovery
and retrieval 
in Wide Area Networks and network security.  
 
   