Cover V03, I05
sep94.tar


Automating Symbolic Link Generation

Clive King and Adrian Rixon

Introduction

Users should need no more than minimal knowledge of the structure of the network they use. The layout of the filesystem is only part of a user's environment, but it is fundamental to their perception of the network. Within a heterogeneous network, the filesystem should ideally appear the same to a user from every machine, and users should not need to understand the underlying network structure required to provide referential transparency.

This article addresses the problem of providing referential transparency (making a filesystem look the same from every location on a network) in the face of system change. It suggests a solution in which symbolic links are used in combination with an automounter to give the user a consistent view of filesystems across the network.

With the large numbers of symbolic links required, it is clearly not feasible to generate all required symbolic links by hand. Our solution is a generator for symbolic links that takes a high-level description of the links to be constructed.

Because reference through multiple levels of symbolic links can have a significant impact on performance, we also present a program for identifying potential problem areas. [Editor's note: The code for these tools is too large to be published here but is available electronically. See the "Online Source Code" entry in the table of contents for a pointer to availability information.]

The Virtues of a Consistent Filesystem Structure

A consistent filesystem structure over time helps the administrator avoid many problems. For example, users need to be able to set an appropriate search path for the shell so that they can access the software that they use. If the location of an item of software changes, not only will the shell's path require changing, but also any shell scripts that have the path set explicitly and any binaries that invoke commands via the shell using system or popen.

Applications, libraries, and initialization files must be in the correct locations, with the correct permissions, and must be up to date in relation to the software that uses them. If the location of a library is changed, then some applications may stop working correctly for no apparent reason, especially if they are used infrequently. Recompiling applications may not be possible as the sources or the time may not be available.

X applications, for example, find their defaults file either through the default search directory, which is set at compile time, or through the environment variable XAPPLRESDIR. If the location of an application's defaults file is changed, then the search path can be altered by setting XFILESEARCHPATH or XAPPLRESDIR on a per-user basis. The alternative is recompilation of the application. Both alternatives are inconvenient and cause extra work and confusion.

In the past, some applications have needed to be installed in a specific location. Although current versions of such software appear to have addressed this problem, older versions are still in use and need access to the resources they use. (Does this indicate that the authors of such software were never systems administrators?)

A Solution Strategy

At Aberystwyth, the amd automounter has formed a central part of our filesystem management strategy and is used in place of the Sun automounter. All amd maps are distributed via NIS, which allows centralized control of the maps and effective distribution across the network.

Referential transparency requires a naming scheme that is independent of the physical location or type of any item of hardware, the type or release of the operating system, and the role of the machine. For established networks, however, it is not acceptable to change the whole structure of a filesystem overnight to adhere to a new naming scheme. While a new structure is being put in place, the safe option is to mimic the old filesystem. In our case, a significant body of applications still accesses libraries or initialization files through a filesystem naming scheme that started to be phased out two years ago. We addressed this problem with a combination of symbolic links and the link type of map entry under an automounter.

This approach allows a new naming scheme to be adopted, or the use of individual disk partitions to be changed, by altering only the automounter maps. We use a map entry of type link to simulate a symbolic link under the old filesystem name, in addition to a direct mount for the new name.

A large volume of non-system software is now used at many sites. Managing and updating this body of software is a significant task, and is easier if the software is split into distinct sections. At Aberystwyth we have chosen the following division of non-system software:

TeX -- TeX, LaTeX and associated software

gnu -- Free Software Foundation

X11R5 -- X software

lang -- Language software such as ADA, POPLOG and ML

msdos -- MS-DOS applications for mounting by PCs via PCNFS

misc -- Software that does not fit into other categories

Each software division has its own disk partition, except TeX and X11R5, which share one. The Computer Science department has one class C network for teaching and one for each of two research groups. TeX and X11R5 are considered basic software and so are duplicated on a server on each of the networks. This is done in part for resilience and in part to reduce NFS traffic over routers between the networks. Each network has its own lang and misc partitions, which reflect the different requirements of the groups that use the machines on that subnet. The msdos and gnu partitions exist on only one server.

Every file in each subdirectory of the above software categories has a symbolic link of the same name in the equivalent directory under /usr/local. This means that all non-system software is accessible via /usr/local/bin, all non-system libraries can be set up to be accessible via /usr/local/lib, and all non-system manual pages can be found in a subdirectory of /usr/local/man. This policy allows users to cut down the number of items in their path and eliminates the need to change their paths when a new class of software is added; this responsibility is transferred to the administrator.

The form of a user's path name is historical and cannot be changed overnight. To provide a shorter route to home directories, and in keeping with history, each user's home directory is given a symbolic link of the same name in /home_link.

We deliberately made all solutions as simple as practical so that new staff would not face too great a learning curve when confronted with the site-specific aspects of the network setup.

Generating Symbolic Links

The large number of symbolic links required to provide referential transparency makes it infeasible to generate all the links by hand. For example, one of our servers had 1176 symbolic links in just /home_link and /usr/local/bin. With over 100 UNIX machines on our network, an automated method was clearly required.

To document and simplify the creation of symbolic links within the filesystem for local software, and to provide a shorter route to a user's home directory, we built a program that reads a description of the symbolic links required and makes the links accordingly. Each link is made between a source and a target. The source is the existing file or directory that is to be linked. The target is the location at which the new link is created. For example, every executable in the source /usr/local/gnu/bin is given a link of the same name in the target /usr/local/bin:

ls -l /usr/local/bin/gcc
lrwxrwxrwx 1 root 22 Nov 7 1992 \
/usr/local/bin/gcc -> /usr/local/gnu/bin/gcc 
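The effect of linking every entry of one directory into another can be sketched in a few lines. This is only an illustration in Python, not the authors' C program; the function name multi_link is borrowed from the description language shown later, and the scratch directories stand in for /usr/local/gnu/bin and /usr/local/bin.

```python
import os
import tempfile


def multi_link(source_dir, target_dir):
    """Create, in target_dir, one symbolic link per entry of source_dir.

    Each link carries the same name as the entry it points to. Names that
    are already taken in target_dir are left alone. Returns the list of
    link paths that were created.
    """
    made = []
    os.makedirs(target_dir, exist_ok=True)
    for name in sorted(os.listdir(source_dir)):
        link_path = os.path.join(target_dir, name)
        if os.path.lexists(link_path):
            continue  # never clobber an existing file or link
        os.symlink(os.path.join(source_dir, name), link_path)
        made.append(link_path)
    return made


# Demonstration in a scratch area rather than the real /usr/local tree.
scratch = tempfile.mkdtemp()
gnu_bin = os.path.join(scratch, "gnu", "bin")
local_bin = os.path.join(scratch, "bin")
os.makedirs(gnu_bin)
open(os.path.join(gnu_bin, "gcc"), "w").close()

for link in multi_link(gnu_bin, local_bin):
    print(link, "->", os.readlink(link))
```

Running the function a second time creates nothing new, which is the behavior one would want from a job that is repeated at regular intervals.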

The syntax is of the form:

ROOT==   root_path ;
search-depth== a value ;
type [ source -> target  ];

ROOT is the source directory, such as /usr/local/TeX or /usr/local/X11R5. The link type is one of the following classes of symbolic link structure:

  • Single links -- Makes a one-to-one link directly from the source file to the target file.

  • Multiple links -- Makes links for all the files whose parent directory is the source; the links are placed in the target directory.

  • Sub Multiple links -- Makes links for all the files in the subdirectories of the source; the links are placed in corresponding subdirectories of the target directory, which are created as required. A man directory is a typical case.

    The depth to which subdirectories are followed is set with search-depth, which applies only to the sub_multi_link class and has a default value of 20. A depth of 0 makes links for only the files and immediate subdirectories of the source directory. A depth of 1 makes links for all files and directories in the immediate subdirectories. To make links to all directories and files, regardless of the depth of the directory tree, set search-depth to an arbitrarily large value such as 20. For example:

    ROOT==/usr/local/gnu;
    search-depth==0;
    multi_link [ bin -> /usr/local/bin ];
    multi_link [ lib -> /usr/local/lib ];
    sub_multi_link [ man -> /usr/local/man ];

    ROOT==/home/support/part_a;
    search-depth==0;
    multi_link [ mnt -> /home_link ];
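To make the shape of the description language concrete, here is a minimal Python sketch of a reader for it. The real generator uses Lex and Yacc; this regular-expression version is only a stand-in. It assumes that statements end with a semicolon, that a source is taken relative to the most recent ROOT, and that the search depth in force is recorded with each rule. The function name parse_description and the dictionary layout are illustrative, not part of the tool.

```python
import posixpath
import re

# One statement, with its terminating semicolon already removed.
STATEMENT = re.compile(
    r"""
    \s*(?:
        (?P<key>ROOT|search-depth)\s*==\s*(?P<value>.+?)
      | (?P<type>\w+)\s*\[\s*(?P<source>\S+)\s*->\s*(?P<target>\S+)\s*\]
    )\s*$
    """,
    re.VERBOSE,
)


def parse_description(text):
    """Turn a link description into a list of rule dictionaries."""
    root = None
    depth = 20  # default search depth quoted in the article
    rules = []
    for raw in text.split(";"):
        if not raw.strip():
            continue
        match = STATEMENT.match(raw)
        if match is None:
            raise ValueError("cannot parse statement: %r" % raw.strip())
        if match.group("key") == "ROOT":
            root = match.group("value").strip()
        elif match.group("key") == "search-depth":
            depth = int(match.group("value"))
        else:
            if root is None:
                raise ValueError("link rule appears before any ROOT")
            rules.append({
                "type": match.group("type"),
                "source": posixpath.join(root, match.group("source")),
                "target": match.group("target"),
                "depth": depth,
            })
    return rules


example = """
ROOT==/usr/local/gnu;
search-depth==0;
multi_link [ bin -> /usr/local/bin ];
sub_multi_link [ man -> /usr/local/man ];
"""
for rule in parse_description(example):
    print(rule["type"], rule["source"], "->", rule["target"])
```

Splitting on semicolons keeps the sketch short, at the cost of rejecting paths that themselves contain a semicolon; a grammar-based parser does not have that limitation.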

The link generator takes the following command-line flags in addition to the file describing the links to be made:

-l -- Makes a log of the links made in linker.log

-t -- Test mode; goes through the motions but does not make any symbolic links

-a -- Requires an acknowledgment before each link is made

-b -- Makes a symbolic link only if one does not already exist; existing links are skipped

-I -- Ignores and overwrites old links without asking for acknowledgment

-h -- Syntax help

The symbolic link generator is written in C and uses Lex and Yacc to interpret the specification files.

If you add new software to an established setup, you must create further links. You can do this either when installing the software or by running the link generator with the "-b" option at regular intervals.
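The interplay of the test, skip-existing, and overwrite modes can be sketched as a single decision function. This Python fragment is a guess at reasonable semantics based on the flag descriptions above, not a transcription of the tool. The keyword arguments test, keep_existing, and overwrite are hypothetical names standing in for -t, -b, and -I; the interactive -a mode is not modeled, and refusing to replace anything that is not itself a symbolic link is an added safety assumption.

```python
import os
import tempfile


def make_link(source, link_path, test=False, keep_existing=False, overwrite=False):
    """Make link_path point at source and report what was done."""
    exists = os.path.lexists(link_path)
    if exists:
        if keep_existing:
            return "kept"  # -b: carry on to the next link
        if not overwrite or not os.path.islink(link_path):
            return "conflict"  # only old symbolic links may be replaced
    if test:
        # -t: report the action without touching the filesystem
        return "would replace" if exists else "would make"
    if exists:
        os.remove(link_path)  # -I: discard the old link
    os.symlink(source, link_path)
    return "replaced" if exists else "made"


scratch = tempfile.mkdtemp()
link = os.path.join(scratch, "gcc")
print(make_link("/usr/local/gnu/bin/gcc", link, test=True))
print(make_link("/usr/local/gnu/bin/gcc", link))
print(make_link("/usr/local/gnu/bin/gcc", link, keep_existing=True))
```

Because the skip-existing mode never changes a link that is already present, it is the natural choice for a job that runs unattended.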

Performance Considerations

When designing the layout of the filesystem, make sure that symbolic links are used only on the client to access the file in question, and that a mount is never attempted on a symbolic link. Use direct mounts for all filesystems; this avoids repeated resolution of symbolic links, which is inefficient. A symbolic link should therefore only be used to access a file directly on the client, and should never be placed on the server side of a mount. Multiple levels of symbolic links should not exist.

NFS traffic over routers should be reduced as far as practical. This can be achieved by providing all commonly used software in a mirrored partition on a server on each network.

Checking for Multiple Links

Within the link generator, the function resolve_link checks for multiple levels of symbolic links. If the path being linked to is itself composed of a series of symbolic links, it attempts to make the link directly to the file itself. The compile-time #define NO_LINK_RESOLVE turns this feature off.
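The idea of collapsing a chain of links before making a new one can be shown briefly. The internals of the authors' resolve_link are not given in the article, so this Python sketch substitutes the standard library's os.path.realpath for it; the helper link_direct and the file names are hypothetical.

```python
import os
import tempfile


def resolve_link(path):
    """Follow every symbolic link in path and return the final location."""
    return os.path.realpath(path)


def link_direct(source, link_path):
    """Create link_path so that it points straight at the real file."""
    os.symlink(resolve_link(source), link_path)


# A real file, a first link to it, and a second link requested via the first.
scratch = os.path.realpath(tempfile.mkdtemp())
real = os.path.join(scratch, "gcc.real")
open(real, "w").close()
first = os.path.join(scratch, "gcc")
os.symlink(real, first)

second = os.path.join(scratch, "cc")
link_direct(first, second)
print(second, "->", os.readlink(second))
```

The second link ends up pointing at the real file rather than at the first link, so a lookup through it costs one link resolution instead of two.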

Also available is the program link_test, which analyzes a filesystem with regard to the number of symbolic links and cross mounts. It resolves symbolic links and displays the number of links and mount points, where the links exist, and whether a link exists on the wrong side of a mount point.

The "-f" option to link_test analyzes the path to every file in the argument directory.

link_test ignores any mount on / and can illuminate problems such as the following:

    thor:cmk>link_test  ~cmk
    ----------------------------------------------
    
    Following path "/home/support/part_a/mnt/cmk"
    
    "support"  mounted on mount point "/home"
    "mnt" is a symbolic link
    "mnt" is a link on the wrong side of a mount
    in path "/home/support/part_a/mnt"
    "mnt" linked to directory "/a/athene/4.2/xy1e" -- now
    following "/a/athene/4.2/xy1e"
    "xy1e"  mounted on mount point "/a/athene/4.2"
    
    Path "/home/support/part_a/mnt/cmk" has :-
    1 symbolic links
    2 cross mounts 
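A survey of this kind can be approximated by walking a path one component at a time. The Python sketch below is not link_test itself; it is a rough stand-in built on os.path.islink and os.path.ismount. It assumes that "the wrong side of a mount" means a symbolic link found after a mount point other than / has been crossed. The function name survey_path and the report layout are illustrative.

```python
import os
import tempfile


def survey_path(path, max_links=40):
    """Walk path component by component, noting links and mount points."""
    report = {"links": [], "mounts": [], "wrong_side": []}
    pending = os.path.abspath(path).split(os.sep)
    current = os.sep
    crossed_mount = False
    while pending:
        part = pending.pop(0)
        if part in ("", "."):
            continue
        if part == "..":
            current = os.path.dirname(current)
            continue
        current = os.path.join(current, part)
        if os.path.islink(current):
            report["links"].append(current)
            if len(report["links"]) > max_links:
                raise OSError("too many levels of symbolic links")
            if crossed_mount:
                report["wrong_side"].append(current)
            destination = os.readlink(current)
            if os.path.isabs(destination):
                current = os.sep  # restart the walk from the root
                crossed_mount = False
            else:
                current = os.path.dirname(current)
            # Walk the link's destination next, then the rest of the path.
            pending = destination.split(os.sep) + pending
        elif os.path.ismount(current):
            if current not in report["mounts"]:
                report["mounts"].append(current)
            crossed_mount = True
    return report


# Scratch layout: "mnt" is a link to the directory "part_a".
base = os.path.realpath(tempfile.mkdtemp())
os.mkdir(os.path.join(base, "part_a"))
open(os.path.join(base, "part_a", "cmk"), "w").close()
os.symlink("part_a", os.path.join(base, "mnt"))

report = survey_path(os.path.join(base, "mnt", "cmk"))
print(len(report["links"]), "symbolic links")
print(len(report["mounts"]), "cross mounts")
```

The number of mounts reported depends on where the scratch directory happens to live, so only the link count is predictable in this demonstration.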

Conclusions

The method of managing filesystem structure presented here can be used to provide referential transparency across a heterogeneous network of UNIX machines. It allows the administrator to make the changes necessary for effective administration, while minimizing both the change that the user has to cope with and the possibility that an application might stop working for reasons associated with these changes.

The amd automounter, with its maps distributed via NIS, provides a centralized mechanism for the management of filesystem change. The physical location of a disk partition is transparent to the user; therefore, the underlying structure of the mounts can be changed so long as the structure of the filesystem appears the same to the user.

This article has presented an automated approach to the generation of symbolic links and a simple language for describing them. The symbolic link generator greatly simplifies the generation of symbolic links on a network of machines.

Symbolic links can carry a significant performance penalty; the electronic code distribution includes a short program to survey the filesystem and point out potential problems. In addition, the symbolic link generator attempts to collapse successive levels of symbolic links so that each link points directly to a regular file or directory.

References

Pendry, J. 1990. Amd Automounter Manual.

Sun Microsystems. December 1990. "Networks and File Servers: A Performance Tuning Guide." Technical Report.

The mailing list amd-workers@gov.lanl.acl appears to be the best method of getting up-to-date information on the status of amd.

About the Authors

Clive King is a member of the Centre for Intelligent Systems within the Department of Computer Science at the University of Wales, Aberystwyth. His duties include teaching UNIX systems programming, supporting research, and administering a network of UNIX workstations. Prior to this, he was a Systems Administrator at the Ordnance Survey, the national mapping agency in the U.K. His research interests are in the area of distributed systems and concurrent programming.

Adrian Rixon is a Systems Administrator within the Department of Computer Science at the University of Wales, Aberystwyth. His research interests are in the areas of information discovery and retrieval in wide area networks and network security.


     


