Article

The Missing Symlink

Randal L. Schwartz

Symbolic links were not present in the first version of UNIX that I used. That would be UNIX V6, back in 1977, when the UNIX kernel size was under 32K. It's hard to imagine anything under 32K being associated with UNIX these days.

But somewhere in the bowels of the University of California at Berkeley, in the early 80s, the boys working on BSD concocted a scheme to rectify two of the biggest problems with hard links: they couldn't be made to a directory, and they couldn't point into another mounted filesystem. Their solution was that now common feature, a symbolic link.

A symbolic link is essentially a text string that sits in place of a file. When the symbolic link's filename is accessed, the UNIX kernel replaces the filename with its text value instead, like a macro expansion. This all happens transparently to the executing program (unlike some other popular operating systems).

From the shell, symbolic links are easy enough to create:

ln -s /usr/lib/perl5 ./Lib

which makes a reference to Lib in the current directory hop over to /usr/lib/perl5/. From Perl, this same step is:

symlink("/usr/lib/perl5", "./Lib") or die "$!";

We can see this is so with:

ls -l

which will show something like:

..... Lib -> /usr/lib/perl5

indicating this redirection is going on. That same fact is apparent to Perl, like so:

my $where = readlink("Lib");
print "Lib => $where\n";

But what if /usr/lib itself is also a symbolic link, say to /lib? Well, the system nicely picks that up when it's looking down the steps from /usr to /usr/lib, redirects that to /lib, and continues from there to look for perl5.
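Here's a minimal sketch of why one readlink doesn't tell the whole story. It assumes a scratch directory from File::Temp, and the names real, first, and second are made up for illustration. The point is that readlink hands back only the text stored in the one link you ask about, while the kernel follows the whole chain when the name is actually used.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (an assumption of this sketch); removed at exit.
my $tmp = tempdir(CLEANUP => 1);

mkdir "$tmp/real" or die "mkdir: $!";
symlink "real",  "$tmp/first"  or die "symlink: $!";  # first  -> real
symlink "first", "$tmp/second" or die "symlink: $!";  # second -> first

# readlink returns just the stored text of this one link.
my $one_hop = readlink "$tmp/second";
print "second => $one_hop\n";

# The -d test follows the whole chain and finds the directory at the end.
my $is_dir = -d "$tmp/second" ? "yes" : "no";
print "second is a directory: $is_dir\n";
```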

Thus, following a symlink may involve multiple expansions. There's a limit to the number of expansions in a path to prevent runaway loops, but it's generous enough that you're unlikely to hit it in ordinary use.
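To watch that limit kick in, here's a small sketch that builds a deliberate loop. It assumes a scratch directory from File::Temp and two made-up link names, a and b. The exact number of hops allowed varies by system, so the sketch only checks that the kernel gives up with ELOOP.

```perl
use strict;
use warnings;
use Errno;                      # backs the %! hash of errno names
use File::Temp qw(tempdir);

# Scratch directory (an assumption of this sketch); removed at exit.
my $tmp = tempdir(CLEANUP => 1);

# Two links that point at each other: a -> b and b -> a.
symlink "b", "$tmp/a" or die "symlink: $!";
symlink "a", "$tmp/b" or die "symlink: $!";

# lstat looks at the link itself, so it succeeds.
my $link_exists = lstat("$tmp/a") ? 1 : 0;

# stat follows the link, and the kernel gives up after too many hops.
my $followed = stat("$tmp/a") ? 1 : 0;
my $too_many = $!{ELOOP} ? 1 : 0;

print "link exists: $link_exists\n";
print "could be followed: $followed\n";
print "failed with ELOOP: $too_many\n";
```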

What's the easiest way to really know where the symlink ends up then? Well, you could keep typing a lot of ls -l invocations, and take careful notes, or just write a Perl program to do the expansion for you.

While we're at it, let's also make this work recursively from a starting directory in a filetree, dumping out all the symlinks and their ultimate expansions in all directories contained within. Cool.

So, here's a program that does just that, presented a few lines at a time.

#!/usr/bin/perl -w
use strict;
$|++;

These first three lines tell us where to find Perl, and enable warnings and the usually good compiler restrictions. We'll also disable buffering on STDOUT, so I can see how far the program has gotten during a long run.

use File::Find;
use Cwd;

Next, we'll pull in two modules from the standard Perl distribution library. File::Find helps us recurse through a directory hierarchy without thinking too hard about it, and Cwd gets the current working directory, usually without forking off a child process.

my $dir = cwd;

Now, we'll get the current directory via cwd (imported from Cwd). We'll need this to properly expand relative names into absolute names.

find sub {
  ##### contents here presented below
}, @ARGV;

Next, the outer part of the body of the program. We'll call find (imported from the File::Find module), passing it an anonymous subroutine reference, and the command-line argument array @ARGV. The subroutine (whose contents are defined below) will be called for each file or directory in all directories and subdirectories starting at the top-level directories named in @ARGV.
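If you haven't used File::Find before, this stripped-down sketch shows the calling pattern on its own. It assumes a scratch directory from File::Temp in place of @ARGV, and the names sub and link are made up. All the callback does here is collect the symlinks it meets.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Scratch tree (an assumption of this sketch): one directory, one symlink.
my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/sub" or die "mkdir: $!";
symlink "sub", "$tmp/link" or die "symlink: $!";

my @seen;
find sub {
  # $_ is the bare name; $File::Find::name includes the starting path.
  push @seen, $File::Find::name if -l;
}, $tmp;

print "$_\n" for @seen;
```

By default find does not follow symlinks into the directories they point to, so the link is reported once and not descended into.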

Now for the guts of the subroutine. In the real program, these lines sit where ##### is marked above.

return unless -l;

When this subroutine is called, $_ is set to the name of the file or directory of interest, and the current directory is set to the directory that contains this item. Here, we'll end up returning if the item is not a symbolic link.

The next two lines set up the core of the routine. I'm gonna have an @left and an @right variable. Think of @left as "where in the filetree am I so far?" and @right as "where else am I being told to go?". The basic task is to take one element at a time from the front of @right, and try to glue it onto the end of @left, until we have no more @right to go. If at any step, the path of @left is a symlink, however, we'll have to expand it and start again. Also, if the element being examined from @right is a dot or dot-dot, we'll need to back up on @left instead.

my @right = split /\//, $File::Find::name;

The variable $File::Find::name has the full pathname, built on the starting name we gave on the command line. If that was a relative name, this will also be a relative name, relative to the original working directory (now saved in $dir). Here, I'm splitting the name apart into individual elements.

my @left = do {
  @right && ($right[0] eq "") ?
    shift @right :            # quick way
    split /\//, $dir;
};    # first element always null

This is a bit more complicated, so I'll take it slowly. We're setting up @left to be the value of this expression coming from a do block. If the first element of @right is empty, then the original string began with a slash, and we need to be relative to the root directory. That's handled by moving that empty element from the beginning of @right to become the only element of @left. Otherwise, we had a relative name, and we'll preload @left with a split-apart version of the initial working directory.
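The empty-first-element trick depends on how split treats a leading slash. Here's a quick sketch of that behavior on two sample paths (the paths are just illustrations):

```perl
use strict;
use warnings;

my @abs = split /\//, "/usr/lib/perl5";   # leading slash
my @rel = split /\//, "lib/perl5";        # no leading slash

# An absolute path yields an empty string as its first element.
print scalar(@abs), " elements, first is '$abs[0]'\n";

# A relative path starts right in with the first name.
print scalar(@rel), " elements, first is '$rel[0]'\n";
```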

while (@right) {

Now, as long as we have items to keep walking, we'll do this.

my $item = shift @right;
next if $item eq "." or $item eq "";

This grabs the next step, and discards it if it's just an empty string or a single dot, meaning that we would have stayed at the current directory.

if ($item eq "..") {
  pop @left if @left > 1;
  next;
}

And if it's dot-dot, we'll have to pop up a level on our current position (unless it would have us back up over the top).

my $link = readlink (join "/", @left, $item);

Now, if the path of @left, together with the next step, forms a symbolic link, the value of $link will be defined to be what we need to replace $item with. Otherwise, we can just slide along.

if (defined $link) {
  my @parts = split /\//, $link;
  if (@parts && ($parts[0] eq "")) { # absolute
    @left = shift @parts;   # quick way
  }
  unshift @right, @parts;
  next;

If it's a symbolic link, we'll split it apart. If it's absolute, @left gets reset to the top. Otherwise, @left stays as is. We'll also push whatever we got in front of the remainder of @right, as it will influence the interpretation of that remaining path.

} else {
  push @left, $item;
  next;
}

If it wasn't a symbolic link at this step, it's simple; we just move along to that point in @left.

}
print "$File::Find::name is ", join("/", @left), "\n";

When the loop is over, we'll dump out the resulting path of @left.
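To see the @left and @right shuffle in one piece without touching a real filesystem, here's a sketch that wraps the same loop in a subroutine. The expand name, the %links table, and the lookup callback are all hypothetical stand-ins of mine: the callback plays the part of readlink, and the sketch assumes an absolute starting path. Like the loop above, it has no guard against links that point in a circle.

```perl
use strict;
use warnings;

# Hypothetical link table standing in for the filesystem: path => link text.
my %links = (
  "/usr/lib"   => "/lib",         # absolute target
  "/lib/perl5" => "perl5-real",   # relative target
);

# expand(PATH, LOOKUP): LOOKUP returns the link text for a path, or undef.
sub expand {
  my ($path, $lookup) = @_;
  my @right = split /\//, $path;
  my @left  = ("");               # this sketch assumes an absolute path
  while (@right) {
    my $item = shift @right;
    next if $item eq "." or $item eq "";
    if ($item eq "..") {
      pop @left if @left > 1;     # never back up over the root
      next;
    }
    my $link = $lookup->(join "/", @left, $item);
    if (defined $link) {
      my @parts = split /\//, $link;
      if (@parts && $parts[0] eq "") {   # absolute target
        @left = shift @parts;            # restart at the root
      }
      unshift @right, @parts;     # the link text is walked next
    } else {
      push @left, $item;          # ordinary step
    }
  }
  return join "/", @left;
}

my $result = expand("/usr/lib/perl5/Foo.pm", sub { $links{$_[0]} });
print "$result\n";
```

With that table, /usr/lib hops to /lib, and perl5 then expands in place to perl5-real, so two expansions happen along a single path.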

And there you have it. It's a bit tricky, since the macro expansion of a symbolic link is somewhat recursive, but Perl's data structures and full access to the right system calls give us a straightforward way of interpreting symbolic links.
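If you want a second opinion on a result, the Cwd module also provides abs_path, which asks for the fully resolved name in a single call. This sketch assumes a scratch directory from File::Temp and made-up names (real, first, second). It compares against abs_path of the real directory, because the scratch directory itself may sit behind a symlink on some systems.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# Scratch directory (an assumption of this sketch); removed at exit.
my $tmp = tempdir(CLEANUP => 1);

mkdir "$tmp/real" or die "mkdir: $!";
symlink "real",  "$tmp/first"  or die "symlink: $!";  # first  -> real
symlink "first", "$tmp/second" or die "symlink: $!";  # second -> first

# abs_path resolves every symlink along the way.
my $resolved = abs_path("$tmp/second");
my $expected = abs_path("$tmp/real");

print "second resolves to $resolved\n";
print "matches real: ", ($resolved eq $expected ? "yes" : "no"), "\n";
```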

Now you'll never have to wonder where those links point again. Until next time, enjoy!

About the Author

Randal L. Schwartz is an eclectic tradesman and entrepreneur, making his living through software design, technical writing and training, systems administration, security consultation, and video production. He is known internationally for his prolific, humorous, and occasionally incorrect spatterings on Usenet - especially his "Just Another Perl Hacker" signoffs in comp.lang.perl. Randal honed his many crafts through seven years of employment at Tektronix, ServioLogic, and Sequent. Since 1985, he has owned and operated Stonehenge Consulting Services in his home town of Portland, Oregon.