Perls of Wisdom

Christopher Bush

A year or so ago, my son, then age five, asked me, "How did they make tools before there were tools?" Or, to put it another way, "Which came first, the chicken or the egg?" After thinking about my son's question, I realized I didn't have a very good answer. Then I thought about the apes in the opening scenes of the movie "2001: A Space Odyssey". They had discovered how to use the bones of a dead animal as a tool - a weapon really. Too graphic an example for a five-year-old, I thought. That scene does, however, illustrate how humankind has continually found ways to use what we have to get what we need. This same approach applies to managing the UNIX operating system.

I recall the first time I wanted to build the GNU C compiler on my new Sun workstation running Solaris 2.1. Sun Microsystems had seen fit to stop bundling the C compiler with the operating system, since there was no longer a need to re-compile the kernel. So there I sat with the source distribution for the latest GNU C compiler, written entirely in C. I was all dressed up with someplace to go, but no way to get there. Luckily, I had tools at hand. I used my favorite tool, the Internet, to download a pre-compiled older version of gcc for Solaris, which I then used to compile this new version. Problem solved, question left unanswered. Where did the older pre-compiled version come from? I didn't know, and I didn't care; any more than the ape in the movie cared what the dead animal was, or what killed it. What I did know was that I needed to use the tools at my disposal to create what I needed, and I was comfortable doing so.

It's this very knack for creating and using tools that is part of the underlying philosophy of UNIX (although I'm not implying that the typical UNIX systems administrator in any way resembles an ape). The creators of UNIX made no myopic assumptions about what kinds of things people would need to do. As a result, a lot of small, efficient utility programs were added to UNIX allowing it to become a flexible, if sometimes arcane, operating system.

UNIX is more than an operating system, though. It is a fully loaded toolbox. Instead of hammers and screwdrivers (or the femur of a dead tapir), you get grep, awk, sed, lex, yacc, and more. These tools are general purpose enough to allow a savvy systems administrator to use them for accomplishing a variety of different things, most of which couldn't have been imagined at the time UNIX was designed. To be a successful systems administrator, you'll have to continually learn to wield these tools in ways that you can't even imagine today. UNIX is an operating system with a personality, which lends itself well to those people with an inherent ability to use and combine tools to accomplish their tasks.

In Every Oyster

It is precisely this UNIX toolbox that gives its systems administrators a distinct advantage over those administering other operating systems. I'm not suggesting that you beat your Windows NT counterparts senseless with the leg bone of some poor dead creature, but in a figurative way, you can "beat them" nonetheless. Consider this problem. I want to know how many partitions/drives on my Windows NT server are over 90% capacity. I can fire up the Windows Explorer (file manager), point here, click there, drag, select, and pretty soon I have a nifty little window with a pie chart showing utilization of one drive. I make a note of the capacity and utilization. Then, I can do it all again for each drive. Wash, rinse, and repeat. On my UNIX box, I can perform this check by whipping off one simple command line:

% df -k | perl -ane '($F[4] > 90) && print'

"Whoa, there!" you say. "Is that Perl he's using? What about awk and grep?" Yes, I used Perl. I long ago relegated awk, grep, lex, and sed to the bottom drawer of my systems administrator's toolbox. They now sit there collecting dust right beside that tired old Bourne shell.

When I first discovered Perl, I used it for scripting of frequently performed tasks that begged for automation. Perl had the power and (mostly) familiar syntax of C, the ease of use of Bourne shell, powerful string handling, and the same regular expression capabilities familiar to anyone who has ever read the man page for sed(1). Perl quickly replaced Bourne shell as my language of choice for automating everything from backups to printer and DNS management. Then the World Wide Web came along, and Perl increased in popularity as the language of choice for CGI application programming. Before I knew it, I was so familiar with Perl syntax and semantics that I rarely needed to pick up my favorite Perl book (see References).

In the meantime, every time I needed to do something that traditionally required some combination of grep, awk, or sed, I'd find myself reading the man pages, or worse, picking up my awk or shell book to look up syntax. Then I thought, what if Perl could be used in a similar fashion - on the command line? Perhaps there was an opportunity to save myself time and suffer fewer paper cuts flipping through my awk text.

Here's what I would have done with awk to accomplish the same thing as the filesystem example above:

% df -k | awk '{if ($5 > 90) then print $0}'
awk: syntax error near line 1
awk: illegal statement near line 1

After this, I would have retrieved my awk text, figured out the syntax, and come up with:

% df -k | awk '{if ($5 > 90) print}'

Pretty similar to the command line using Perl. awk, like Perl, doesn't use a then keyword in if-then constructs. I should have known that. I wonder how long it would take to get all that information on a Windows NT system. However, I admit I had to learn a few things about Perl to get this far. At first, I would've written a small Perl program like this:

#!/usr/local/bin/perl
# Report filesystems over 90% capacity, the long way.
die "Sorry!" unless open(DFOUT, "/usr/bin/df -k|");  # read from a df pipe
while (<DFOUT>) {
    @fsys = split;               # split the record on whitespace
    if ($fsys[4] > 90) {         # field 4 is the capacity column
        print $_;
    }
}

I could then tuck this program away in /usr/local/bin and forget what I named it, making it almost impossible to use again. Luckily, Larry Wall and friends have seen fit to build into the Perl interpreter a number of extremely useful command-line switches. These switches, little Perls of wisdom if you will, are precisely what is needed to turn Perl from a powerful programming language into a powerful and easy-to-use command-line tool. I will present six such switches in this article and provide some simple but useful one-liners that can be used to perform some day-to-day systems administration and Web site administration tasks. More importantly, though, I hope you'll learn enough about using Perl as a general-purpose command-line tool to apply it usefully to your own administration tasks.

Perl 1

There is one fundamental switch necessary to use Perl from the command line, and that is the -e switch. The -e switch tells the Perl interpreter to execute the following argument as its program, rather than looking through the argument list to find a script filename to execute. The argument to -e should be a valid Perl program, enclosed in single or double quotes. The above short Perl program could have been executed on the command line like this:

% df -k | perl -e 'while (<>) {@fsys = split; if ($fsys[4] > 90) {print}}'

Perl, like awk, grep, and sed, can accept data from standard input or from files named on the command line. The above command accomplishes the same thing as the program shown previously, but it still seems like a lot to type for one command. It would be easier to type the previous program into an executable script file after all. That doesn't mean the -e switch isn't useful, though. A number of other switches can also make life on the command line easier. One such switch is -n.

Perl 2

The -n switch automatically encloses your script, in our case the argument to -e, in a loop like this:

while (<>) {
    # your code here
}

This will cause your code to be executed for every record on standard input, which may come from multiple files, from the terminal, or from a pipe. For the above example, we can now write:

% df -k | perl -ne '@fsys = split; if ($fsys[4] > 90) {print}'

That's a little shorter, but may be more work than is necessary.

Before we move on to another switch that will help out, let's take advantage of the way Perl evaluates logical expressions to shorten the code above. Previously in this article, you saw the following command line:

% df -k | perl -ane '($F[4] > 90) && print'

Besides the -a switch, which we'll get to shortly, you'll see that there is no if conditional. Instead, I have used a logical AND, denoted by the &&. Here, the left side of the logical AND expression is evaluated, and the right side is only evaluated if the left side evaluates to true. This is perfectly okay, since the entire expression cannot possibly evaluate to true if either one of the evaluated expressions (in this case, the one on the left) is not true. By evaluating the left side first, Perl avoids wasting time evaluating the right-hand expression if the left isn't true. The end result is a short-form if-then construct. This doesn't save much typing with this particular example, but it can be a handy thing to know. Combining this revised conditional construct with the -a switch will get us to the final form of this example command line, as first seen in the introduction.
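
As an aside, Perl's statement modifiers provide yet another short form of the same test. The following is an equivalent way (one of several) to write the check:

% df -k | perl -ane 'print if $F[4] > 90'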

Perl 3

The -a switch turns on autosplit mode, automatically splitting the input records into a list. The split function, by default, splits a string on whitespace, just like awk. Perl's autosplit mode splits the string into the array @F. So, now we can rewrite the previous command:

% df -k | perl -ane '($F[4] > 90) && print'

I think I've seen this before somewhere. Let's look at a few more examples combining what we've learned so far. Let's say we want a sum total of Kbytes used on all filesystems. We might use something like:

% df -k | perl -ane '$tot+=$F[2]; END {print $tot,"\n";}'

The END will look familiar to avid awk users. It is included in Perl specifically to provide the same functionality as it does in awk - to allow for statements to be executed after the implicit loop created by -n. Similarly, there is a BEGIN keyword for blocks that must be executed before the loop. If you're a stickler for initializing variables, you would write:

% df -k | perl -ane 'BEGIN {$tot=0} $tot+=$F[2]; END {print $tot, "\n"}'
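
BEGIN and END also combine naturally with a condition. This variation (my own, not one of the earlier examples) counts the filesystems over 90% capacity instead of listing them:

% df -k | perl -ane 'BEGIN {$n=0} $n++ if $F[4] > 90; END {print $n, "\n"}'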

This is really starting to look like an awk program, isn't it? To further the similarity to awk, Perl includes the familiar -F switch.

Perl 4

Just as with awk, the -F switch allows you to change the character used to divide up input records into separate fields (or list elements, as they are called in Perl). For example, let's look at all of the users' full names (the fifth, or GECOS, field) in the password file:

% perl -F: -ane 'print $F[4], "\n"' /etc/passwd

The : character is the field separator for elements in the password file. Note that rather than the implicit while loop processing records from standard input as the result of a pipe, it reads from /etc/passwd. Perl will treat everything after the last switch and its accompanying argument as a filename or filenames to be read as input. Multiple files can be given, either explicitly or using shell metacharacters (wildcards).
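
The same idea extends to any field in the file. For instance, assuming the standard seven-field passwd format, this one-liner (a hypothetical example of mine) tallies the login shells in use; because each input record still ends with a newline, the last field carries it along, putting one shell per line:

% perl -F: -ane 'print $F[6]' /etc/passwd | sort | uniq -c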

So far, we haven't seen much that you can't do with awk, and with fewer command-line switches at that. But consider the following example, in which Perl comes in handy for a quick and dirty "killall" command:

% ps -wax | perl -ane '/httpd/ && kill 9,$F[0]'

There's that && logical expression again, acting like an if-then. If you just typed the above command, and you had proper privileges, you just unceremoniously shut down your Web server. Oops! There are no new switches on this command line, but thanks to Perl's built-in pattern matching and its robust function set, which includes inter-process communication features such as signaling via the kill function, we've done something that would otherwise require a shell looping construct, a grep command, and probably awk.
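
Given the stakes, a dry run is a sensible precaution. This variation (my own habit, not a requirement) simply prints the matching process IDs so you can see what would be signaled:

% ps -wax | perl -ane '/httpd/ && print "$F[0]\n"'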

Another handy thing to know when using Perl this way is that, like awk, when Perl is processing a file or standard input (as with the while (<>) { } loop construct or the -n switch), the file or files will be broken into records, with the default record separator being the end-of-line ("\n"). Also like awk, Perl allows this default input record separator to be changed. The $/ scalar variable in Perl, if explicitly set, changes the input record separator. For example, if you have a file broken up into paragraphs, you can process each paragraph as a separate input record, rather than each line. The assumption here is that paragraphs are sections of the file separated by two or more consecutive newlines. In Perl, you would set:

$/="";

to do this. This is different from:

$/="\n\n";

which specifically looks for a blank line, and treats the next character as the first character of the next input record. Using "\n\n" would not work properly if you had a file with three consecutive newlines. For example, I maintain a printcap file with a blank line between each printer definition. If I want to search /etc/printcap for every printer that spools remotely to the print server named megadon, I could do this:

% perl -ne 'BEGIN {$/=""} /rm=megadon/ && print $_, "\n";' /etc/printcap
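
Paragraph mode combines nicely with END as well. This hypothetical variation counts the printers spooling to megadon rather than printing their entries:

% perl -ne 'BEGIN {$/=""; $n=0} $n++ if /rm=megadon/; END {print $n, "\n"}' /etc/printcap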

So far, we've looked at examples of reporting on the contents of files or post-processing the output of commands such as df. All these things could just as well be done with awk, or awk combined with grep and the shell in some fashion. That's okay. We're tool jockeys; we can handle that. But what if we want to make changes to files instead of simply reporting on their contents? Enter the Perl interpreter's -p switch.

Perl 5

The -p switch works much like -n, except the implicit loop looks like:

while (<>) {
    # your code here
} continue {
    print;
}

Here, all input records are passed to standard output after your code operates on them. With -n, the only output is what you explicitly print from within your code. Consider this example:

% perl -pe 's/rm=megadon/rm=syrinx/;' /etc/printcap > /etc/printcap.new

We've just created a new printcap file, /etc/printcap.new, that, if installed, will cause all print jobs that previously spooled remotely to megadon to spool to the server named syrinx. We've just used Perl where we might previously have used sed. Let's say you're confident and brave (admirable but often dangerous traits for a systems administrator), and you want to skip the intermediate step of creating /etc/printcap.new and then copying it to /etc/printcap. Instead of using the -p switch, you can use -i.

Perl 6

The -i switch considerably expands the amount of implicit code used. In short, -i specifies that the files processed by <> are edited in place. Sound dangerous? It can be, but you can pass an optional argument to -i, which specifies an extension to be used to create a backup of the original file or files. We can now do this:

% perl -p -i.old -e 's/rm=megadon/rm=syrinx/;' /etc/printcap

Now restart the line printer daemon (you did shut it off first, right?). Your printers that formerly spooled to megadon will spool to syrinx. If you need to move these printers back to megadon, you have a file called /etc/printcap.old, which contains the version of the printcap file prior to executing the above command.

Another place I've found the -i switch useful is in administering a Web site. Suppose I have a bunch of HTML files that refer to an image, but the author of the pages did not include the image size attributes (width and height) in the HTML code. Including these attributes can enhance the viewer's browsing experience by allowing the browser to display the text of the document more quickly, because it knows how much space on the page to allocate for the images before they finish loading. I can edit all of the files in my current directory that refer to a particular image with a command like this:

% perl -pi.bak -e 's/src="ceopic.jpg"/src="ceopic.jpg" width="100" height="125"/g;' *.html

You can also take advantage of another feature of Perl that is very similar to one in sed, in which all or part of the matched pattern can be plugged into the substitution string. For example, suppose I have a bunch of HTML files with some navigation buttons, named button1.gif, button2.gif, button3.gif, etc. Not very creative naming, but it serves to illustrate the point. I want to add the width and height attributes to all references to these images in each file in the current directory. I can do this with a command like this:

% perl -pi.bak -e 's/(src="button.*\.gif")/$1 width="40" height="20"/g;' *.html

By grouping part of the pattern we are attempting to match (which is a simple regular expression) in parentheses, we can then use the matched pattern in the substitution string, in the form of the $1 scalar. Multiple groupings in the pattern are referred to in the replacement string as $1, $2, $3, and so on. Sure beats editing each file individually and remembering how to do a similar substitution within the vi editor. I've been told that you can use ed to do similar in-place editing of multiple files, but I've yet to master that one.
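
Captured groups can also carry matched text into a new context, not just preserve it in place. Suppose, purely hypothetically, those buttons were being renamed to nav1.gif, nav2.gif, and so on; capturing the digit makes the whole batch one command:

% perl -pi.bak -e 's/src="button(\d+)\.gif"/src="nav$1.gif"/g;' *.html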

I've found myself using the -i switch for more and more tasks, both in systems administration and Web site administration. It is a very powerful addition to my Perl knowledge. If you're interested in exactly what implicit code the -i switch puts in place, see page 332 of Programming Perl, 2nd Edition, by Larry Wall, Tom Christiansen, and Randal Schwartz. Every Perl programmer ought to have this book.
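
For the curious, the perlrun man page shows the equivalent code. A simplified sketch (ignoring some details, such as wildcard handling in the backup extension) of what perl -p -i.old -e 's/foo/bar/' expands to looks roughly like this:

while (<>) {
    if ($ARGV ne $oldargv) {            # starting a new input file?
        rename($ARGV, $ARGV . '.old');  # move the original aside as the backup
        open(ARGVOUT, ">$ARGV");        # reopen the original name for writing
        select(ARGVOUT);                # make it the default print filehandle
        $oldargv = $ARGV;
    }
    s/foo/bar/;                         # your -e code runs here
} continue {
    print;                              # -p's implicit print, now into the new file
}
select(STDOUT);                         # restore the default output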

Conclusion

I previously stated that UNIX is an operating system designed by and for people with an inherent knack for combining and using tools. Perl, which was originally developed for use on UNIX systems, follows in a similar vein. UNIX is an operating system with personality, and Perl is a programming language with a similar personality. The same traits that make people good UNIX systems administrators seem to make them very comfortable with Perl.

In this article, I've shown you how to adapt Perl, a full-featured programming language used for anything from systems administration scripts to Web applications, and use it as a powerful and flexible command-line utility. It's this knack for tools that makes UNIX systems administrators what they are, and quite possibly is what draws them to UNIX versus the more sexy, trendy operating systems like Windows NT. But are we adept at using tools because we use UNIX, or do we use UNIX because we are so skilled with tools? It's the chicken and the egg all over again.

References and Recommended Reading

Thomas Scoville, "The Elements of Style: UNIX as Literature," http://www.performancecomputing.com/features/9809of1.shtml, 1998.

Larry Wall, Tom Christiansen, and Randal L. Schwartz, Programming Perl, 2nd Edition. O'Reilly & Associates, Inc., 1996.

Stanley Kubrick, "2001: A Space Odyssey," MGM/UA Studios, 1968.

About the Author

Chris Bush is a Programmer Analyst for a major Midwestern bank in Cleveland, Ohio. He does Web site management, Web development, Internet/intranet architecture, and UNIX systems administration, and dabbles with Linux at home in his spare time. He can be reached at chris.bush@stratos.net.