Cover V02, I03

Using Little Languages

Larry Reznick

After having worked with several flavors of UNIX, I've come to appreciate certain "extras" in each. The SCO cal program, for example, defaults to a display that includes this month, last month, and next. In contrast, the BSD version shows only one month. I like the extra information. Conversely, the SunOS version of the Berkeley distribution includes which, a very useful tool for administrators; SCO does not. In this article, I'll show you how to use the little languages to replace or enhance these two utilities.

A which for SCO

The which program searches the current execution path for a set of program names. If any of the names are found, which gives the full path of those names. When more than one copy of a program is floating around, which can tell you which one is actually executing. Sometimes a program doesn't do what you want simply because you are unintentionally executing the wrong version of it. I had to have the which program, but SCO did not provide it, so I sat down and wrote my own (Listing 1).

The first line contains just a colon, a widely used convention that tells other shells this is a Bourne script. If the first character of the first line of a shell script is a pound sign ("#"), the shell presumes it is a C-Shell (csh) script. If anything else is there, it is presumed to be a Bourne Shell (sh) script. So, if some other shell is being used when this script is executed, the appropriate shell will be used to execute this script.

The first test is a useful shell idiom that determines whether an argument is present. If $1 (the argument right after the which program's name) is present, it will be expanded within the quotation marks right after the X. Therefore, the expanded result can remain X only if there were no arguments. Since which must have at least one argument to do useful work, this test triggers a usage reminder message when none is given.
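As a minimal sketch of this idiom (the function name check_args and the wording of the message are illustrative assumptions, not taken from Listing 1):

```shell
# check_args: hypothetical helper illustrating the X"$1" idiom.
# Returns 0 if a first argument is present, 1 otherwise.
check_args() {
    # With no argument (or an empty one), X"$1" expands to a bare X.
    if [ X"$1" = X ]; then
        echo "usage: $0 program ..." >&2
        return 1
    fi
    return 0
}
```

The leading X keeps the test well-formed even when $1 is empty or begins with a dash, which is presumably why the idiom became so common.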

Note that I have used $0 (the name used to invoke the program) in the usage message. With this coding, the usage message will appear correct to the user, even if the script has been linked or copied to a different file name. The usage clause exits with a non-zero error code, the standard convention for an error exit. Coding an appropriate exit value allows which to be used inside a larger shell script that might need to know if the which program fails.

The dirs variable is set by shell substitution, one of my favorite shell techniques. Using grave accent marks (often called back-apostrophes), a full shell command can be embedded within another command. The embedded command is executed in a subshell, and, when finished, its output text is substituted where the back-apostrophes used to be. Here shell substitution allows me to use sed to reformat the path string.
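A small illustration of the substitution mechanism itself, separate from the script (the variable name and the sample path are made up for the example):

```shell
# The command between the back-apostrophes runs first; its output
# replaces the back-apostrophe expression before the assignment happens.
name=`basename /usr/local/bin/which`
echo "$name"    # prints: which
```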

In sh, the several paths collected in the PATH variable are separated by colons. I want to use each path as a separate parameter in a for loop, which means that they must be separated by spaces, instead. Simple! Just use sed to substitute a space for every colon. Unfortunately, the substitution isn't quite that simple. By convention, a colon with no name separating it from another colon or from an end of the string stands for the current directory (.). If a lone colon comes at the beginning of the list (e.g., PATH=:somepaths), the . directory is to be searched first. If a lone colon appears at the end of the list (e.g., PATH=somepaths:), the current directory is to be searched last. (This default behavior always seemed to be a meaningless shortcut to me, since it only saves one character. But what the heck, Bourne may have had an excellent reason that I just do not know about.)

Placing the colon last prevents programs in the current directory from being executed in place of some standard utilities. Doing otherwise might give a system cracker undue privilege in execution.

So, whenever a lonely colon is found, I need to replace it with an explicit period to refer to the current directory. By piping the PATH string into sed (invoked with the -e option), I can perform the entire operation in a single line. If the substitution were specified in one expression, I wouldn't need the -e option, but since I've used four expressions, each must be preceded by a -e. The first three of these expressions could appear in any order, but the fourth one must be last. The first expression,

's/^:/\.:/'

says "substitute, find a colon at the beginning of the line, and replace it with a period followed by a colon." This expression processes the leading colons. The second expression,

's/:$/:\./'

processes trailing colons (substitute, find a colon at the end of the line, and put in its place a colon followed by a period). The third expression,

's/::/:\.:/'

processes colons that appear in the middle of a path. This expression, which is somewhat simpler, says, "substitute, find any double colon, and replace it with a colon, then a period, and another colon." The periods are escaped (have a backslash in front of them) to keep sed from treating them as the period metacharacter. The period is special only on the search side of a substitution, so the backslashes are probably not necessary on the replacement side, but I decided to play it safe.

The fourth expression

's/:/ /g'

turns every colon into a space (g for globally throughout the line, without which only the first colon would be turned into a space). If this expression does not come last in the list, none of the other searches will work, since the colons they depend upon will have already been replaced. Collectively, this subshell substitution will convert the PATH string into a list suitable to a for loop control, with a period for the current directory and spaces between entries.
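The four expressions can be tried together on a sample string. This is a sketch only: the function name path_to_list and the sample path are assumptions for illustration, while the sed expressions are the ones quoted above.

```shell
# path_to_list: hypothetical wrapper around the four sed expressions.
# Takes a PATH-style string as $1 and prints a space-separated list.
path_to_list() {
    echo "$1" | sed -e 's/^:/\.:/' \
                    -e 's/:$/:\./' \
                    -e 's/::/:\.:/' \
                    -e 's/:/ /g'
}

# Leading, middle, and trailing empty entries all become "."
dirs=`path_to_list ":/bin::/usr/bin:"`
echo "$dirs"    # prints: . /bin . /usr/bin .
```

Because the third expression has no g flag, only the first double colon on the line is converted. Directory names containing spaces would also be split apart by the for loop, so this approach assumes a conventional PATH.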

This list (the path directories) will control the inner for loop, while the command-line arguments (a list of files) will control an outer while loop. Thus the program will search the entire path for each file before moving to the next file. If the path is exhausted without finding a file, then that file's name is appended to a variable, notfound, for later reporting. Just in case this script has inherited another usage of notfound from a parent environment, I initialize notfound before starting the outer loop.

I use the found variable as a flag that signals the outer loop when it needs to update the notfound list. To search the path, the for loop assigns each directory name from $dirs (the reformatted $PATH) to the dir variable and uses an if test to see if there is an executable (-x) file with the current argument's name in the current $dir. If so, the script outputs the full pathname, sets the found flag, and quits the loop. (This program does not care how many directories the file is in, just which one comes first in the path list.)

Once the inner loop is finished, if the found variable is still false (zero), the current program name argument is concatenated to the notfound variable. Finally, the command-line arguments are shifted so that during the next iteration of the outer loop $1 will refer to the next available argument. (The while test will figure out whether there really is one or not.)

Once the outer loop exits, the script reports any names saved in the notfound variable, along with the list of directories checked. Finally, the program exits with a zero code, meaning that there is no error.
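The loop structure described above might be sketched as follows. This is not Listing 1: the function name find_in_path, the message wording, and the extra -f test (which keeps directories from matching -x) are assumptions added for illustration.

```shell
# find_in_path: hypothetical sketch of the nested search loops.
# $1 is a space-separated directory list; the rest are program names.
find_in_path() {
    dirs=$1
    shift
    notfound=""                      # start clean, whatever was inherited
    while [ X"$1" != X ]; do         # outer loop: one pass per name
        found=0
        for dir in $dirs; do         # inner loop: one pass per directory
            if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
                echo "$dir/$1"
                found=1
                break                # the first match in the path wins
            fi
        done
        if [ $found -eq 0 ]; then
            notfound="$notfound $1"
        fi
        shift                        # next name becomes $1
    done
    if [ X"$notfound" != X ]; then
        echo "Not found:$notfound"
        echo "in: $dirs"
    fi
}
```

A call such as find_in_path "/bin /usr/bin" ls sh would print the first match for each name, followed by a report of any names that were not found.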

This works for me, but needs one enhancement to be reliable in all situations: it should check the csh alias list before it bothers to check the directories in the path. If the target file is in the alias list and csh is the standard shell (check the $SHELL variable), the alias will always be executed before the path is followed.

A Fancy Calendar

When invoked with no arguments, the SVR4 cal command shows the current month. The SCO version shows the previous month, the current month, and the next month, along with the current date and time. The extra information is nice, especially when planning appointments or checking the date of recent events. I thought I'd go SCO one better by writing a cal-style program for SVR4 that shows three months and highlights the current date on the middle calendar.

The today program (Listing 2) constructs explicit references for the prior and next months, uses these (in three separate cal commands) to produce the individual months, and then assembles the final display using paste.

The first line shows the syntax for explicitly naming the target shell. When the first line begins with a pound sign, followed by an exclamation mark, the balance of the line will be interpreted as the name of the shell (including any special options if needed) which should be used to execute the script. Note that the pound sign must be the first character in the first line.

After the comments the program invokes the date command three times. The first invocation gives the date and time when the today program begins. The second and third invocations use format strings to retrieve special forms of the date. If the argument to date begins with a plus sign (+), the rest of the argument is taken to be a set of printf-like format descriptors that control how date formats its output. Of date's many options, the today program uses only those that isolate the month, day, and year.
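For example, format strings along these lines isolate each field. The variable names are illustrative, and the choice of %Y for a four-digit year is an assumption; Listing 2 may use different descriptors.

```shell
# Each date call returns one field; a leading + marks a format string.
month=`date '+%m'`    # two-digit month, 01 through 12
day=`date '+%e'`      # day of month, space-padded (" 1" through "31")
year=`date '+%Y'`     # four-digit year (assumed here)
echo "month=$month day=$day year=$year"
```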

The program loads the NOW variable with the number of the current month. Using NOW as an anchor, the program then computes a value for the other two months. The LAST variable is set to the previous month and NEXT is set to the following month. The expr(1) program performs the arithmetic in a subshell substitution.

Next, the program loads the YEAR variable and copies its value into the NEXTYEAR and LASTYEAR variables. The program tests LAST and NEXT; if they are outside the range of 1-12, then LAST or NEXT is "wrapped" and the appropriate year variable is adjusted.
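The wrap-around logic could be sketched like this. The function name adjacent_months is hypothetical, and the sketch checks the boundary before calling expr rather than after, a small departure from the order described above, because expr returns a non-zero exit status whenever its result is 0.

```shell
# adjacent_months: hypothetical helper; $1 = month (1-12), $2 = year.
# Prints "LAST LASTYEAR NEXT NEXTYEAR" on one line.
adjacent_months() {
    NOW=`expr "$1" + 0`              # strips a leading zero ("05" -> 5)
    YEAR=$2
    if [ $NOW -eq 1 ]; then          # January: previous month is last December
        LAST=12
        LASTYEAR=`expr $YEAR - 1`
    else
        LAST=`expr $NOW - 1`
        LASTYEAR=$YEAR
    fi
    if [ $NOW -eq 12 ]; then         # December: next month is next January
        NEXT=1
        NEXTYEAR=`expr $YEAR + 1`
    else
        NEXT=`expr $NOW + 1`
        NEXTYEAR=$YEAR
    fi
    echo "$LAST $LASTYEAR $NEXT $NEXTYEAR"
}

adjacent_months 1 1993     # prints: 12 1992 2 1993
adjacent_months 12 1993    # prints: 11 1993 1 1994
```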

Now the program tackles the task of combining the output from three separate executions of the cal program into a single horizontal display. This is a job for the paste program. But first, the today program must make some adjustments to the cal output.

In keeping with the usual UNIX philosophy, the cal program delivers minimal output. So, there are no trailing spaces at the end of each calendar line. This characteristic makes it difficult to paste the three months together with uniform spacing -- especially on the last lines of each month, where the line ends abruptly after the last day number.

I solved this problem by passing the cal output through awk, so that I could use its C-like printf() to add some padding. The %-21s notation tells printf() to left-justify (the '-') a string (the 's') in a 21-character field padded with spaces. Awk uses $0 to refer to the entire input line, which would be one title line or week line. I save the reformatted $LAST and $NEXT calendars in temporary files named after their months followed by a dot and the process ID. These names should be unique enough that they will not conflict with other files.
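A sketch of the padding step, followed by a paste of two padded files. The function name pad_lines, the sample lines, the temporary file names, and the single-space paste delimiter are all assumptions for the example.

```shell
# pad_lines: hypothetical filter that pads each input line to 21 columns.
pad_lines() {
    awk '{ printf("%-21s\n", $0) }'
}

# Two ragged "calendars" stand in for real cal output.
tmp=${TMPDIR:-/tmp}/padsketch.$$
printf 'Su Mo\n 1  2\n' | pad_lines > "$tmp.a"
printf 'Su Mo Tu\n 1\n' | pad_lines > "$tmp.b"

# With every line the same width, the columns line up after pasting.
paste -d' ' "$tmp.a" "$tmp.b"
rm -f "$tmp.a" "$tmp.b"
```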

Next, the program uses sed to highlight the current date. This application also needs multiple expressions, since the highlight involves three different substitutions. The first two substitutions add a leading and trailing space on every line. If the current day's number falls either at the beginning or the end of a line, the program will replace one of these spaces with the highlight symbols. I chose to use less-than and greater-than symbols, so " 12 " will be highlighted as "<12>". Since cal doesn't output leading spaces either, adding the highlight symbols without this padding step would shove the corresponding line in the next month over, making the whole thing look ugly.

With the leading and trailing spaces in position, the current day number can be surrounded with the highlight symbols. The sed substitution command is:

"s/ \(`date '+%e'`\) /\<\1\>/"

which says, "substitute, find a string beginning with a space, then a subexpression composed of the current day number, then another space, and replace it with the first subexpression surrounded by less-than and greater-than symbols" (the '<' and '>' symbols are escaped as a precaution; unescaped, both are ordinary characters in the replacement text). Notice that this little trick uses all three kinds of quotes: the quotation marks around the entire substitution to keep the shell from interpreting the metacharacters (but allow the back-apostrophes to be used within) and the regular apostrophes to avoid the use of escaped quotes within the back-apostrophe subshell substitution. After combining and outputting all three pieces, the today script removes its temporary files and terminates.

Enhancements

The today script always centers on the current date. One could easily modify it to generate a similar display for any specified date. By using tput(1) to get the terminal's high-intensity, reverse-video, or blinking attribute, today could avoid those kludgy less-than/greater-than symbols when used from terminals with alternate display modes.
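One possible shape for the tput enhancement is sketched below. Everything here is an assumption rather than part of the today script: the function name mark_day, the use of awk instead of sed (which avoids escaping whatever characters the terminal sequences contain), and the choice of the rev and sgr0 capabilities.

```shell
# mark_day: hypothetical filter. $1 = day as printed by date '+%e',
# $2 = sequence that starts the highlight, $3 = sequence that ends it.
mark_day() {
    awk -v day="$1" -v so="$2" -v se="$3" '
    {
        line = " " $0 " "              # pad so edge matches still work
        pat = " " day " "
        i = index(line, pat)
        if (i > 0)                     # keep both spaces; wrap only the day
            line = substr(line, 1, i) so day se substr(line, i + length(pat) - 1)
        print substr(line, 2, length(line) - 2)   # drop the padding again
    }'
}

# Ask the terminal for reverse video; fall back to nothing if unavailable.
so=`tput rev 2>/dev/null` || so=""
se=`tput sgr0 2>/dev/null` || se=""
echo " 8  9 10 11 12 13 14" | mark_day 12 "$so" "$se"
```

Terminal attribute sequences occupy no columns on the screen, so this version leaves the surrounding spaces in place rather than replacing them.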

About the Author

Larry Reznick has been programming professionally since 1978. He is currently working on systems programming in UNIX and DOS. He teaches C language courses at American River College in Sacramento. He can be reached via email at: rezbook!reznick@csusac.ecs.csus.edu.