Using Little Languages
Larry Reznick
After having worked with several flavors of UNIX, I've
come to appreciate
certain "extras" in each. The SCO cal program,
for
example, defaults to a display that includes this month,
last month,
and next. In contrast, the BSD version shows only one
month. I like
the extra information. Conversely, the SunOS version
of the Berkeley
distribution includes which, a very useful tool for
administrators;
SCO does not. In this article, I'll show you how to
use the little
languages to replace or enhance these two utilities.
A which for SCO
The which program searches the current execution path
for a
set of program names. If any of the names are found,
which
gives the full path of those names. When more than one
copy of a program
is floating around, which can tell you which is actually
executing.
Sometimes, a program doesn't do what you want simply
because you are
unintentionally executing the wrong version of the program.
I had
to have the which program, but SCO did not provide it,
so I
sat down and wrote my own (Listing 1).
The first line contains just a colon, a widely used
convention that
tells other shells this is a Bourne script. If the first
character
of the first line of a shell script is a pound sign
("#"),
the shell presumes it is a C-Shell (csh) script. If
anything
else is there, it is presumed to be a Bourne Shell (sh)
script.
So, if some other shell is being used when this script
is executed,
the appropriate shell will be used to execute this script.
The first test is a useful shell idiom that determines
whether an
argument is present. If $1 (the argument right after
the which
program's name) is present, it will be expanded within
the quotation
marks right after the X. Therefore, the expanded result
can
remain X only if there were no arguments. Since which
must have at least one argument to do useful work, this
test triggers
a usage reminder message.
Note that I have used $0 (the name used to invoke the program)
in
the usage message. With this coding, the usage message
will appear
correct to the user, even if the script has been linked
or copied
to a different file name. The usage clause exits with
a non-zero error
code, the standard convention for an error exit. Coding
an appropriate
exit value allows which to be used inside a larger
shell
script that might need to know whether the which program
fails.
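As a minimal sketch of the argument test and usage message just described (not a copy of Listing 1), the idiom might look like this. The function name require_arg is hypothetical, and it returns 1 instead of calling exit so that it can be tried at an interactive prompt without closing the shell.

```shell
# Hypothetical helper; a real script would typically perform this test
# inline and call "exit 1" rather than "return 1".
require_arg() {
    # With no argument (or an empty one), "X$1" expands to just "X".
    if [ "X$1" = "X" ]; then
        # $0 is the name used to invoke the script, so the message
        # stays correct even if the file is linked or renamed.
        echo "usage: $0 program ..." >&2
        return 1
    fi
    return 0
}
```

A caller can then write something like "require_arg "$1" || exit 1" near the top of the script, which preserves the non-zero exit status for any larger script that invokes it.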
The dirs variable is set by shell substitution, one
of my favorite
shell techniques. Using the grave accent marks (usually
called the
back-apostrophes), a full shell command can be embedded
within another
command. The embedded command is executed in a subshell,
and, when
finished, its output text is placed where the back-apostrophes
used
to be: substituted. Here shell substitution allows me
to use sed
to reformat the path string.
In sh, the several paths collected in the PATH variable
are separated by colons. I want to use each path as
a separate parameter
in a for loop, which means that they must be separated
by spaces,
instead. Simple! Just use sed to substitute a space
for every
colon. Unfortunately, the substitution isn't quite that
simple. By
convention, a colon with no name separating it from
another colon
or from an end of the string stands for the current
directory (.).
If a lone colon comes at the beginning of the list (e.g.,
PATH=:somepaths),
the . directory is to be searched first. If a lone colon
appears
at the end of the list (e.g., PATH=somepaths:), the
current
directory is to be searched last. (This default behavior
always seemed
to be a meaningless shortcut to me, since it only saves
one character.
But what the heck, Bourne may have had an excellent
reason that I
just do not know about.)
Placing the colon last prevents programs in the current
directory
from being executed in place of some standard utilities.
Doing otherwise
might give a system cracker undue privilege in execution.
So, whenever a lonely colon is found, I need to replace
it with an
explicit period to refer to the current directory. By
piping the PATH
string into sed (invoked with the -e option), I
can perform the entire operation in a single line. If
the substitution
were specified in one expression, I wouldn't need the
-e option,
but since I've used four expressions, each must be preceded
by a -e.
The first three of these expressions could appear in
any order, but
the fourth one must be last. The first expression,
's/^:/\.:/'
says "substitute, find a colon at the beginning
of
the line, and replace it with a period followed by a
colon." This
expression processes the leading colons. The second
expression,
's/:$/:\./'
processes trailing colons (substitute, find a colon
at
the end of the line, and put in its place a colon followed
by a period).
The third expression,
's/::/:\.:/'
processes colons which appear in the middle of a path.
This expression, which is somewhat simpler, says, "substitute,
find any double colon, and replace it with a colon,
then a period,
and another colon." The periods must be escaped
(have a backslash
in front of them) to prevent the period metacharacter
from being interpreted
by sed. The backslashes are probably not necessary in
the replacement
side of the substitution, but I decided to play it safe.
The fourth expression
's/:/ /g'
turns every colon into a space (g for globally
throughout the line, without which only the first colon
would be turned
into a space). If this expression does not come last
in the list,
none of the other searches will work, since the colons
they depend
upon will have already been replaced. Collectively,
this subshell
substitution will convert the PATH string into a list
suitable
to a for loop control, with a period for the current
directory
and spaces between entries.
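The four expressions can be tried out in isolation with a sketch like the one below. The function name path_to_dirs is hypothetical, and the backslashes before the periods are left out on the assumption that a period in the replacement text is an ordinary character.

```shell
# Hypothetical wrapper around the four sed expressions discussed above.
# It converts a colon-separated path string into a space-separated list,
# replacing each empty entry with an explicit "." for the current
# directory. Like the expressions in the text, the third one has no "g"
# flag, so it handles only the first doubled colon on the line.
path_to_dirs() {
    printf '%s\n' "$1" | sed -e 's/^:/.:/' \
                             -e 's/:$/:./' \
                             -e 's/::/:.:/' \
                             -e 's/:/ /g'
}

# Shell substitution captures the converted list for later use.
dirs=`path_to_dirs "$PATH"`
```

For example, path_to_dirs ':/bin:/usr/bin' prints ". /bin /usr/bin", and path_to_dirs '/bin::/usr/bin' prints "/bin . /usr/bin". Directory names that themselves contain spaces would not survive this conversion.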
This list (the path directories) will control the inner
for
loop, while the command-line arguments (a list of files)
will control
an outer while loop. Thus the program will search the
entire
path for each file, before moving to the next file.
If the path is
exhausted without finding a file, then that file's name
is appended
to a variable, notfound, for later reporting. Just in
case
this script has inherited another usage of notfound
from a
parent environment, I initialize notfound before starting
the
outer loop.
I use the found variable as a flag that signals the
outer loop
when it needs to update the notfound list. To search
the path,
the for loop assigns each directory name from the converted
path list ($dirs)
to the dir variable and uses an if test to see if there
is an executable (-x) file with the current argument's
name
in the current $dir. If so, the script outputs the full
pathname,
sets the found flag, and quits the loop. (This program
does not care
how many directories the file is in, just which one comes
first in the path
list.)
Once the inner loop is finished, if the found variable
is still
false (zero), the current program name argument is concatenated
to
the notfound variable. Finally, the command-line arguments
are shifted so that during the next iteration of the
outer loop $1
will refer to the next available argument. (The while
test
will figure out whether there really is one or not.)
Once the outer loop exits, the script reports any names
saved in the
notfound variable, along with the list of directories
checked.
Finally, the program exits with a zero code, meaning
that there is
no error.
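The nested loops described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than Listing 1 itself: the function name search_dirs is hypothetical, the directory list is passed as the first argument instead of being derived from PATH, and the extra test that skips directories is an addition not mentioned in the text.

```shell
# Hypothetical sketch of the search: $1 is a space-separated directory
# list, and the remaining arguments are program names to look for.
search_dirs() {
    dirs=$1
    shift
    notfound=""                      # clear any inherited value
    while [ "X$1" != "X" ]; do       # outer loop: one pass per name
        found=0
        for dir in $dirs; do         # inner loop: one pass per directory
            # -x alone is also true for searchable directories, so
            # directories are excluded explicitly (an added safeguard).
            if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
                echo "$dir/$1"
                found=1
                break                # only the first match matters
            fi
        done
        if [ "$found" -eq 0 ]; then
            notfound="$notfound $1"
        fi
        shift                        # $1 now refers to the next name
    done
    if [ "X$notfound" != "X" ]; then
        echo "not found:$notfound (searched: $dirs)" >&2
    fi
    return 0
}
```

Calling search_dirs "/bin /usr/bin ." ls prints the full path of the first executable named ls found in those directories, if there is one.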
This works for me, but needs one enhancement to be reliable
in all
situations: it should check the csh alias list before
it bothers
to check the directories in the path. If the target
file is in the
alias list and csh is the standard shell (check the
$SHELL
variable), the alias will always be executed before
the path is followed.
A Fancy Calendar
When invoked with no arguments, the SVR4 cal command
shows
the current month. The SCO version shows the previous
month, the current
month, and the next month, along with the current date
and time. The
extra information is nice, especially when planning
appointments or
checking the date of recent events. I thought I'd go
SCO one better
by writing a cal-style program for SVR4 that shows three months
and
highlights the current date on the middle calendar.
The today program (Listing 2) constructs explicit references
for the prior and next months, uses these (in three
separate cal
commands) to produce the individual months, and then
assembles
the final display using paste.
The first line shows the syntax for explicitly naming
the target shell.
When the first line begins with a pound sign, followed
by an exclamation
mark, the balance of the line will be interpreted as
the name of the
shell (including any special options if needed) which
should be used
to execute the script. Note that the pound sign must
be the first
character in the first line.
After the comments the program invokes the date command
three
times. The first invocation gives the date and time
when the today
program begins. The second and third invocations use
format strings
to retrieve special forms of the date. If the argument
to date
begins with a plus sign (+), the rest of the argument
is taken
to be a set of printf-like format descriptors that control
how date formats its output. Of date's many options,
the today program uses only those that isolate the month,
day,
and year.
The program loads the NOW variable with the number of
the current
month. Using NOW as an anchor, the program then computes
a
value for the other two months. The LAST variable is
set to
the previous month and NEXT is set to the following
month.
The expr(1) program performs the arithmetic in a subshell
substitution.
Next, the program loads the YEAR variable and copies
its value
into the NEXTYEAR and LASTYEAR variables. The program
tests LAST and NEXT; if they are outside the range of
1-12, then LAST or NEXT is "wrapped" and the
appropriate year variable is adjusted.
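The month arithmetic and year wrapping can be sketched as below. The variable names follow the text, but the function name neighbor_months and the idea of passing the month and year as arguments are assumptions made so the logic can be tried with any date.

```shell
# Hypothetical helper: given a month (1-12) and a year, print
# "LAST LASTYEAR NEXT NEXTYEAR" on one line.
neighbor_months() {
    NOW=$1
    YEAR=$2
    # expr exits with status 1 when its result is 0 (as in 1 - 1),
    # so "|| :" keeps that from looking like a failure.
    LAST=`expr $NOW - 1` || :
    NEXT=`expr $NOW + 1` || :
    LASTYEAR=$YEAR
    NEXTYEAR=$YEAR
    if [ "$LAST" -lt 1 ]; then       # January: wrap back to December
        LAST=12
        LASTYEAR=`expr $YEAR - 1` || :
    fi
    if [ "$NEXT" -gt 12 ]; then      # December: wrap forward to January
        NEXT=1
        NEXTYEAR=`expr $YEAR + 1` || :
    fi
    echo "$LAST $LASTYEAR $NEXT $NEXTYEAR"
}
```

For example, neighbor_months 1 1993 prints "12 1992 2 1993", and neighbor_months 12 1993 prints "11 1993 1 1994". In a script, the two arguments would come from date format strings that isolate the month and the year.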
Now the program tackles the task of combining the output
from three
separate executions of the cal program into a single
horizontal
display. This is a job for the paste program. But first,
the
today program must make some adjustments to the cal
output.
In keeping with the usual UNIX philosophy, the cal program
delivers minimal output. So, there are no trailing spaces
at the end
of each calendar line. This characteristic makes it
difficult to paste
the three months together with uniform spacing -- especially
on
the last lines of each month, where the line ends abruptly
after the
last day number.
I solved this problem by passing the cal output through
awk,
so that I could use its C-like printf() to add some
padding.
The %-21s notation tells printf() to left-justify (the
'-') a string (the 's') in a 21-character field padded
with spaces. Awk uses $0 to refer to the entire input
line, which would be one title line or week line. I
save the reformatted
$LAST and $NEXT calendars in temporary files named after
their months followed by a dot and the process ID. These
names should
be unique enough that they will not conflict with other
files.
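The padding step can be sketched as a small filter. The function name pad_lines is hypothetical; the awk program is the %-21s technique described above.

```shell
# Hypothetical filter: left-justify every input line in a 21-character
# field padded with spaces. Lines longer than 21 characters pass
# through unchanged, since %-21s sets only a minimum width.
pad_lines() {
    awk '{ printf("%-21s\n", $0) }'
}
```

A calendar could then be saved with a command along the lines of "cal 7 1993 | pad_lines > 7.$$", where the file name is only an example of the month-dot-process-ID naming described in the text.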
Next, the program uses sed to highlight the current
date. This
application also needs multiple expressions, since the
highlight involves
three different substitutions. The first two substitutions
add a leading
and trailing space on every line. If the current day's
number falls
either at the beginning or the end of a line, the program
will replace
one of these spaces with the highlight symbols. I chose
to use less-than
and greater-than symbols, so " 12 " will be
highlighted as
"<12>". Since cal doesn't output leading
spaces either,
adding the highlight symbols without this padding step
would shove
the corresponding line in the next month over, making
the whole thing
look ugly.
With the leading and trailing spaces in position, the
current day
number can be surrounded with the highlight symbols.
The sed
substitution command is:
"s/ \(`date '+%e'`\) /\<\1\>/"
which says, "substitute, find a string beginning
with a space, then a subexpression composed of the current
day number,
then another space, and replace it with the first subexpression
surrounded
by less-than and greater-than symbols" (I escaped the '<'
and '>' symbols
to play it safe; in the replacement side of a substitution
they are ordinary characters, so the backslashes are probably
not necessary).
Notice that this little trick uses all three kinds of
quotes: the
quotation marks around the entire substitution to keep
the shell from
interpreting the metacharacters (but allow the back-apostrophes
to
be used within) and the regular apostrophes to avoid
the use of escaped
quotes within the back-apostrophe subshell substitution.
After combining
and outputting all three pieces, the today script removes
the temporary files
and terminates.
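The three highlighting substitutions can be tried on their own with a sketch like this one. The function name highlight_day is hypothetical, the day number is passed as an argument rather than taken from date, and the '<' and '>' in the replacement are left unescaped on the assumption that they are ordinary characters there.

```shell
# Hypothetical filter: surround the given day number with < and > in
# calendar text read from standard input.
highlight_day() {
    day=$1
    # 1. add a leading space, 2. add a trailing space, 3. replace
    # " day " with "<day>" so the line keeps its original width plus
    # the two padding columns.
    sed -e 's/^/ /' -e 's/$/ /' -e "s/ \($day\) /<\1>/"
}
```

For example, piping the line "12 13 14" through highlight_day 13 yields " 12<13>14 ". Because a format such as '+%e' pads single-digit days with a space, the argument for the 5th would be ' 5'. Note that this sketch marks only the first match on each line and does not distinguish a day number from other text, such as part of the year in the title line.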
Enhancements
The today script always centers on the current date.
One could
easily modify it to generate a similar display for any
specified date.
By using tput(1) to get the terminal's high-intensity,
reverse-video,
or blinking attribute, today could avoid those kludgy
less-than/greater-than
symbols, when used from terminals with alternate display
modes.
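One way the tput idea might be sketched is shown below, using the terminfo standout capabilities smso and rmso. The function name standout_day is hypothetical, and the sketch assumes the terminal's control sequences contain no characters that are special in a sed replacement (such as '/', '&', or a backslash).

```shell
# Hypothetical variant of the highlighting step that uses standout mode
# instead of < and >. If tput is missing or the terminal lacks the
# capability, both strings stay empty and the day is left undecorated.
standout_day() {
    day=$1
    on=`tput smso 2>/dev/null` || on=""
    off=`tput rmso 2>/dev/null` || off=""
    # The surrounding spaces are kept, since the control sequences
    # occupy no columns on the screen.
    sed -e 's/^/ /' -e 's/$/ /' -e "s/ \($day\) / $on\1$off /"
}
```

Other capabilities, such as rev or bold paired with sgr0, could be substituted in the same way on terminals that support them.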
About the Author
Larry Reznick has been programming professionally since
1978.
He is currently working on systems programming in UNIX
and DOS. He
teaches C language courses at American River College
in Sacramento.
He can be reached via email at:
rezbook!reznick@csusac.ecs.csus.edu.