A Community-Style Overnight Job Spooler
For a small business running a single multi-user UNIX system,
processes typically fall into one of two categories: real-time,
interactive programs or batch-style/background jobs. Interactive
programs such as the system shell, editors/word processors,
spreadsheets, and data entry systems all vie concurrently for
slices of the CPU pie. Such programs spend most of their real
time blocked waiting for user input, so they tend not to have
much impact on system performance (as long as there is enough
main memory to keep the jobs from being swapped out to disk).
Batch-style jobs such as reports, backup scripts, or any CPU- or
disk-intensive processes, on the other hand, have a relatively
large impact on system performance. Such jobs demand as much of
the available system resources as they can possibly get. It
doesn't take many such CPU- or disk-intensive background jobs
running simultaneously to slow user response time down to a crawl.
In many cases, those "expensive" batch-style jobs would be less
of a pain-in-the-CPU if they could be scheduled so as not to
compete head-to-head with the interactive processes for system
resources. In a business environment, the natural solution would
be to run those jobs overnight whenever feasible, and reserve the
business day for interactive processes and high-priority batch
jobs only.
Users should routinely be given the option of running jobs
overnight. For jobs that must run immediately, running them right
away should also be an option. However, with a little bit of
education (and after enough instances of the molasses-syndrome
due to an overloaded system), users will understand that
overnight queueing works out best.
Pros and Crons
The basic UNIX System V configuration includes only a minimal
set of job scheduling tools. The primary facilities, cron and
at, allow the scheduling of jobs for execution at particular
dates and times, but have no provision for prioritizing those
jobs in order to maximize system performance. Three users might,
unbeknownst to each other, all schedule jobs for execution at
8:30 P.M. cron will dutifully start them all up at 8:30 P.M.,
resulting in some serious context-switching overhead while the
jobs vie for system resources.
Now consider the case of automated daily backups. You could just
set up the cron table to run the backup software every morning
at 5:00 A.M., but what happens if those three long batch jobs
are still slugging it out at 5:00 A.M., and changing data in the
process? cron doesn't care; it just runs the backup. If the
backup utility cannot properly coordinate file-locking with
those jobs in the course of a backup, the result may be lost data.
A better solution would be for all jobs scheduled for overnight
execution to be registered with a single overseeing system, and
for that system to be responsible for running the jobs in an
orderly manner. The simplest way to implement this "ordering" is
to ensure that all jobs are scheduled sequentially, such that
each job is run to completion with as little competition from
other jobs as possible -- especially other resource-intensive
jobs.
With the addition of a prioritizing scheme, critical issues can
also be properly managed. Then, for example, the backup script
can be configured at the lowest possible priority, so that it
runs only after all other jobs have been completed.
In this article, I describe a set of Bourne Shell scripts that
work together to provide a sequential overnight job-spooling
system. The package is geared towards a "community-style"
environment -- that is, an environment that allows any user to
invoke a particular overnight job and that prints out or places
the output resulting from the job in a public destination so
that any other user may choose to view or print out that output
as required by the specific application.
Any stdout/stderr output not explicitly directed into a file by
an overnight job will be captured into a default file generally
accessible only by a system administrator. This file may be used
as a simple status- and error-logging mechanism.
The onitesetup.sh script (Listing 1) may be used to create the
directory structure and appropriate permission settings for the
basic onite system. I've chosen a master directory location of
/usr/spool/onite for the example implementation; another
location may be more appropriate for your site. In those scripts
where applicable, the SPOOLDIR configuration variable identifies
the master onite directory.
Several subdirectories exist immediately beneath the master
directory. The subdirectory jobs itself contains another tier of
subdirectories corresponding to the various job priority levels.
The system may be configured for any number of priority levels;
when there are n levels of priority, the subdirectories are
named P1 through Pn. In scripts where applicable, the
NPRIORITIES variable specifies the number of priority levels
implemented.
The subdirectory stdout receives the intermixed, non-directed
("bit bucket") output of both the stdout and stderr streams for
the last NTOLEAVE jobs that have been run by the spooler. The
value of NTOLEAVE is configured in the master driver script,
onitego.sh.
The subdirectory jobsdone receives the "used" job files for the
last NTOLEAVE completed jobs. The contents of this directory,
along with the contents of stdout, as previously noted, exist
primarily to support post-mortem analysis by the system
administrator.
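The layout just described can be sketched in a few lines of POSIX shell. This is only an illustration, not the code from Listing 1: the function name make_spool is hypothetical, permission settings are left out because the article does not state them, and constructs such as mkdir -p and $((...)) may need adjusting for an older Bourne shell.

```sh
#!/bin/sh
# Sketch: build an onite-style spool hierarchy (not the Listing 1 code).
# make_spool DIR N creates DIR/jobs/P1..PN, DIR/stdout, DIR/jobsdone,
# and an empty DIR/onite.log. Permissions are deliberately not set here.
make_spool() {
    spooldir=$1
    npriorities=$2
    mkdir -p "$spooldir/stdout" "$spooldir/jobsdone" || return 1
    i=1
    while [ "$i" -le "$npriorities" ]; do
        mkdir -p "$spooldir/jobs/P$i" || return 1
        i=$((i + 1))
    done
    touch "$spooldir/onite.log"
}
```

Calling make_spool ./onite-test 7 in a scratch directory gives a throwaway hierarchy with seven priority levels.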
The onitego.sh script emits a log of all overnight spooler
activity on its standard output and error streams. I've
configured the system to record this output in the log file
/usr/spool/onite/onite.log. The log file is created with the
proper permissions by the installation script, onitesetup.sh,
but no other scripts explicitly write to the log file. With the
following line in the "root" cron table:
0 20,23 * * *
the output of the master driver script is appended onto the end
of the log file every time the master driver runs.
A brief description of each individual script and auxiliary
program in the onite package follows.
The Configuration Script
onitesetup.sh (Listing 1) initializes the directory hierarchy
for your custom implementation of the onite system. Customize
lines 15-18 for your system; line 14, the debug flag, may be
used to create a "dummy" hierarchy in the current directory for
testing purposes. To test the onite system using this dummy
directory, copy all the scripts into your testing directory and
change the initialization of debug to Y in all scripts where
debug appears. This is especially useful once the system has
been officially installed and you wish to test some changes
without corrupting the currently active code and job files.
The Master Driver Script
onitego.sh (Listing 2), invoked from the cron table as shown
above, "wakes up" to execute all queued job scripts in sequence.
It scans all the $SPOOLDIR/jobs/P* directories in order,
beginning with P1, looking for job files, and submits each job
file encountered to the shell for execution.
The standard output and standard error from each job are written
to a file in the $SPOOLDIR/stdout directory with the same name
as the job file. All program output from the job script should
take the form of explicit output files or physical output. Any
output emitted through the stdout and stderr streams should be
considered for the system administrator's eyes only.
After the job has finished executing, the job file itself is
moved to the $SPOOLDIR/jobsdone directory.
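The scan-and-run cycle described above might look roughly like the following POSIX shell sketch. It is not the code from Listing 2: the function name run_queue and the wording of the log message are assumptions, and lock-file handling and cleanup are omitted.

```sh
#!/bin/sh
# Sketch of the driver loop (not the Listing 2 code).
# run_queue SPOOLDIR NPRIORITIES runs every queued job file, P1 first.
run_queue() {
    spooldir=$1
    npriorities=$2
    p=1
    while [ "$p" -le "$npriorities" ]; do
        for job in "$spooldir/jobs/P$p"/*; do
            [ -f "$job" ] || continue       # empty directory or not a file
            name=$(basename "$job")
            echo "Starting $name (priority $p) at $(date)"
            # Capture undirected stdout and stderr under the job's name.
            sh "$job" > "$spooldir/stdout/$name" 2>&1
            mv "$job" "$spooldir/jobsdone/$name"
        done
        p=$((p + 1))
    done
}
```

Because each directory is listed once, when the loop reaches it, a job dropped into a directory that has already been scanned is not seen during that run.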
The standard output of the onitego.sh script provides a log of
job activity. If no jobs at all were queued for execution, then
a message to that effect is emitted. Otherwise, onitego.sh
creates a lock file that exists for the duration of all job
processing, and, for each job, writes a message announcing the
name of that job and the time it begins its run.
When all jobs have been processed, the fleave.sh utility is
called to delete all files in the jobsdone and stdout
directories except for those corresponding to the most recent
NTOLEAVE jobs. This keeps those directories from filling up with
too much junk.
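One way to do this kind of pruning is sketched below in POSIX shell. It is a stand-in for illustration, not the fleave.sh of Listing 12: the name keep_newest is hypothetical, "most recent" is taken to mean most recently modified, and file names are assumed to contain no newlines.

```sh
#!/bin/sh
# Sketch: keep only the N most recently modified files in a directory.
# keep_newest DIR N
keep_newest() {
    dir=$1
    ntoleave=$2
    # ls -t lists newest first; tail -n +K starts output at line K,
    # so everything after the first N names is removed.
    ls -t "$dir" | tail -n +"$((ntoleave + 1))" | while read -r name; do
        rm -f "$dir/$name"
    done
}
```

A driver would call it once per directory, for example keep_newest "$SPOOLDIR/stdout" "$NTOLEAVE".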
There are some basic limitations to the design of the onite
system. The primary hazard is the case where a user is permitted
to queue a job after the driver script has already begun running
for the evening. If the job is queued at a priority level equal
to or greater than the priority level currently being processed,
the job may not be run until the next night. I've partially
addressed this issue by scheduling the driver script for two
runs each night, so that a job missed during the "first round"
is picked up for execution in the "second round." This assumes
that all jobs from the first round are completed before the
scheduled time for the second round comes up; if the first
instance of the driver script is still running when the later
instance "wakes up," the later instance will see the lock file
and go back to sleep. Also, a high-priority job that ends up
running in the second round will effectively have been bumped
down to the lowest possible priority, since all jobs from the
first round will by then have already completed. In other words,
if the priorities are really critical, then don't schedule the
master driver for more than one run per night.
The best way to prevent these kinds of conflicts is to make sure
no jobs are queued past the time when the first instance of the
driver wakes up (see the discussion of spoolonite.sh below for
built-in protective measures).
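The lock-file behavior can be illustrated with a small POSIX shell wrapper. This is a sketch under assumptions: the name with_lock and the message text are hypothetical, and the noclobber option (set -C) is used here so that creating the lock is a single atomic step, which may differ from what onitego.sh actually does.

```sh
#!/bin/sh
# Sketch: run a command only if no other run holds the lock file.
# with_lock LOCKFILE COMMAND [ARGS...]
with_lock() {
    lockfile=$1
    shift
    # With noclobber set, the redirection fails if the file already exists.
    if ! ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null; then
        echo "Lock file $lockfile exists; another run is in progress."
        return 1
    fi
    "$@"
    status=$?
    rm -f "$lockfile"
    return "$status"
}
```

A second instance started while the first is still working returns immediately with status 1 and leaves the existing lock alone.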
"Run Driver NOW" Script
From time to time, you might discover that onitego.sh was not
executed as normally scheduled. For instance, someone may have
inadvertently broken the root cron table entry while doing
administrative maintenance, or perhaps the system had gone down
before the spooler startup time and hadn't been brought back up
until after the startup time, so cron never had a chance to
start the driver.
onitenow.sh (Listing 3) is designed for one-shot invocation by
the system administrator in just such an event. The script
starts up the master driver immediately as a background process
immune to hang-up, and sends the output into the appropriate
log file.
The Job Queuing Script
The last of the major scripts in this package, spoolonite.sh
(Listing 4), schedules an overnight job for execution. It is
typically run from within a shell script, accepting the text of
the job to be spooled on its standard input stream. There is
only one mandatory command line parameter, the job name, and one
optional parameter, the job priority level. If no priority level
is given, then the job is assigned a priority of
$DEFAULT_PRIORITY as defined in the script.
The two variables USE_CUTOFF and CUTOFF_TIME may be configured
to reject job submissions past a particular time of day. If
USE_CUTOFF is Y, then any attempt to queue a job after the clock
time specified by CUTOFF_TIME will be rejected.
The variable CHECK_LOCK may be configured to reject submissions
once the nightly queue has begun executing; this, in conjunction
with the USE_CUTOFF mechanism, effectively eliminates the
possibility of "orphaned" jobs in the queue after the master
driver script has completed its run (lines 49-57).
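A cutoff test of this kind could be written as follows in POSIX shell. This is a sketch, not the code from Listing 4: the HHMM format and the 1930 value for CUTOFF_TIME are assumptions, as are the function name accept_submission and the message text.

```sh
#!/bin/sh
# Sketch of a time-of-day cutoff check (not the Listing 4 code).
USE_CUTOFF=Y
CUTOFF_TIME=1930        # assumed format: 24-hour HHMM

# accept_submission NOW   (NOW is HHMM, e.g. from: date +%H%M)
accept_submission() {
    now=$1
    # test(1) compares these as decimal integers, so 0830 is fine.
    if [ "$USE_CUTOFF" = Y ] && [ "$now" -gt "$CUTOFF_TIME" ]; then
        echo "Overnight jobs cannot be queued after $CUTOFF_TIME." >&2
        return 1
    fi
    return 0
}
```

A queuing script would call accept_submission "$(date +%H%M)" before writing the job file. Note that this simple comparison accepts submissions made after midnight, so a lock-file check is still needed to cover the early-morning hours.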
Since the contents of the stdout and jobsdone directories are
not broken down by priority level, only one instance of any
specific job name is allowed per night (lines 55-65). It is left
up to the system administrator, using the tools provided in this
package (such as oname.sh), to construct unique names for all
jobs.
Since the master driver script is invoked from root's cron
table, all jobs are actually run under root's user-ID and
environment, not under the user-ID and environment of the
invoking user. spoolonite.sh must see to it that the original
user's environment is replicated as faithfully as possible at
the time the job script is run.
Line 79 begins to construct the job file by dumping the entire
contents of the user's environment settings into it. Line 78
prevents a nasty problem in the case where the user's PS1
(primary prompt string) was exported and happens to contain a
multiline string. If PS1 were not redefined in this case to
isolate the embedded newline within a set of quote marks, then
the shell would become confused by the multiline string when the
time came to interpret the job script. If there are any other
variables in your users' environments that might conceivably be
set to multiline string values and then exported, those
variables must be redefined in a similar manner before line 79
executes.
If any programs invoked from a user's job script need access to
any variables in the user's environment, then those environment
variables must be exported by the job script. The design of this
system assumes that "unsophisticated" users will not be creating
custom environment variables and spooling jobs for overnight
execution that depend on those variables. Sophisticated users
can include the commands to define and export such variables, if
necessary, on their own when preparing their scripts.
When the list of common critical environment variables is known,
however, then that list may be specified as the value of
toexport (line 29). For our installation, this list includes the
PATH, variables relating to database configuration, and two that
affect routing. I know these variables are defined in every
user's profile, because I maintain those profiles.
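To illustrate the quoting problem and the export list together, here is a POSIX shell sketch that writes quoted assignments for a named set of variables. It is not how Listing 4 does it (the article describes dumping the whole environment and redefining PS1); the function name emit_env is hypothetical, and the variable names passed to it are assumed to be valid shell identifiers.

```sh
#!/bin/sh
# Sketch: print "NAME='value'" and "export NAME" lines for each variable
# named on the command line, quoted so that embedded newlines and single
# quotes survive when the output is later read back by the shell.
emit_env() {
    for name in "$@"; do
        eval "value=\${$name-}"
        # Inside single quotes, a literal ' must be written as '\''
        quoted=$(printf '%s\n' "$value" | sed "s/'/'\\\\''/g")
        printf "%s='%s'\nexport %s\n" "$name" "$quoted" "$name"
    done
}
```

For example, emit_env PATH PS1 > jobfile starts a job file with safely quoted definitions of those two variables; trailing newlines in a value are not preserved by this sketch.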
In line 83, spoolonite.sh generates a cd statement that sets the
current directory for job execution to the user's current
directory. Finally, the explicit job script text is copied from
the standard input onto the end of the job file.
Displaying the List
showonite.sh (Listing 5) summarizes all jobs queued for
overnight processing, showing the job name, name of the invoking
user, and priority level. The contents of each priority
directory are displayed by piping the output of the l command to
awk for formatting.
Cancelling a Job
A user may change his/her mind about an overnight job, and need
to cancel it. killonite.sh (Listing 6) performs that duty. It
may be configured to restrict users to killing only their own
jobs, or to allow users to kill anyone's queued jobs, depending
on the value of the OwnOnly variable (line 9).
This script uses the utility script lpick.sh, described below,
to let the user pick a job "by number".
Looking for a Particular Job
It may not make any sense for certain kinds of jobs -- for
example, a process that checks a mailing list for illegal
addresses before a monthly mailing -- to be run more than once
per night. If a user requests such a job for the second time in
a single day, it can only be because they didn't realize someone
else had already queued it. isonite.sh (Listing 7) helps the
system administrator prevent such duplications. Given a job name
as the command-line argument, it returns a true status if a job
by that name has already been queued.
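A duplicate check along these lines can be sketched in POSIX shell as shown below. This is not the code from Listing 7: the function name job_queued is hypothetical, and the sketch looks only in the priority directories under jobs, which may be narrower or broader than what isonite.sh examines.

```sh
#!/bin/sh
# Sketch: succeed if a job file named JOBNAME exists at any priority.
# job_queued SPOOLDIR JOBNAME
job_queued() {
    spooldir=$1
    jobname=$2
    # Only P* is a glob; the quoted job name is matched literally.
    for f in "$spooldir"/jobs/P*/"$jobname"; do
        [ -f "$f" ] && return 0
    done
    return 1
}
```

A menu script could then refuse a second request with something like: if job_queued /usr/spool/onite "$name"; then echo "Already queued."; fi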
Generating a Unique Name
When it makes sense for a certain type of job to be allowed
multiple runs in one evening, each instance of that job must
still be given a unique job name. The oname.sh script
(Listing 8) is a simple inline tool for generation of unique
file names; it uses the tmpname.c program described below to
generate a unique file name in the system /tmp directory, then
chops off the /tmp/ prefix to return just the base file name on
the standard output.
For example, to generate a unique job name for an instance of a
report identified as ren, I might use:
General Utility Programs and Scripts
All the scripts described above were written specifically for
the Overnight Spooler system. The short scripts and C programs
described in this section are general-purpose tools used by many
of our shell scripts, including the onite system.
checknum.c (Listing 9)
This C program examines its first command-line parameter,
converts the leading portion of it into a number value, and
returns that number alone on the standard output. If the
parameter has no leading numeric component, the string ERROR is
returned and the program terminates with an error status of 1.
checknum is used by spoolonite.sh and onitego.sh.
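The behavior described for checknum can be imitated with a shell function, which may be handy for experimenting without compiling anything. This is a POSIX shell stand-in, not a translation of Listing 9: it accepts only unsigned decimal digits, and how the C program treats signs or leading blanks is not assumed here.

```sh
#!/bin/sh
# Sketch: print the leading run of digits in the argument, or print
# ERROR and return status 1 if the argument does not start with a digit.
checknum() {
    num=$(printf '%s\n' "$1" | sed -n '1s/^\([0-9][0-9]*\).*$/\1/p')
    if [ -z "$num" ]; then
        echo ERROR
        return 1
    fi
    echo "$num"
}
```

A script can validate a priority argument with it, for example: pri=$(checknum "$2") || exit 1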
tmpname.c (Listing 10)
tmpname.c simply extends the functionality of the tempnam() C
library function to create a tool available for use within a
shell script. For example, the following command creates a
unique file name in /tmp that begins with the characters "abc":
lpick.sh (Listing 11)
Given a text file containing a list of items to select from and
a generic description of the flavor of item being chosen,
lpick.sh describes, sequentially numbers, and displays the list,
then prompts for the user to select one of the items according
to the sequence numbers. The user may either enter a sequence
number to make a selection, or press the return key alone to
indicate no selection. If the user makes a selection, lpick.sh
returns the text of the selected item on the standard output;
else, a null string is returned. killonite.sh uses lpick.sh for
prompting the user to select a job to cancel.
fleave.sh (Listing 12)
onitego.sh calls this utility script to clean out old files in
the jobsdone and stdout subdirectories.
ask.sh (Listing 13)
This little script prompts the user with a given text string,
insists upon a y/n response, and returns Y or N accordingly on
the standard output.
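A prompt loop of this sort can be sketched in POSIX shell as follows. It is not the code from Listing 13: the prompt format is an assumption, any reply beginning with y or n (in either case) is accepted, and the prompt is written to standard error so that only the Y or N answer reaches standard output.

```sh
#!/bin/sh
# Sketch: ask a yes/no question until a y/n answer is given.
# ask "Question text"   -> prints Y or N on standard output
ask() {
    while :; do
        printf '%s (y/n)? ' "$1" >&2
        read -r reply || return 1        # give up at end of input
        case $reply in
            [Yy]*) echo Y; return 0 ;;
            [Nn]*) echo N; return 0 ;;
        esac
    done
}
```

Typical use in a script: if [ "$(ask "Run this job overnight")" = Y ]; then ... fi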
A Report Queuing Example
Listing 14 shows an example script that spools a user-requested
report program as an overnight job. This script, invoked from a
menu system in our case, prompts the user for a publication code
(using the getmag shell tool) and proceeds to set up a job that
runs a set of mailing address consistency checks for the
specified publication. Several internal shell tools, such as
magname and nissue, appear in the script, but their use is
related to the specific application and not to the spooler
system in general.
The job text is first written to a temporary file, then that
file is fed to spoolonite.sh in line 49. After return from
spoolonite.sh, the temporary file is deleted.
A Periodic Job Spooling Example
Earlier I mentioned the problem of backup scheduling. By
spooling the backup routine as the lowest-priority overnight
job, all potential concurrency issues can be avoided, and it is
guaranteed that the backup program doesn't run until after all
other jobs have completed their tasks.
Say you have a backup driver script named dump.sh that performs
the physical backup operations, and you're currently calling it
directly from the cron table at some fixed hour of the night. To
convert this task into a spooled overnight job, create a special
script to spool the dump.sh script as an overnight job. Such a
script, named spooldumps.sh, is shown in Listing 15.
Then, in your cron table, simply change the line that used to
call dump.sh to call spooldumps.sh instead, some time before the
nightly onitego.sh run is scheduled to begin. For example, here
is the root cron table entry from our system:
30 18 * * 1-5 /usr/local/spooldumps.sh
This causes the spooldumps.sh script to execute every weekday
evening (Monday through Friday) at 6:30 P.M. (our onitego.sh is
scheduled to start up at 8:00 P.M.). spooldumps.sh schedules the
dump.sh process (which resides in the /u3/Backup directory) at
priority 7, the lowest priority. Thus, the dump.sh script is the
last program to execute on each of those nights.
About the Author
Leor Zolman wrote BDS C, the first C compiler targeted for
personal computers. He is currently a system administrator and
software developer for R&D Publications, Inc., and columnist for
both The C Users Journal and Windows/DOS Developer's Journal.
Leor's first book, Illustrated C, has just been published by
R&D. He may be reached in care of R&D Publications, Inc., or via
net E-mail as firstname.lastname@example.org
("...!uunet!bdsoft!rdpub!leor").