Setting Priorities
Larry Reznick
nice(1) is nice. By prefixing a command line with the
word
"nice," SVR4 users can reduce the impact of
their programs
by 10 levels of priority. If users don't run nice, the
system's
default priority, which is set to level 20 out of 40
total levels,
applies. Increasing the niceness of user programs by
10 can
make a real difference to system performance. But just
try to get
your users to do that regularly.
renice(1) is nicer. This BSD extension to SVR4 lets
users
change the priority of their own jobs. Only the system
administrator
can reduce a job's niceness, making the job take more
of the system's
attention. But users can make their own jobs nicer to
the system by
setting one of the 20 niceness levels for any or all
of their processes.
renice has no default priority, though. Users must explicitly
name some priority change amount to apply to all of
the process IDs
(pids) named. Again, this doesn't happen automatically.
It's a dirty
job and the system administrator is probably going to
get stuck with
it.
On SVR4, renice's work is done with priocntl(1). As
with renice, all users can apply priocntl to their
own jobs, but only a root user can apply it to any job.
priocntl
has more control over process scheduling than nice or
renice.
System administrators can also change the scheduler's
priorities for
handling all jobs using dispadmin(1M).
Some of this process priority information is available
from ps(1).
SVR4's version of ps shows detailed information when
used
with the -elf option. However, two other options show
additional
ps information useful for priocntl: -j gives process
group ID numbers (pgid) and session ID numbers (sid),
and -c
gives scheduling classes and global priority levels.
Figure 1 shows
the information generated when these two options are
added to the
-elf option. Notice that -elf shows the PID (process
ID), the PPID (parent process ID), the C (processor utilization
used for scheduling),
the PRI (process priority), and the NI (nice number).
The priority
numbers in the PRI column show niceness -- that is,
the higher
the number, the lower the priority -- but on a larger
scale than
the NI numbers, which are constrained into the nice
command's
range.
ps's -j option adds the PGID and SID numbers between
the PPID and the C columns. These numbers are useful
in priocntl
when you want to display (priocntl -d option) or set
(priocntl -s option) priorities by specific group or
session IDs (add the -i pgid option or the -i sid option
to priocntl's command line).
ps's -c option is more significant when using priocntl.
Using it replaces the C column with a CLS (class) column
and changes
the form of the PRI column to show the system's global
priority levels.
The CLS column shows abbreviations for the three job
scheduling classes: TS (time-sharing), SYS (system), and RT (real-time).
Global priority levels in the PRI column are very different
from niceness
numbers in two ways. First, the numbers show priority,
not niceness
-- that is, the higher the global priority number, the
higher the
priority. Second, the global priority numbers go very
much higher
than any niceness numbers. Figure 2 shows an abbreviated
sample output
of ps -elf, while Figure 3 shows a similarly abbreviated
sample of
ps -elfjc. In those figures, the ADDR, SZ, WCHAN, STIME,
TTY,
and TIME fields have been cut.
Job Scheduling Classes
While many system administrators are familiar with niceness
levels,
not all may be familiar with global priority levels.
These levels
are configurable in the kernel. On SVR4, jobs are in
one of three
classes: time-sharing, system, and real-time. Altogether,
these three
classes span the 160 default global priority levels, numbered 0 through 159.
Time-sharing processes are the typical processes run
by every UNIX
system. These processes vary their priorities, sharing
their use of
the CPU with other processes. Time-sharing processes
have the lowest
global priority levels, ranging from 0 to 59.
System processes are the processes run by the kernel,
such as those
run by init(1M) and configured in /etc/inittab(4).
These processes don't vary their priorities. User processes
are never
in the system class, even when they call the kernel
to do some work.
System processes have global priority levels ranging
from 60 to 99,
so the lowest system class priority level is higher
than the highest
time-sharing class process.
Real-time processes are so critical that they take precedence
over
system class processes. The lowest real-time priority
level is higher
than the highest system class process. Like the system
class, and unlike the time-sharing class, the real-time class
uses a fixed
priority scheme. Global priority levels for real-time
processes range
from 100 to 159. Once a real-time process enters the
scheduler, no
other process -- not even a system process -- will get
control
again until the real-time process finishes or relinquishes
its time
slice.
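The three ranges can be summarized in a few lines of shell. This is only a sketch that assumes the default 160 levels described above. The function name class_of_global_pri is made up for this illustration, and a kernel tuned differently would have different boundaries.

```shell
#!/bin/sh
# Hypothetical helper (not an SVR4 command): map a global priority
# number, as shown in the PRI column of ps -c, to its scheduling class.
# The boundaries are the default ranges given in the text.
class_of_global_pri() {
    pri=$1
    if [ "$pri" -lt 0 ] || [ "$pri" -gt 159 ]; then
        echo "priority $pri is outside 0-159" >&2
        return 1
    elif [ "$pri" -le 59 ]; then
        echo TS     # time-sharing: 0 to 59
    elif [ "$pri" -le 99 ]; then
        echo SYS    # system: 60 to 99
    else
        echo RT     # real-time: 100 to 159
    fi
}

class_of_global_pri 59
class_of_global_pri 60
class_of_global_pri 100
```

Run as a script, this prints TS, SYS, and RT on separate lines, showing that each class begins exactly one level above the top of the class below it.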
/etc/conf/cf.d/mtune(4) contains several tunable parameters
for the scheduler. Figure 4 shows default settings for
some of the
scheduler's parameters. RTMAXPRI defines the real-time
class's maximum
priority within the class. TSMAXUPRI defines the time-sharing
class's
maximum user-settable priority, which users may change
using priocntl.
The default value, 20, serves as both the lower and the upper
bound, so user priorities
range from -20 to +20. MAXCLSYSPRI
identifies
the maximum number of system class priorities. RTNPROCS
and TSNPROCS
identify the number of process levels for the real-time
and time-sharing
classes.
Scheduling Priorities
Once a process has used up or voluntarily given up its
time slice,
the scheduler is free to give another process a time
slice. Ideally,
the system's CPU resources are spread evenly across
all jobs, but
the scheduler rewards nice jobs and penalizes piggy
jobs in the time-sharing
class. There is no reward and penalty scheme for the
system class
and the real-time class. The system class is off limits
to all users
-- even system administrators unless they're changing
the kernel.
priocntl gives users control over their own time-sharing
and
real-time processes, and gives administrators control
over all time-sharing
and real-time processes. dispadmin gets or sets a class's
priority tables, although only a root user can set the
tables.
Figure 5 shows the time-sharing table's data, as output
by the
command
dispadmin -g -c TS
Detailed information about the time-sharing dispatcher
parameter table
is in ts_dptbl(4). Each row represents one priority
level.
The first column in any row is the quantum, which is the length of
the time slice given to a process at that row's priority level. The
RES value shown at the top of the dispadmin output indicates the
resolution of the quantum column in fractions of a second. RES is
set at 1000 by default, so the quantum column shows milliseconds.
The scheduler itself counts time in clock ticks. The tick rate is
defined by HZ in /usr/include/sys/param.h and is echoed in
/etc/default/login. HZ is the real resolution of your system in
clock ticks per second. The tables may show millisecond,
microsecond, or even nanosecond resolution, but any quantum that is
not a whole number of clock ticks is rounded up to the next tick.
For example, my Esix SVR4 system and a client's SCO system both
use HZ=100, so one clock tick lasts 10 milliseconds. On those
systems, priority level 0 uses 1000 milliseconds (1 second, or 100
ticks) and priority 59 uses 100 milliseconds (.1 second, or 10
ticks). If any quantum were not a multiple of 10 milliseconds, such
as 1024, the system would round it up to the next tick boundary.
So, 83 milliseconds would round up to 90 milliseconds, 257
milliseconds would become 260 milliseconds, and 1024 milliseconds
would become 1030 milliseconds.
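The rounding is ordinary integer arithmetic. The sketch below assumes that one clock tick lasts 1000/HZ milliseconds (10 milliseconds when HZ is 100) and that HZ divides 1000 evenly. The name round_up_quantum is made up for this illustration and is not part of dispadmin.

```shell
#!/bin/sh
# Hypothetical helper: round a quantum given in milliseconds up to a
# whole number of clock ticks.
#   $1 = quantum in milliseconds
#   $2 = HZ (clock ticks per second); assumed to divide 1000 evenly
round_up_quantum() {
    quantum_ms=$1
    hz=$2
    tick_ms=$((1000 / hz))                           # length of one tick
    ticks=$(( (quantum_ms + tick_ms - 1) / tick_ms )) # ceiling division
    echo $((ticks * tick_ms))
}

round_up_quantum 83 100
round_up_quantum 257 100
round_up_quantum 1024 100
```

With HZ=100, the three calls print 90, 260, and 1030. A quantum that is already a whole number of ticks, such as 100 or 1000, comes back unchanged.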
To see the resolution in HZ rather than in the default
milliseconds,
add the -r option, such as:
dispadmin -g -c TS -r $HZ
Using -r changes the resolution in which the quanta are displayed;
in this case, they would appear in clock ticks.
Recall that the dispatcher table's priority level numbers
show higher
numbers for higher priorities. That means processes
running at 0,
the lowest priority in the 60 time-sharing levels, are
allowed the
most time, and processes running at 59, the highest
priority, are
given the least time. Processes that don't run often
are allowed to
run for a long time when they finally do run. High priority
processes
run frequently but briefly.
The second column identifies the priority level to use
the next time
the process gets its turn if its current time slice
expires. A process's
time slice expires when the process runs without sleeping
and exhausts
its time. Other interruptions pause the process's clock.
Returning
from the interrupt continues the process's clock right
where it left
off.
Notice that in Figure 5 the expired processes have their
priorities
reduced by 10 levels if they're high-priority, but cut in half
(rounding down) if their
priority is lower than 20. Level 59, the highest priority
level in
the time-sharing class shown, gets a quantum of 100
milliseconds.
If a process at that level uses up all of its time without
sleeping,
next time the process will run at level 49, allowing
any higher priority
process to run first. At that new level, the process
gets a 200-millisecond
slice, but if the process uses all of that up next time,
it will be
reduced to level 39 (400 milliseconds, not shown on
the abbreviated
table), then to level 29 (600 milliseconds), then to
level 19 (800
milliseconds), then to levels 9, 4, 2, 1, and finally
0 (all 1000
milliseconds each). Such a voracious process would get
big time slices,
but with decreasing frequency, so as to avoid dragging
the system
down.
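The descent just described can be traced with a short script. This is a sketch of the rule as the walk-through states it (subtract 10 at level 20 or above, otherwise halve and round down), not a copy of the real dispatcher table, which stores an explicit value in every row. The function names are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical model of the "time slice expired" column, using the
# rule described in the text rather than an actual table lookup.
next_level_on_expiry() {
    if [ "$1" -ge 20 ]; then
        echo $(($1 - 10))   # high priorities drop by 10 levels
    else
        echo $(($1 / 2))    # low priorities are halved, rounding down
    fi
}

# Print every level a CPU-bound process passes through, starting at $1.
walk_expiry() {
    level=$1
    path=$level
    while [ "$level" -gt 0 ]; do
        level=$(next_level_on_expiry "$level")
        path="$path $level"
    done
    echo "$path"
}

walk_expiry 59
```

Started at level 59, the walk prints 59 49 39 29 19 9 4 2 1 0, which matches the sequence given above.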
The third column names the priority level assigned to
the process
if it sleeps. Processes voluntarily giving up their
time slices get
rewarded with a higher priority. When a process is blocked,
waiting
for I/O, this return-from-sleep column also applies.
In Figure 5,
sleeping processes with low priority have their priorities
raised
by increments of 10 levels. Starting with level 40,
priorities are
raised half the previous increment. So, sleeping processes
that initially
run occasionally, but with a large quantum, eventually
rise to become
frequently run processes, although with a small quantum.
A process at any priority level may have to wait for
other, higher-priority
processes. Column four sets the maximum time in seconds
a process
may wait for its turn to execute. If any process waits
longer than
this number of seconds, the process is compensated for
being so patient.
Column five contains the new priority level given to
the delayed process.
Typically, column five contains the same level numbers
as column three,
the sleep return column. In Figure 5, every priority
level has a five-second
maximum wait time. If any low-priority process is kept waiting for
more than five seconds, its priority is increased 10
levels. The process
will not necessarily execute right away if there are
still many higher-priority
processes, but this change increases its likelihood
of getting executed.
If the process is again forced to wait longer than five
seconds,
its level will increase by another 10. Eventually, such
a process
will rise to a top priority and execute, falling or
rising from there
according to its CPU usage.
Use the command
dispadmin -g -c RT
to see the real-time dispatcher parameter table. This
table is described in rt_dptbl(4). An abbreviated version
of it is shown in Figure 6. This table is far simpler
than the time-sharing
table. As before, each row represents one priority level,
but there
is only one column. Real-time processes are explicitly
assigned one
level and they stay there unless someone manually changes
them.
Remember that the lowest priority real-time process,
level 0, is higher
than every other system and time-sharing process. Once
a real-time
process starts, little else on the system will get a
chance. At real-time
level 0, a process has a 1000 quantum. When that expires
without sleeping
or blocking, the scheduler will look to see if there's
another process
above or at the same level. If not, this same real-time
process will
execute again because no system or time-sharing process
is at a higher
priority. Thus, once any real-time process starts, it
drags the system
away from any other work. The downside is, of course, that other
system work suffers. On the upside, real-time processes get the full
attention of the CPU and finish soon, as they should; otherwise,
they shouldn't use real-time priority. Ongoing real-time processes
shouldn't run on the same system as time-sharing processes.
If you want to tune your time-sharing or real-time dispatcher
parameter
table, use dispadmin -g to get the current table's settings
and redirect the output to a file. Edit that file to
use whatever
settings you prefer. Don't add any new priorities because
the new
table must have the same number of priority levels as
the original
table the -g option showed. If you want a different
number
of priorities, you must change the relevant tunable
parameters in
the kernel or in the space.c file in either /etc/conf/pack.d/ts
or /etc/conf/pack.d/rt, and then rebuild the kernel.
When
you're finished editing the dispatcher table file, use
the dispadmin
-s option.
For instance, if you want to edit the time-sharing table,
you might
execute:
dispadmin -g -c TS >ts_dptbl.new
After editing the file, you can set the new table
in the kernel's space with the command:
dispadmin -s ts_dptbl.new -c TS
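Because the new table must keep the same number of priority levels, it may be worth comparing row counts before loading it. The sketch below assumes the file layout that dispadmin -g typically writes: comment lines starting with #, one RES= line, and one row of numbers per priority level. The function names and the ts_dptbl.orig file name are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the priority-level rows in a dispatcher
# table file. Comment lines, the RES= line, and blank lines are
# skipped (assumed layout).
count_levels() {
    grep -v '^[[:space:]]*#' "$1" |
        grep -v '^[[:space:]]*RES=' |
        grep -c '[0-9]'
}

# Succeed only if both table files have the same number of rows.
same_level_count() {
    [ "$(count_levels "$1")" -eq "$(count_levels "$2")" ]
}

# Possible use on an SVR4 system (left as comments here):
#   dispadmin -g -c TS >ts_dptbl.orig
#   cp ts_dptbl.orig ts_dptbl.new       # then edit ts_dptbl.new
#   same_level_count ts_dptbl.orig ts_dptbl.new &&
#       dispadmin -s ts_dptbl.new -c TS
```

Keeping an untouched copy of the original table also gives you something to reload if the new settings turn out to hurt performance.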
Changing Priorities
The most important issue to decide when changing a process's
priorities
is probably the easiest to decide. Should the process
be a time-sharing
process or a real-time process? Typically, the answer
to that question
is time-sharing. But if you need to set an occasional
real-time process,
check whether your system is currently configured for
real-time processing.
Both dispadmin and priocntl can tell you this with
their -l options. If you use dispadmin -l, only the
class names appear. priocntl -l gives slightly more
information,
as shown in Figure 7.
If your system doesn't have the real-time class and
you want it, you
must remake the kernel. Edit /etc/conf/pack.d/ts/space.c
and
find a line reading
EXCLUDE:RT
SVR4 automatically includes the real-time class by default.
If someone
excluded it, find out why before you enable it again.
To enable the
real-time class, replace the exclude line with
INCLUDE:RT
Rebuild the kernel and reboot. The class should appear
in the next
dispadmin -l or priocntl -l option output. You can
do the same thing to eliminate the time-sharing class
for systems
that you want to dedicate to real-time processing, but
given real-time's
privileged priorities, there are not many reasons for
eliminating
the time-sharing class.
priocntl has three primary options: -d, to display
scheduling parameters; -s, to set scheduling parameters;
and
-e, to execute a command using specific scheduling parameters.
Anyone may display process parameters, but users can set
parameters only for their
own processes. Of course, administrators with root permission
may set them for other users'
processes.
When displaying or setting scheduling parameters, using
an -i
option followed by a keyword identifies which kinds
of process information
to apply to the list. After the keyword comes a list
of ID numbers
associated with that keyword. The obvious keywords use
the pid or
the ppid, but pgid and sid come into play here also.
This is where
adding the ps -j option, which shows those values, may
be
helpful. Most of the time, select processes are changed
using one
of these keywords, but you could use the uid or gid
keywords to change
all of the processes associated with a particular user
or a group
of users. Use the ID numbers as they appear in the /etc/passwd(4)
or /etc/group(4) files, not the user or group names.
Use a -c option to specify RT or TS class. If you don't
use
the -c option, all of the ID numbers named must be in
the same
class. Each of those -c classes has special options
associated
with them. RT class options let you set the process's
priority level
and quantum. TS class options let you set the priority
level and the
user-changeable limit for the processes identified.
Using the -c
option, you can change a running process from one class
to another.
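A few display and set commands show how the -i keywords and -c class options combine. This is a sketch: the ID numbers are placeholders, and the run wrapper is a made-up helper that only prints each command unless DRY_RUN is set to 0, so the script is harmless on a system without priocntl. The -p (priority) and -m (user priority limit) options shown are the time-sharing class options as I understand them from priocntl(1); check your own manual page before relying on them.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command instead of running it unless
# DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

pid=1234    # placeholder process ID
sid=567     # placeholder session ID, as shown by ps -j

# Display the scheduling parameters of one process.
run priocntl -d -i pid "$pid"

# Display the parameters of every process in one session.
run priocntl -d -i sid "$sid"

# Lower the time-sharing user priority of one process to -10.
run priocntl -s -c TS -p -10 -i pid "$pid"

# Cap the user priority limit at 0 for all processes owned by uid 105
# (a placeholder numeric user ID).
run priocntl -s -c TS -m 0 -i uid 105
```

Printing the commands first is a cheap way to confirm which processes a uid or gid keyword will touch before changing anything.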
Finally, the -e option lets you execute processes in
either
the TS or RT class, applying other class options as
needed. This is
similar to the nice command's operation, but you have more
control. With the -s option simulating renice but exceeding
renice's control, the priocntl program puts all of
the scheduling operations into one place. Use priocntl
to
adjust priorities where necessary or launch programs
with the appropriate
priority. Programmers at your site may use the priocntl(2)
function to embed the same control in their programs.
Judicious
use of priocntl and sleep will improve your system's
performance.
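Launching a program with -e might look like the following sketch. The program names are placeholders, and the run wrapper is a made-up helper that only prints each command unless DRY_RUN is set to 0. The real-time example assumes the -p (priority) and -t (time quantum) class options from priocntl(1) and would need root permission on a real system.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command instead of running it unless
# DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a long batch job in the time-sharing class with a reduced user
# priority, much as nice would. "make all" is a placeholder command.
run priocntl -e -c TS -p -10 make all

# Start a short, critical job in the real-time class at level 0 with a
# 500-millisecond quantum. "./sampler" is a placeholder program.
run priocntl -e -c RT -p 0 -t 500 ./sampler
```

Keeping real-time jobs at the lowest real-time level, with a modest quantum, limits how long they can hold off everything else if they misbehave.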
About the Author
Larry Reznick has been programming professionally since
1978. He is currently
working on systems programming in UNIX, MS-DOS, and
OS/2.
He teaches C, C++, and UNIX language courses
at American River College and at the University of California,
Davis extension.