Weeding Your System
Larry Reznick
Log files and temporary files are like the weeds in
your yard: if
you don't take care of them, they take over. A number
of standard
UNIX files accumulate during the normal operation of
the system. You
may also find that specialized applications at your
site create files
that accumulate over time. If you don't pay attention
to those files,
they'll take over your filesystem.
The system handles many log files automatically; others,
it either
doesn't handle very well or doesn't handle at all (the
printer request
file comes to mind as an example of one that isn't cleaned
up automatically
but should be). Fortunately, the system regularly creates
and updates
many of these files in known locations. That regularity
makes them
good candidates for cron(1) jobs to do the cleaning.
If you check root's crontab(1) (using the "crontab
-l"
command while you're logged in as root), you may find
one or two jobs
related to cleaning up the files. For example, SCO SVR3.2v4
has /usr/lib/cron/logchecker and /usr/lib/cleantmp. These two shell scripts check
various
files and eliminate, prune, or age them.
The simplest solution is to eliminate the file. Usually
the application
writing to the file will recreate it. One flaw with
this solution is that the data isn't around anymore when you're giving
a problem a postmortem
exam. Another solution is to prune the file, keeping
only the most
recent records. Newer records are typically added to
the end of the
existing file. Periodically stripping the older records
at the top
of the file reduces the file's size while still keeping
the records
that could aid in a postmortem. Finally, you could age
the file. Copy
the existing file to a file with a similar name or to
another directory
specially set up to hold these aged files. Then recreate
the file
expected by the application. Once finished, the entire
past log is
present for your postmortems until the next aging time.
Eventually,
aging will remove the oldest version.
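To make the three approaches concrete, here is a minimal sketch of each
applied to a hypothetical log file (the path is only an example, and the
simple tail form shown for pruning has limits discussed later in this
article):
# Eliminate: truncate in place; the file keeps its permissions,
# owner, and group because it is never actually recreated.
> /usr/adm/example.log

# Prune: keep only the newest records.
# (Permissions and ownership may need fixing after the mv, as shown later.)
tail -500 /usr/adm/example.log > /tmp/prune.$$
mv /tmp/prune.$$ /usr/adm/example.log

# Age: save the whole log under another name, then start a new one.
cp /usr/adm/example.log /usr/adm/Oexample.log
> /usr/adm/example.log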
Logchecker
This script checks the size of the cron log file. If
you want
to keep track of cron's doings, look for a file named
/etc/default/cron.
You can tell cron to keep a log of everything it does
by setting:
CRONLOG=YES
Usually, CRONLOG is set to NO, which doesn't
keep the log.
The log file is named /usr/lib/cron/log. If you've been
keeping
the cron log and haven't been monitoring it, you should
decide
whether it is worth keeping. If you're going to keep
it, the logchecker
script provided by SCO ages the file for you.
Cron runs logchecker twice a week. I have it set to
run at 1:03 AM on Sunday and Thursday. So, the crontab
entry
looks like this:
3 1 * * 0,4 /usr/lib/cron/logchecker
When you're root, changes you make to an existing file
don't affect the file's ownership or group settings.
If you create
a new file, though, the file gets root's permissions,
ownership, and
group. The logchecker script sets the umask to 022,
taking away write permission from both the group and
other users.
Logchecker allows the cron log file to grow as large
as four blocks short of the ulimit. You can set a variable
to this limit as follows:
LIMIT=`ulimit`
LIMIT=`expr $LIMIT - 4`
Before going any further, the script tests whether the
log file exists. If not, the script simply quits. Given
that the log
file exists, the script uses du(1) to find its size in blocks. If that
size is greater than $LIMIT's value, the script copies the log to a file
named olog. The olog file is kept in the same directory
as the log file. The log file is recreated by using
the command:
>log
This creates an empty file named log and gives
it the permissions set in the umask and root's ownership
and
group. The script changes the group to bin.
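Putting those pieces together, the core of the logic looks roughly like
the following sketch (an approximation of what the SCO script does, not a
verbatim copy of it):
LOG=/usr/lib/cron/log
umask 022
LIMIT=`ulimit`
LIMIT=`expr $LIMIT - 4`

# Quit quietly if there is no log to check.
test -f $LOG || exit 0

# du reports the size in blocks; the first field is the count.
set `du $LOG`
if [ $1 -gt $LIMIT ]
then
    cp $LOG /usr/lib/cron/olog    # age the old log
    > $LOG                        # recreate an empty log
    chgrp bin $LOG
fi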
Using a ulimit-based high-water-mark test is not a bad
idea.
One problem with this test is that some systems have
a very large
ulimit, which allows this file to get ridiculously large.
Four blocks leaves about 2KB of space for the cron log
to
grow before crashing into the ulimit. Presumably, cron
will execute logchecker again before the cron log
uses that space. But, with a large ulimit, this is just
taking
up too much space, and logchecker may not act before
the file
grows to unmanageable proportions. One of the following
solutions
could help:
1. Lower the high-water-mark by changing the four to
a larger
number.
2. Force a constant block-size-maximum limit into the
test.
3. Use the ulimit command in the script to temporarily
lower the ulimit. This applies only to the run of this
script.
You can apply the third technique without changing the
script at all.
Set the ulimit within the cron entry just before executing
the logchecker script:
3 1 * * 0,4 ulimit 1024; /usr/lib/cron/logchecker
Using 1024 cuts the log file off at just under half
a
megabyte (remember that the high-water-mark will subtract
four from
the ulimit). That seems like a workable size; large
enough
to be ignored for a while, but not so large that you
can't wade through
it if you have to. With one aged olog file and the active
log file online at once, this setting wouldn't use more
than one megabyte
of space total.
Cleantmp
I've set cron to run cleantmp every day at 1:06 AM.
The crontab entry looks like this:
6 1 * * * /usr/lib/cleantmp > /dev/null
Notice that this time is shortly after when logchecker
executes. I schedule most of the cleanup work during
the 1:00 AM
hour just before the system backup, which runs at 1:45
AM.
Cleantmp doesn't age the temporary files. It eliminates
them based on a time limit. Cleantmp uses a file named
/etc/default/cleantmp, which contains two variables
telling
the script how to act: FILEAGING and TMPDIRS. FILEAGING
identifies the number of days old a temporary file can
be before
it qualifies for removal. TMPDIRS lists the temporary directories
that cleantmp should clean.
Edit the /etc/default/cleantmp file to set the FILEAGING
and TMPDIRS variables the way you want. You may want
to clean
up only files older than 7 or 14 days. Of course, if
your system has
lots of temporary files created by users running applications,
you
may want to relieve the burden on the filesystem more
frequently,
such as every three or five days.
Change the TMPDIRS variable to add any extra directories
you
need to the default ones already there. Frequently,
system administrators
move temporary files out of the /tmp directory into
/usr/tmp.
This takes the burden off the root filesystem, which
is often a small
partition on the boot drive. If all of the available
workspace on
the root partition is used by the /tmp files, the entire
system
can be brought to its knees. The /usr/tmp directory
is often
on another partition -- or, even better, on another
drive. If you've
configured certain applications to put their temporary
files in some
other directory, add that directory to the /etc/default/cleantmp
TMPDIRS list.
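After those edits, a hypothetical /etc/default/cleantmp might look like
the following (the seven-day age, the extra application directory, and
the exact list syntax are only examples; check the file on your own
system):
FILEAGING=7
TMPDIRS=/tmp /usr/tmp /usr/appl/tmp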
Cleantmp reads the /etc/default/cleantmp file using
sed. It extracts the FILEAGING variable's value with
the following regular expression:
/^FILEAGING=[0-9]*$/s/FILEAGING=//p
This looks for FILEAGING (notice that it must
be spelled with all capital letters) at the beginning
of the line,
followed by an equals sign, then followed by zero or
more digits,
which make up the rest of the line. If that search is
successful,
it substitutes the "FILEAGING=" part with
nothing, then prints
the result. Substituting that way deletes everything
but the number
following the equals sign. A shell variable in cleantmp gets
the number.
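In shell terms, the extraction presumably amounts to something like this
(the variable name is my guess, not necessarily the one the script uses):
# Pull the numeric value of FILEAGING out of the defaults file.
FILEAGING=`sed -n '/^FILEAGING=[0-9]*$/s/FILEAGING=//p' /etc/default/cleantmp`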
The TMPDIRS search uses a slightly different regular
expression:
/^TMPDIRS=[/]\{0,1\}.*$/s/TMPDIRS=//p
Searching for the all-caps variable name is the same
as before, but what follows the name differs because
that isn't expected
to be a number. The bracketed slash is a readable way
to isolate the
slash character to prevent sed from interpreting it
as the
end of the search string. Another way to do this is
by escaping the
slash ("\/") but that makes the expression
difficult to read.
The escaped braces surround a comma-separated range
for a repeat count.
So, the combination of the slash and the repeat count
allows zero
or one slash; that is, the slash isn't required, so
the directory
could be a subdirectory of the current directory. That's
not a very
good idea, though. It is best to use absolute pathnames
when using
rm. At any rate, the rest of the regular expression (".*$")
matches whatever characters follow, through the end of the line.
So, the cleantmp shell variable gets
the list
of directory names.
The rest is simply a loop through the directories named,
checking
to see that each name is a directory. If it is a directory,
the script
runs a find(1) command to rm all files older than
FILEAGING days. Younger files are left alone. If
there
is no such directory, cleantmp mails root an error message
that the directory either doesn't exist or isn't mounted.
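The loop amounts to something like this sketch (an approximation of the
logic, not the script's exact text; whether SCO's version tests
modification or access time, I leave to the script itself):
for dir in $TMPDIRS
do
    if [ -d $dir ]
    then
        # Remove anything older than FILEAGING days.
        find $dir -type f -mtime +$FILEAGING -print | xargs rm -f
    else
        echo "$dir does not exist or is not mounted" | mail root
    fi
done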
Cleanup
Another general purpose cleaning program SCO provides
is /etc/cleanup.
Cron runs this script only on Sundays at 5:17 AM. Here's
the crontab entry:
17 5 * * 0 /etc/cleanup > /dev/null
The backup runs on Tuesday through Saturday after
logchecker, cleantmp, and other cleaning jobs have
finished. The company is closed Saturday and Sunday, so there
isn't any need to back up those days' work. Because the
system is quiet
on Sunday, this is a good time to run other cleaning
jobs.
Cleanup processes several files: /usr/adm/sulog, /etc/log/filesave.log,
/etc/wtmp, and core files. su(1) maintains the sulog
file. An /etc/default/su file contains several settings
controlling
how su works. Of those, the SULOG variable gives the
full pathname for the file that logs every use of the
su command.
If SULOG has no entry, no sulog file is kept. The
standard place for it is /usr/adm/sulog. Cleanup simply
copies sulog to Osulog, then reinitializes sulog
using the command:
> /usr/adm/sulog
This method keeps the original permissions, ownership,
and group.
volcopy(1) updates the filesave.log file. volcopy
copies the entire UNIX system from one device to another
like-sized
device. Logs of its activities go to /etc/log/filesave.log.
Cleanup takes a little extra effort in dealing with
filesave.log.
First, it tests whether the file exists. If so, it moves
-- not
copies -- the file to Ofilesave.log. Then, cleanup recreates
the file using the command:
> /etc/log/filesave.log
Unlike the sulog file, the original ceases to exist because it was moved rather than copied.
Recreating it this way gives the file root's permissions,
ownership,
and group. Cleanup must execute three more commands
to fix
this: chown root, chgrp sys, and chmod 666.
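The difference between the two treatments comes down to a few lines,
roughly as follows (the paths are the ones named above; the exact
commands in SCO's script may differ slightly):
# sulog: copy, then truncate in place -- the attributes survive.
cp /usr/adm/sulog /usr/adm/Osulog
> /usr/adm/sulog

# filesave.log: move, then recreate -- the attributes must be restored.
mv /etc/log/filesave.log /etc/log/Ofilesave.log
> /etc/log/filesave.log
chown root /etc/log/filesave.log
chgrp sys /etc/log/filesave.log
chmod 666 /etc/log/filesave.log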
The wtmp file (look up utmp(4)) holds user facts,
including the login name, login device, login pid, and
login time.
These facts are accumulated for each login and are used
by several
programs, including who(1). This file can get very large.
The cleanup script doesn't try to preserve older versions.
It simply eliminates the old version with the command:
> /etc/wtmp
As before, this simple command clears the data but keeps
the permissions, owner, and group settings.
find looks throughout the file system for any core file
with
an access time older than seven days. Presumably, nobody
would still
be debugging such old core files. If it finds any such
files, it removes
them.
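A find command of roughly this shape does that job (a sketch following
the description above, not necessarily the exact options SCO uses):
# Remove core files that haven't been accessed in seven days.
find / -name core -atime +7 -exec rm -f {} \;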
The cleanup script is an excellent place to put any
special
cleaning tasks you might have. For example, the /usr/adm/messages
file is a top candidate for pruning. This file receives
all system
error messages, including bootstrap messages. dmesg(1)
shows
the error messages and concatenates them to the /usr/adm/messages
file. Alternatively, /dev/error, a read-only device,
holds
these messages until they're read. The /etc/rc2 script
usually
runs dmesg or reads from /dev/error whenever the
system is booted to init(1) level 2. Crontab may contain
a dmesg command to periodically dump the error messages
to
/usr/adm/messages.
The messages file is great for discovering all kinds
of nasty
problems. Check it often. However, this file can grow
to immense proportions
if not pruned periodically. I added the following commands
to the
cleanup script to prune messages:
tail +`expr \`wc -l </usr/adm/messages\` - 1000` /usr/adm/messages >/tmp/msgs
mv /tmp/msgs /usr/adm/messages
chmod 644 /usr/adm/messages
chgrp bin /usr/adm/messages
chown bin /usr/adm/messages
The tail(1) command is the tricky part. I want
to keep the most recent 1000 lines in the file for system
debugging.
Because new messages are concatenated, you'd think I
could just use
tail -1000 on the file. No such luck.
It turns out that tail's -arg has an internal 4096-byte
limit. While that may not be so bad for most uses of
tail,
it prevented me from pruning this file the easy way.
Tail
also has a +arg that says "take the end of the
file starting
from the +arg line number." That'll do the trick
if I
just figure out what line number that is.
The wc(1) program counts the lines in the messages
file. Because the filename is output along with the
count when the
file is named on the command line, I deliver the data
through redirection,
for which no name is known or printed. Then, I pass
the line count
to expr(1), which subtracts 1000. That delivers the
line number
for tail to start with. I place the last 1000 lines
into a
temporary file, overwrite the existing messages file
with
the temporary, and finish by adjusting the permissions,
owner, and
group.
Purging Old History Files
Some applications create special files that aren't much
use after
some time elapses, yet they hang around on the system
taking up precious
space. For example, for one application I wrote software
that produces
work orders. Every work order is sent to the printer
after all of
the data is assembled. The software allows the client
to reprint past
work orders, so the work orders are actually produced
into a file
named after the work order invoice number, and then
sent to the printer.
All of those files collect in a specific directory,
called REPRINT_DIR.
Reprinting is only a matter of sending the file to the
printer again.
After a while, many old work orders have accumulated
on the system,
cluttering up the REPRINT_DIR. Only 140 of the files
appear
on the software's screen. Why bother keeping more than
140? So, I
wrote a script to clean them up. The primary code looked
like this:
cd $REPRINT_DIR
if [ `ls | wc -w` -gt 140 ]
then
    find . -type f -mtime +14 -print | xargs rm
fi
The test checks whether there are more than 140 files.
Filenames output
by a simple ls can be counted by wc as one word per
filename. If that count is greater than 140, the code
kills the oldest
ones. At first, I decided that any file older than 14
days was probably
useless. The names of these files were passed to xargs(1),
which invoked rm for all of them at once. (This method
is
more efficient than using find's -exec option, which
would execute rm once for each file instead of once
for all
of the files.) However, a beta site showed that this
method wasn't
a good idea.
It turned out that one of the beta test sites generated
so many work
orders in a day that they'd run over the 140-workorder
limit in two
or three days. By 14 days, there were more files than
anyone cared
about. At that point, we decided to keep only the most
recent files.
There actually was no need to reprint the work orders
beyond a couple
of days of the initial work order anyway. I replaced
the find
command line with the following:
ls -t | tail +140 | xargs rm
The -t option delivers a time sort with the most
recent files first. Because I didn't use -l, only the
names
appear in a single-column list. Tail cuts that list
off starting
at line 140, which is the 140th filename. It delivers
all of the names
at the end of the list to xargs, which removes those
filenames.
The first 139 names remain, which are the 139 most recent
work orders.
I put the script that purges the old work order files
into the crontab
to run once every day. That way, the previous day's
files push off
the equivalent number of older files.
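Assuming the script were installed as, say, /usr/local/bin/purgewo (a
hypothetical name and location), the crontab entry would follow the same
pattern as the others:
9 1 * * * /usr/local/bin/purgewo > /dev/null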
General Purpose File Aging
All of the techniques shown so far work for many kinds
of files strewn
throughout the system. But these techniques require
specific changes
to specific files or require other special treatment,
as the work
order files example did. The techniques are worth knowing
and can
be applied in many scripts, but I wanted a script that
I could use
to age many files.
My objectives were to:
1. name the files that needed aging on a script's command
line,
2. be assured that copies of the file would go to a
known location,
3. replace older versions of the file with more recent
versions,
4. maintain the original file's permissions, ownership,
and group
settings on the older versions and the current version.
I didn't care to truncate the original file as part
of the aging process,
although I could add such an enhancement if necessary.
I envisioned
this aging process as an online file backup for quick
problem recovery.
For example, you mustn't truncate database files or
the password file.
Yet, if a problem appears in the file, I might resolve
it more quickly
from an online copy of the file than from a copy found
and extracted
from tape. The tape is still available in the event
of major catastrophes.
This method wouldn't work well for aging extremely large
files because
of the disk space required, but small to medium-sized
files won't
use enough disk space to cause a problem. You specify
which files
to age by naming them on the script's command line.
The script doesn't
care about the size of the file. The dupback program
in Listing 1
implements this idea.
The aged files will be placed in /usr/local/backup.
If that
directory doesn't exist, the script creates it. mkdir's -p
option creates every missing directory along the path.
Notice the logical OR operator ("||").
If the test -d fails, the first part of the OR is
false, so the second part must execute to complete the
OR
condition. If the test -d succeeds, which means that
the directory
already exists, the OR doesn't bother executing the
mkdir
in the second part.
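That amounts to a one-line idiom; in a sketch (BACKDIR is only an
illustrative variable name):
BACKDIR=/usr/local/backup
# Create the backup directory only if it isn't already there.
test -d $BACKDIR || mkdir -p $BACKDIR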
The FILES variable receives a copy of every filename
on the
command line. These filename arguments must be saved
because the set
command inside the for loop will replace the positional
parameters.
Inside the for loop, each filename coming from the command
line is stored in a variable named f. The set command
takes the output of ls -li, which shows the long listing
of the current file in $f. From the long listing, the
program
can figure out the file's owner and group settings.
I couldn't simply
use ls -l, though. When the subshell finishes executing
the ls, it delivers the output to the set command as that command's arguments.
Because this dupback program is intended to work with
regular
files, the permissions output will begin with a hyphen ("-")
in the mode field. Thus, the permissions, such as "-rwxrwxrwx",
will appear to the set command as an option list. To
prevent
that, I added the -i option, which causes the file's inode
number to be output. Because the inode always appears
first, it
prevents the permission settings from appearing as a
set option
list, at the trivial cost of shoving the owner and group
names into
the next positional parameter. So, $4 has the owner
name and
$5 has the group name.
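In other words, something along these lines (the variable names are
illustrative):
# After set, the positional parameters are:
# $1=inode  $2=mode  $3=links  $4=owner  $5=group  ...
set `ls -li $f`
OWNER=$4
GROUP=$5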
An inner for loop drives the aging process. I want to
create
a duplicate of each original file using the same name
but with the
suffixes .1, .2, and .3 added. When doing the next aging,
the .2 file
overwrites the .3 file, the .1 file overwrites the .2
file, and the
current file overwrites the .1 file. To preserve the
oldest files,
the loop numbering goes backward from 2 to 1. The loop
counter doesn't
need 3 because an expr calculation in the mv command
creates it.
The code first tests whether the file about to be mv'd
exists.
If it didn't exist, mv would give an error message. This
test
uses the logical AND ("&&") to be
sure that mv
is executed if, and only if, the file exists. If the
test fails, the
entire AND is considered false, so mv won't execute.
Once finished, the loop has turned file.1 into file.2,
and file.2
into file.3. At this stage, there is no file.1 anymore.
Keep in mind
that mv keeps the original permission, owner, and group
settings.
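Under the same assumptions about names, that inner loop might be
sketched like this (BACKDIR stands for /usr/local/backup and base for
the file's basename; both names are illustrative):
base=`basename $f`
for i in 2 1
do
    # Shift each aged copy up one number, oldest first.
    test -f $BACKDIR/$base.$i &&
        mv $BACKDIR/$base.$i $BACKDIR/$base.`expr $i + 1`
done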
All that remains is to copy the original file using
the file.1
style name. The gotcha is that cp creates a brand new
file,
which means that the permissions, owner, and group will
be set according
to the user executing the program.
After the cp, the permissions, group, and owner, in
that order,
are set. The original file's group and owner settings
are retrieved
from the ls output stored in the positional parameters.
The
permission settings are more difficult to acquire. I
could have gotten
this information from a C program, but I really wanted
to do this
entirely in shell script. So, I wrote an awk program
to translate
the ls -l permissions field into the octal format usable
by
chmod. Listing 2 shows the getperm script that does
this translation. Because getperm was written to provide
the
octal permission value and the filename, getperm's output
supplies everything that chmod needs.
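Listing 2 isn't reproduced here, but the translation it performs can be
sketched in shell and awk roughly as follows (my own approximation, not
the listing itself; it ignores the setuid, setgid, and sticky bits):
# Usage: getperm file
# Prints an octal permission value and the filename, e.g. "644 /etc/passwd".
PERM=`ls -l $1 | awk '{
    perms = 0
    # Characters 2 through 10 of the mode field are rwxrwxrwx.
    for (i = 2; i <= 10; i++) {
        if (i == 2 || i == 5 || i == 8)
            perms = perms * 8          # start a new octal digit
        c = substr($1, i, 1)
        if (c == "r") perms = perms + 4
        if (c == "w") perms = perms + 2
        if (c == "x") perms = perms + 1
    }
    printf "%o\n", perms
}'`
echo "$PERM $1"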
When dupback is finished, the original file still exists
in
the original location. It is duplicated in /usr/local/backup/filename.1.
The previous filename.1 is now named /usr/local/backup/filename.2,
and the previous filename.2 is now named /usr/local/backup/filename.3.
A cron job can run dupback with a command-line list
of files to be aged like this. The next time something
goes wrong
with one of those files, you can fall back on the aged
copies before
resorting to the backup tape. Combining this with the
other techniques
for eliminating, pruning, or aging files, you should
find little difficulty
keeping the weeds from taking over the system.
About the Author
Larry Reznick has been programming professionally since
1978. He is currently
working on systems programming in UNIX and DOS. He teaches
C language courses
at American River College in Sacramento. He can be reached
via email at:
rezbook!reznick@csusac.ecs.csus.edu.