Finding Disk Hogs
Larry Reznick
"Who's hogging my disk space?" Ignoring the
conceit of ownership
that we all share, that's what a group manager wanted
to know. He'd
written a script to tell him who was using disk space
on one directory
tree important to his operations. That was a good idea,
but I wanted
something more thorough. I wanted to know about the
space usage throughout
every disk drive. Furthermore, I wanted one script to
find this information
on both the HP-UX and the SunOS systems at the site.
dusage (Listing 1) collects disk usage information across
every partition on a set of systems. The script will
check a set of
systems specified on the command line or will check
a default set
of systems built into the script. You may give a megabyte
threshold
to reduce or increase dusage's output. Only users whose
space
usage exceeds the threshold are shown in the report.
You may specify
the threshold with or without specifying which systems
to check, but
to specify systems you must give a usage threshold.
Disk Usage Report
dusage's report shows the amount of space used by each
mounted
filesystem and the amount of space used by each user.
Figure 1 shows
a full report of the script's default systems, using
the 50 megabyte
default threshold. Figure 2 shows a report for only
one system using
a 20 megabyte threshold. The smaller the threshold,
the more users
appear in the report. dusage accepts a threshold as
small
as 1 megabyte, which shows nearly every user storing
files on the
disk.
dusage's report is designed not only for easy reading
by administrators
and managers, but also for easy parsing when feeding
its output into
other scripts for further analysis.
As a heading, the first line identifies the megabyte
threshold used
for the report. Two lines are skipped between each system's
report
and a single line is skipped between each disk drive's
report. Each
disk's report data is single-spaced.
System subheadings are preceded by four equals signs
(====) to simplify
extracting the report for one system from a larger report
containing
many systems. Assume that you've saved a large dusage
report
to a file by redirecting the script's output. You can
find where each system's
report begins by running
egrep -n ==== reportfile
This command shows a list of every system named in the
file and the line number where each system's subreport
begins. Say
that the report you want begins on line 53 and the next
subreport
begins on line 69. Use the command
tail +53 reportfile | head -16
to extract the 16-line subreport. If the subreport is
the last one in the report file, you don't need to pipe
the tail
program's output to the head program.
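Some newer versions of tail no longer accept the +53 form. As a minimal sketch, the same extraction can be done with sed. The function name subreport and its argument order are illustrative and are not part of dusage:

```shell
# subreport FILE START [NEXT]
# Print the subreport that begins on line START and ends just before
# line NEXT. Omit NEXT for the last subreport in the file.
subreport() {
    if [ -n "$3" ]; then
        sed -n "$2,$(( $3 - 1 ))p" "$1"
    else
        sed -n "$2,\$p" "$1"
    fi
}

# The example from the text (lines 53 through 68 of reportfile):
#   subreport reportfile 53 69
```

Passing the line number of the next subreport, rather than a line count, avoids the subtraction that the head command requires.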
Each system subreport shows all the disk partitions
for that system.
dusage shows the names of the partitions' mount points
using
remote system notation:
systemname:/mountpath
Beneath the partition's designation is a line containing
space usage figures for the whole partition. These figures
come from
the df program, but they're scaled to use megabytes
instead
of df's usual kilobytes output. Total space usage lines
are
surrounded by parentheses to make extraction of these
lines simple
for scripts that want to analyze dusage's output.
The usage percentage is calculated the same way df calculates
it, but shows the value rounded to two decimal places.
df
rounds to the nearest integer. The total shown ("Tot:")
is
typically greater than the sum of the "Used:"
and "Avail:"
space because "Tot:" reflects certain space
used as overhead
by the system. The "Usage:" percentage reflects
the actual
space available from the "Used:" and "Avail:"
sum,
not the potential space as shown in the "Tot:"
field.
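The percentage arithmetic described above can be sketched in a few lines. The function name usage_pct and the sample figures are hypothetical; only the formula, used space divided by the sum of used and available space, comes from the text:

```shell
# usage_pct USED AVAIL
# Percentage of usable space (USED + AVAIL) that is in use,
# rounded to two decimal places.
usage_pct() {
    awk -v used="$1" -v avail="$2" \
        'BEGIN { printf("%.2f\n", used / (used + avail) * 100) }'
}

# Hypothetical partition with 150M used and 50M available:
#   usage_pct 150 50
```

Because the divisor is the "Used:" plus "Avail:" sum and not the "Tot:" figure, the result describes the space users can actually fill.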
User usage lines follow the space usage line. These
lines show each
user whose usage exceeds the threshold. User lines show
in order of
most space used to least space used. Three tab-separated
columns show
the total megabytes used, the name of the user holding
that space,
and the percentage of total space used by the user.
UNIX allocates disk space 1K (two 512-byte blocks) at
a time. Total
megabytes used represents the total allocated space
in 1K allocation
units held by a user, not merely the sum of a user's
files' bytes.
Usage percentages come from the relationship of the
user's files'
usage to the partition's available space, which is the
sum of the
partition's "Used:" and "Avail:"
figures.
Disk Usage Script
The heart of dusage is the quot program. (Refer to
quot(8) on SunOS, and quot(1M) on HP-UX and on other
versions of System V). The -a option causes quot to
examine every mounted filesystem on a system and report
who is using
space and how much. quot is perfect for dusage's needs,
because quot's report gives the total 1K allocation
units
for each user in order of most used to least used.
Unfortunately, quot only gives this information when
run by
a root user. Therefore, dusage requires the user to
have an
effective UID of zero. If you need dusage's report frequently,
run dusage in root's crontab. Because dusage
sends its output to the standard output, cron automatically
mails the output to the crontab owner unless you redirect
that output to a file. If you want many people to read
that file,
be sure to set appropriate owner and group IDs and permissions
as
part of the cron job.
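One way to arrange such a cron job is a small wrapper that captures the report and sets its permissions in the same step. This is only a sketch: the function name save_report, the file paths, the mode, and the group name are all assumptions, not part of Listing 1:

```shell
# save_report OUTFILE MODE COMMAND [ARG...]
# Run COMMAND, capture its standard output in OUTFILE, and set the
# file's permissions. The report is built in a temporary file first so
# readers never see a half-written report.
save_report() {
    out=$1
    mode=$2
    shift 2
    "$@" > "$out.tmp" && chmod "$mode" "$out.tmp" && mv "$out.tmp" "$out"
}

# A script run from root's crontab might then contain (all paths and
# the group name here are hypothetical):
#   save_report /usr/local/adm/dusage.rpt 640 /usr/local/bin/dusage 20 &&
#       chgrp staff /usr/local/adm/dusage.rpt
```

Anything the command writes to standard error is left alone, so cron still mails it to the crontab owner.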
Listing 1 shows dusage, a Bourne shell script. dusage
has been tested to work correctly on HP-UX and SunOS
systems. It may
work unchanged on other versions of UNIX, but I've only
tested it
on those two systems.
SYSTEMS is set at the beginning of the script to make
this
variable easy to find and modify. The SYSTEMS variable
holds
the default systems that dusage reports about when system
lists are not included in its command line. Listing
1's settings are
dummies. Modify your SYSTEMS variable to hold the system
names
you're concerned with most frequently.
Because root privileges are required to run this script,
the PATH
variable is restricted to /bin, /usr/bin, and /etc.
These directories hold most of the programs dusage needs.
Other programs have their paths fully spelled out when
needed.
PROGNAME is set to the base name used to invoke the
script
so that error messages refer to the correct invocation
name. Later
script actions cause dusage to lose access to $0,
so I reserve its value now.
The next phase of the script introduces a method for
making this script
as portable across UNIX systems as possible. Because
of my initial
development requirements, this script runs on HP-UX
and SunOS. HP-UX
is System V-based and SunOS is BSD-based. There are
enough differences
between them to drive a portable script developer nuts.
The script
has to deal not only with program differences on the
executing system,
but also with differences on the remote systems from
which usage figures
are collected. The executing system may be different
from the remote,
examined system. Designing the script so that it could
collect HP-UX
figures even though run on SunOS, or vice versa, was
interesting.
Throughout discussion of the rest of this script, keep
in mind that
the executing system may run a different version of
UNIX than the
target system.
WHICHSYS initially holds the operating system name on
the
executing system, although later it will hold the name
of the target
system. The next test figures out which UNIX system
is executing by
looking at the first five characters. dusage assumes
that
if those five characters aren't "HP-UX," they're
"SunOS."
Only an if...else condition is needed for this test.
If you
find that simple assumption won't work for your needs,
change this
to a case statement. Make the case execute expr to
extract the five characters, just as dusage does. Look
for
HP-UX and SunOS using the usual case pattern matching,
and then add
patterns to match whatever else your UNIX systems require.
At this point, the script sets the remote shell command
(RSHCMD),
the awk command (AWKCMD), and the echo command
(ECHOCMD) variables. HP-UX uses remsh, but SunOS uses
rsh. I suspect HP-UX names the command differently to
avoid
conflict with the restricted shell's name. Nevertheless, System
V
uses rsh(1), so I view HP-UX as being different here.
However,
remsh is in dusage's PATH. SunOS presents
a problem by keeping its version of rsh in /usr/ucb,
which isn't in dusage's PATH.
dusage uses awk for its data collection, calculating,
and formatting. Some UNIX versions keep the obsolete version
of awk
hanging around with the new, more thorough version of
awk.
SunOS does this. Most System V versions of UNIX I've
worked with do
this also, naming the older version awk and the newer
version
nawk. On systems I administer, I prefer to rename the
old
version of awk to oawk and then make a hard link from
awk to nawk. This keeps both versions available but
forces the more powerful, new version as the default
awk without
taking much extra system space. However, because I wrote
this script
to work despite someone having set that link, AWKCMD
points
to awk for HP-UX and nawk for SunOS, those systems'
respective versions of the more complete awk. Use additional
case pattern matches to direct your AWKCMD another way.
ECHOCMD points to the System V version of echo. HP-UX
automatically uses that version, but SunOS must go to
/usr/5bin
to get that version. System V's echo supports many embedded
control codes, using a C-like escape notation with a
backslash. BSD's
echo has no such support. dusage needs embedded newlines
to make its messages easier to read.
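The selections described above could be written as the case statement the text suggests for sites with more than two kinds of UNIX. This is a sketch, not the code from Listing 1, which uses if...else. The function name select_tools is hypothetical, and the operating system name is passed in as an argument because the text doesn't say how the script obtains it:

```shell
# select_tools OSNAME
# Set RSHCMD, AWKCMD, and ECHOCMD from the first five characters of
# OSNAME. Returns nonzero for a system this sketch doesn't know about.
select_tools() {
    case `expr "$1" : '\(.....\)'` in
    HP-UX)
        RSHCMD=remsh
        AWKCMD=awk
        ECHOCMD=echo
        ;;
    SunOS)
        RSHCMD=/usr/ucb/rsh
        AWKCMD=nawk
        ECHOCMD=/usr/5bin/echo
        ;;
    *)
        return 1
        ;;
    esac
}
```

Add a pattern for each additional system and set the three variables to whatever that system requires.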
With these three primary support programs selected,
dusage
can move on to examine its command line. If ECHOCMD
isn't
set correctly, not even the usage message will output
portably because
it uses an embedded newline (\n).
THRESHOLD comes from the first argument on the command
line.
However, the user doesn't have to give that argument.
Without it,
THRESHOLD is set to 50. If you prefer some other default
threshold,
set the number of megabytes after the colon-hyphen (:-)
notation.
That threshold must be a positive number. Any negative
number or zero
is assumed to be an error and shows the usage message.
This way, if
the user starts dusage with a negative number threshold,
such
as "-50," or with a non-numeric argument,
such as "-h"
or "help," dusage tells the user how to use
it. Figure 3
shows an example of this help message.
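A minimal sketch of the threshold handling follows. The colon-hyphen default comes from the text. The validation shown here is an assumption about one way to reject anything that isn't a positive whole number, and the function name parse_threshold is hypothetical:

```shell
# parse_threshold [ARG]
# Set THRESHOLD from ARG, defaulting to 50 megabytes. Returns nonzero
# when the value isn't a positive whole number, so the caller can show
# the usage message.
parse_threshold() {
    THRESHOLD=${1:-50}
    case $THRESHOLD in
    *[!0-9]*) return 1 ;;    # "-50", "-h", "help" all land here
    esac
    [ "$THRESHOLD" -gt 0 ]   # zero is rejected as well
}
```

Treating every non-numeric argument as a request for help means the script needs no separate help option.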
For the rest of the script, the user must be a superuser.
The script runs "id
-u" and, unless the effective UID is 0, prints an error
message and goes
no further. Figure 1 shows an example of this message.
Usage messages
and error messages are redirected to the standard error
device to
differentiate them from the report's normal output.
However, if a
cron job redirects dusage's standard output to a file,
cron still mails the standard error to the root user.
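The superuser check might look like the following sketch. The function name require_root, the optional UID argument (added so the check can be tried without being root), and the wording of the message are all assumptions:

```shell
PROGNAME=dusage   # in the script this comes from the invocation name

# require_root [UID]
# Succeed only when UID (default: the effective UID from "id -u") is 0.
# The complaint goes to standard error, not into the report.
require_root() {
    uid=${1:-`id -u`}
    if [ "$uid" -ne 0 ]; then
        echo "$PROGNAME: you must be root to run this script" >&2
        return 1
    fi
}
```

A caller would typically write "require_root || exit 1" before doing any real work.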
The next test checks whether systems were named on the
command line.
Because systems cannot be named without identifying
a threshold, the
test expects at least two arguments on the command line
for a system
name to be present. If no such names are on the command
line, the
default SYSTEMS value is set into the positional parameters.
When the command line contains system names, the threshold
is shifted
out of the positional parameters. Whichever way this
test goes, when
it is finished, the positional parameters contain only
system names.
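The positional parameter shuffle can be sketched as follows. The function wrapper system_list exists only so the logic can be tried on its own, and the default names in SYSTEMS are dummies:

```shell
SYSTEMS="alpha beta"    # hypothetical default system names

# system_list [THRESHOLD [SYSTEM...]]
# Print the systems to examine: the names given after the threshold,
# or the defaults when no names were given.
system_list() {
    if [ $# -ge 2 ]; then
        shift               # drop the threshold, keep the system names
    else
        set -- $SYSTEMS     # no names given: fall back to the defaults
    fi
    echo "$@"
}
```

Either way the later loop can simply walk the positional parameters without caring where the names came from.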
Keying to the Remote System
Report generation begins by showing the current run's
threshold. For
each system named in the positional parameters, dusage
must
adjust itself to use certain programs on the remote
system. WHICHSYS
is reused to hold the remote system's OS name. Appropriate
df
and type commands are selected, according to whether
the remote
system is HP-UX or SunOS. On SunOS, df outputs a report
showing
disk free space in 1K units. HP-UX's df shows the same
information
in 512-byte blocks, but doesn't support System V's -k
option
to show it in 1K units. Instead, HP-UX has a different
program, bdf,
to show the same format as SunOS's df and System V's
"df
-k." Use a case statement to handle other operating
system
differences.
Discovering the proper path to certain commands became
more complicated
than the simple if...else used so far. On HP-UX and
many System
V installations, root users start up a remote shell
using either Bourne
shell (sh) or Korn shell (ksh). On SunOS, root users
start up C-shell (csh). Even though dusage runs under
Bourne shell because of the #!/bin/sh line at the top
of the
script, when a remote shell (remsh or rsh) starts,
the default shell on the remote system is not necessarily
the same
shell. That means the command to find the full pathname
of a program
differs according to the target system: /usr/ucb/which
on
SunOS's C-shell, and type on Bourne shell or Korn shell.
This different shell problem gets hairier on HP-UX.
According to the
remsh documentation, remsh's initial PATH
excludes the /etc directory, which some dusage commands
need. So, the TYPECMD used on HP-UX must contain the
explicit
PATH setting used by dusage, including /etc.
TYPECMD throws one more gotcha: type's output style
is different from which's style. Fortunately, the TYPECMD's
output format always shows the pathname last. Pipe the
TYPECMD's
output to the AWKCMD to extract the last field's value,
no
matter how many fields there are. This delivers the
full pathname.
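Taking the last field is a one-line awk program. In this sketch the function name last_field is hypothetical, and the sample lines are typed in by hand to mimic the two output styles; they are not captured from real systems:

```shell
# last_field
# Print the last whitespace-separated field of each input line.
last_field() {
    awk '{ print $NF }'
}

# Hand-written samples of the two styles:
#   echo "bdf is /usr/bin/bdf" | last_field    # "type" style
#   echo "/usr/etc/quot" | last_field          # "which" style
```

Because NF counts the fields on each line, $NF works whether the pathname is the only word or the third.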
DFCMD is set to such a full pathname for the remote
system's
command, be it bdf or df. USECMD is similarly
set to the usage command, quot. An extra quoted string,
"
-a" with a space before the hyphen in the quotation
marks, follows
the output of the shell substitution so that quot will
show
results for all mounted filesystems.
Collecting Remote System Statistics
Once all the target system configuration work is done,
dusage
can begin collecting disk usage figures for the remote
system.
Skipping two lines, dusage announces the system name
for the
subreport. It then remotely executes the appropriate
USECMD
and pipes USECMD's output into a local copy of AWKCMD.
AWKCMD collects and formats the figures.
Two kinds of output lines come from USECMD (quot):
an initial line identifying the current disk partition,
and a set
of secondary lines identifying the users taking space
on that partition.
Relatively little work goes into handling the secondary
lines. Much
more work goes into developing the partition headings
for the report
from the initial lines.
Initial lines from quot identifying the current disk
partition always refer to the physical device pathname.
Because the
dev directory is always part of that pathname, despite
any
subdirectory classifications, seeing if the line contains
dev
seemed most portable. So, the AWKCMD pattern
$0 ~ /dev/
looks for those three letters -- slashes surround
regular expressions in awk patterns -- and executes
the
action associated with a partition's initial report
line.
quot delivers its partition identifier showing the device
pathname and then, in parentheses, the mount point pathname.
If you
prefer to see both, use $0 in printf()'s second parameter.
However, assuming that most managers analyzing the report
may not
care about the partition's device name, I extracted
the mount point
pathname from $0. To extract the mount point, set AWKCMD's
match() function to find all the characters within parentheses.
Those parentheses needed to be double-escaped -- using
"\\("
not just "\(" -- because HP-UX's awk and SunOS's
nawk use differing regular expression rules. Doubling
the
escape character eliminated the difference.
AWKCMD's match() function loads the global variables
RSTART and RLENGTH. These hold the relative starting
position and the relative length of the matched string.
Used with
the AWKCMD's substr() function, those global variables
deliver the mount point pathname contained in quot's
initial
line. Because the regular expression includes the parentheses
in the
match, I add one to RSTART to skip the open parenthesis
and
subtract two from RLENGTH to ignore both the open and
close
parentheses. That extracted name is formatted into dusage's
output
showing remote system notation. The report skips a line
before and
after this output.
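The mount point extraction can be sketched as below. The function name partition_headings is hypothetical, the sample input is written by hand in the shape the text describes, and the blank lines that the real report prints around each heading are left out:

```shell
# partition_headings SYSTEM
# Read quot-style output on standard input and print SYSTEM:/mountpath
# for every line that names a device.
partition_headings() {
    awk -v sys="$1" '
    $0 ~ /dev/ {
        if (match($0, "\\([^)]*\\)")) {
            # RSTART + 1 skips "(", RLENGTH - 2 drops both parentheses
            printf("%s:%s\n", sys, substr($0, RSTART + 1, RLENGTH - 2))
        }
        next
    }'
}

# Hand-written sample; prints the heading for /usr on system alpha:
#   printf '/dev/sd0g (/usr):\n51200 larry\n' | partition_headings alpha
```

One caution about the dev test: a user line whose name happens to contain those three letters would also match, so a stricter pattern may be worth considering at sites where that can occur.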
Finding the remote system's total space, as reported
by DFCMD,
requires separate remote shell execution. Recall that
the remote shell
invocation of USECMD has already delivered its output
to this
AWKCMD. Another RSHCMD won't interfere with that output.
Retrieving this new RSHCMD's output from the DFCMD
from inside the AWKCMD is a little tricky. It not only
requires
the RSHCMD and the DFCMD, but also requires DFCMD's
output to pipe through egrep. egrep extracts the only
relevant line from DFCMD's output. To make AWKCMD
execute RSHCMD and retrieve the output, I pipe the command
into getline.
AWKCMD's sprintf() function creates a string from
the several variables that form the remote shell DFCMD
with
the egrep pipe. This string is formatted and assigned
to the
sysspace variable. Expanding the string in front of
the pipe
to getline runs the command and delivers the output
to AWKCMD.
Because of egrep, the command outputs only one line.
getline
splits that output line into AWKCMD's fields ($1, $2,
and so on). Doing this replaces USECMD's input line,
but the
program is finished with that. Total, used, and available
space figures
come from DFCMD's output. dusage converts them from
DFCMD's kilobytes format into megabytes, calculates
a maximum
space value for the usage percent figures, rounds these
figures to
two decimal places, and prints them as floating point
numbers with
"M" characters to designate them as megabytes.
AWKCMD holds open any file associated with a pipe to
getline.
After printing the partition's space usage figures,
dusage
explicitly closes that file; otherwise, AWKCMD will
run out
of file handles. AWKCMD may only have one pipe opened
at a
time. Because a similar pipe gets opened for each partition
on every
system, dusage closes each pipe when finished with it
so the
next one will open.
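The command-into-getline technique, followed by close(), can be tried without a remote system. In this sketch cat reads a saved copy of df-style output in place of the remote shell call, and grep -E is used as the current spelling of egrep. The function name space_line, the field positions, and the exact layout of the printed line are assumptions:

```shell
# space_line DF_FILE MOUNT_POINT
# DF_FILE holds saved "df" or "bdf" style output in kilobytes; cat
# stands in for the remote shell call so the sketch runs anywhere.
space_line() {
    awk -v dffile="$1" -v mnt="$2" 'BEGIN {
        # Build the whole pipeline as one string, as the text describes
        sysspace = sprintf("cat %s | grep -E \" %s$\"", dffile, mnt)
        if ((sysspace | getline) > 0) {
            # Fields now come from the df line: $2 total, $3 used, $4 avail
            tot = $2 / 1024
            used = $3 / 1024
            avail = $4 / 1024
            printf("(Tot: %.2fM Used: %.2fM Avail: %.2fM Usage: %.2f%%)\n",
                tot, used, avail, used / (used + avail) * 100)
        }
        close(sysspace)     # release the pipe before the next partition
    }'
}
```

The string passed to close() must be identical to the one used to open the pipe, which is one reason to build the command once and keep it in a variable.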
Finally, next skips all remaining patterns and actions
for
this initial input line. Without executing next, AWKCMD
would try to apply the remaining patterns and actions,
and could produce
very strange output.
All secondary lines coming from USECMD are handled by
AWKCMD's
second pattern. In the first column of input, quot delivers
the number of 1K blocks used by every user on the target
partition.
Is that number, divided by 1024 to translate kilobytes
into megabytes,
greater than the threshold? If so, this line identifies
a user holding
more space than the threshold. The script then formats
and prints
the usage amount in megabytes, along with the user's
name and the
usage percent.
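The second pattern can be sketched as a single awk rule. The function name big_users and its arguments are hypothetical, the partition size is passed in rather than carried over from the df step, and the input is assumed to have the block count in the first column and the user name in the second:

```shell
# big_users THRESHOLD MAXSPACE
# THRESHOLD is in megabytes; MAXSPACE is the partition's used plus
# available space, also in megabytes. Standard input holds quot-style
# user lines: a count of 1K blocks, then the user name.
big_users() {
    awk -v threshold="$1" -v maxspace="$2" '
    $1 / 1024 > threshold {
        mb = $1 / 1024
        printf("%.2f\t%s\t%.2f%%\n", mb, $2, mb / maxspace * 100)
    }'
}

# Hand-written sample: only larry exceeds a 50 megabyte threshold
#   printf '102400 larry\n10240 bob\n' | big_users 50 200
```

No sorting is needed because quot already lists users from most space used to least.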
Conclusion
dusage may point out surprising disk space usage figures.
It helps administrators and managers identify who needs
a nudge to
clean up files. It can also help them identify which
users typically
use the most space when they must reorganize file space
onto new servers.
About the Author
Larry Reznick has been programming professionally since
1978. He is currently
working on systems programming in UNIX, MS-DOS, and
OS/2.
He teaches C, C++, and UNIX language courses
at American River College and at the University of California,
Davis extension.