Finding Disk Hogs
"Who's hogging my disk space?" Ignoring the
conceit of ownership
that we all share, that's what a group manager wanted
to know. He'd
written a script to tell him who was using disk space
on one directory
tree important to his operations. That was a good idea,
but I wanted
something more thorough. I wanted to know about the
space usage throughout
every disk drive. Furthermore, I wanted one script to
find this information
on both the HP-UX and the SunOS systems at the site.
dusage (Listing 1) collects disk usage information across
every partition on a set of systems. The script will
check a set of
systems specified on the command line or will check
a default set
of systems built into the script. You may give a megabyte
to reduce or increase dusage's output. Only users whose
usage exceeds the threshold are shown in the report.
You may specify
the threshold with or without specifying which systems
to check, but
to specify systems you must give a usage threshold.
Disk Usage Report
dusage's report shows the amount of space used by each
filesystem and the amount of space used by each user.
Figure 1 shows
a full report of the script's default systems, using
the 50 megabyte
default threshold. Figure 2 shows a report for only
one system using
a 20 megabyte threshold. The smaller the threshold,
the more users
who show on the report. dusage accepts a threshold as
as 1 megabyte, which shows nearly every user storing
files on the
dusage's report is designed not only for easy reading
and managers, but also for easy parsing when feeding
its output into
other scripts for further analysis.
As a heading, the first line identifies the megabyte
for the report. Two lines are skipped between each system's
and a single line is skipped between each disk drive's
disk's report data is single-spaced.
System subheadings are preceded by four equals signs
(====) to simplify
extracting the report for one system from a larger report
many systems. Assume that you've saved a large dusage
to a file by redirecting the script's output. You can
extract a single-system's
report by running
egrep -n ==== reportfile
This command shows a list of every system named in the
file and the line number where each system's subreport
that the report you want begins on line 53 and the next
begins on line 69. Use the command
tail +53 reportfile | head -16
to extract the 16-line subreport. If the subreport is
the last one in the report file, you don't need to pipe
program's output to the head program.
Each system subreport shows all the disk partitions
for that system.
dusage shows the names of the partitions' mount points
remote system notation:
Beneath the partition's designation is a line containing
space usage figures for the whole partition. These figures
the df program, but they're scaled to use megabytes
of df's usual kilobytes output. Total space usage lines
surrounded by parentheses to make extraction of these
for scripts that want to analyze dusage's output.
The usage percentage is calculated the same way df calculates
it, but shows the value rounded to two decimal places.
rounds to the nearest integer. The total shown ("Tot:")
typically greater than the sum of the "Used:"
space because "Tot:" reflects certain space
used as overhead
by the system. The "Usage:" percentage reflects
space available from the "Used:" and "Avail:"
not the potential space as shown in the "Tot:"
User usage lines follow the space usage line. These
lines show each
user whose usage exceeds the threshold. User lines show
in order of
most space used to least space used. Three tab-separated
the total megabytes used, the name of the user holding
and the percentage of total space used by the user.
UNIX allocates disk space 1K (two 512-byte blocks) at
a time. Total
megabytes used represents the total allocated space
in 1K allocation
units held by a user, not merely the sum of a user's
Usage percentages come from the relationship of the
usage to the partition's available space, which is the
sum of the
partition's "Used:" and "Avail:"
Disk Usage Script
The heart of dusage is the quot program. (Refer to
quot(8) on SunOS, and quot(1M) on HP-UX and on other
versions of System V). The -a option causes quot to
examine every mounted filesystem on a system and report
who is using
space and how much. quot is perfect for dusage's needs,
because quot's report gives the total 1K allocation
for each user in order of most used to least used.
Unfortunately, quot only gives this information when
a root user. Therefore, dusage requires the user to
effective UID of zero. If you need dusage's report frequently,
run dusage in root's crontab. Because dusage
sends its output to the standard output, cron automatically
mails the output to the crontab owner unless you redirect
that output to a file. If you want many people to read
be sure to set appropriate owner and group IDs and permissions
part of the cron job.
Listing 1 shows dusage, a Bourne shell script. dusage
has been tested to work correctly on HP-UX and SunOS
systems. It may
work unchanged on other versions of UNIX, but I've only
on those two systems.
SYSTEMS is set at the beginning of the script to make
variable easy to find and modify. The SYSTEMS variable
the default systems that dusage reports about when system
lists are not included in its command line. Listing
1's settings are
dummies. Modify your SYSTEMS variable to hold the system
you're concerned with most frequently.
Because root privileges are required to run this script,
variable is restricted to /bin, /usr/bin, and /etc.
These directories hold most of the programs dusage needs.
Other programs have their paths fully spelled out when
PROGNAME is set to the base name used to invoke the
so that error messages refer to the correct invocation
script actions cause dusage to lose access to $0,
so I reserve its value now.
The next phase of the script introduces a method for
making this script
as portable across UNIX systems as possible. Because
of my initial
development requirements, this script runs on HP-UX
and SunOS. HP-UX
is System V-based and SunOS is BSD-based. There are
between them to drive a portable script developer nuts.
has to deal not only with program differences on the
but also with differences on the remote systems from
which usage figures
are collected. The executing system may be different
from the remote,
examined system. Designing the script so that it could
figures even though run on SunOS, or vice versa, was
Throughout discussion of the rest of this script, keep
in mind that
the executing system may run a different version of
UNIX than the
WHICHSYS initially holds the operating system name on
executing system, although later it will hold the name
of the target
system. The next test figures out which UNIX system
is executing by
looking at the first five characters. dusage assumes
if those five characters aren't "HP-UX," they're
Only an if...else condition is needed for this test.
find that simple assumption won't work for your needs,
to a case statement. Make the case execute expr to
extract the five characters, just as dusage does. Look
HP-UX and SunOS using the usual case pattern matching,
and then add
patterns to match whatever else your UNIX systems require.
At this point, the script sets the remote shell command
the awk command (AWKCMD), and the echo command
(ECHOCMD) variables. HP-UX uses remsh, but SunOS uses
rsh. I suspect HP-UX names the command differently to
conflict with the restricted shell's name. Nevertheless, System
uses rsh(1), so I view HP-UX as being different here.
remsh is in dusage's PATH. SunOS presents
a problem by keeping its version of rsh in /usr/ucb,
which isn't in dusage's PATH.
dusage uses awk for its data collection, calculating,
and formatting. Some UNIXs keep the obsolete version
hanging around with the new, more thorough version of
SunOS does this. Most System V versions of UNIX I've
worked with do
this also, naming the older version awk and the newer
nawk. On systems I administer, I prefer to rename the
version of awk to oawk and then make a hard link from
awk to nawk. This keeps both versions available but
forces the more powerful, new version as the default
taking much extra system space. However, because I wrote
to work despite someone having set that link, AWKCMD
to awk for HP-UX and nawk for SunOS, those systems'
respective versions of the more complete awk. Use additional
case pattern matches to direct your AWKCMD another way.
ECHOCMD points to the System V version of echo. HP-UX
automatically uses that version, but SunOS must go to
to get that version. System V's echo supports many embedded
control codes, using a C-like escape notation with a
echo has no such support. dusage needs embedded newlines
to make its messages easier to read.
With these three primary support programs selected,
can move on to examine its command line. If ECHOCMD
set correctly, not even the usage message will output
it uses an embedded newline (\n).
THRESHOLD comes from the first argument on the command
However, the user doesn't have to give that argument.
THRESHOLD is set to 50. If you prefer some other default
set the number of megabytes after the colon-hyphen (:-)
That threshold must be a positive number. Any negative
number or zero
is assumed to be an error and shows the usage message.
This way, if
the user starts dusage with a negative number threshold,
as "-50," or with a non-numeric argument,
such as "-h"
or "help," dusage tells the user how to use
it. Figure 3
shows an example of this help message.
For the rest of the script, the user must be a superuser.
-u" to see if the effective UID is 0 prevents the
going further. Figure 1 shows an example of this message.
and error messages are redirected to the standard error
differentiate them from the report's normal output.
However, if a
cron job redirects dusage's standard output to a file,
cron still mails the standard error to the root user.
The next test checks whether systems were named on the
Because systems cannot be named without identifying
a threshold, the
test expects at least two arguments on the command line
for a system
name to be present. If no such names are on the command
default SYSTEMS value is set into the positional parameters.
When the command line contains system names, the threshold
out of the positional parameters. Whichever way this
test goes, when
it is finished, the positional parameters contain only
Keying to the Remote System
Report generation begins by showing the current run's
each system named in the positional parameters, dusage
adjust itself to use certain programs on the remote
is reused to hold the remote system's OS name. Appropriate
and type commands are selected, according to whether
system is HP-UX or SunOS. On SunOS, df outputs a report
disk free space in 1K units. HP-UX's df shows the same
in 512-byte blocks, but doesn't support System V's -k
to show it in 1K units. Instead, HP-UX has a different
to show the same format as SunOS's df and System V's
-k." Use a case statement to handle other operating
Discovering the proper path to certain commands became
than the simple if...else used so far. On HP-UX and
V installations, root users start up a remote shell
using either Bourne
shell (sh) or Korn shell (ksh). On SunOS, root users
start up C-shell (csh). Even though dusage runs under
Bourne shell because of the #!/bin/sh line at the top
script, when a remote shell (remsh or rsh) starts,
the default shell on the remote system is not necessarily
shell. That means the command to find the full pathname
of a program
differs according to the target system: /usr/ucb/which
SunOS's C-shell, and type on Bourne shell or Korn shell.
This different shell problem gets hairier on HP-UX.
According to the
remsh documentation, remsh's initial PATH
excludes the /etc directory, which some dusage commands
need. So, the TYPECMD used on HP-UX must contain the
PATH setting used by dusage, including /etc.
TYPECMD throws one more gotcha: type's output style
is different from which's style. Fortunately, the TYPECMD's
output format always shows the pathname last. Pipe the
output to the AWKCMD to extract the last field's value,
matter how many fields there are. This delivers the
DFCMD is set to such a full pathname for the remote
command, be it bdf or df. USECMD is similarly
set to the usage command, quot. An extra quoted string,
-a" with a space before the hyphen in the quotation
the output of the shell substitution so that quot will
results for all mounted filesystems.
Collecting Remote System Statistics
Once all the target system configuration work is done,
can begin collecting disk usage figures for the remote
Skipping two lines, dusage announces the system name
subreport. It then remotely executes the appropriate
and pipes USECMD's output into a local copy of AWKCMD.
AWKCMD collects and formats the figures.
Two kinds of output lines come from USECMD (quot):
an initial line identifying the current disk partition,
and a set
of secondary lines identifying the users taking space
on that partition.
Relatively little work goes into handling the secondary
more work goes into developing the partition headings
for the report
from the initial lines.
Initial lines from quot identifying the current disk
partition always refer to the physical device pathname.
dev directory is always part of that pathname, despite
subdirectory classifications, seeing if the line contains
seemed most portable. So, the AWKCMD pattern
$0 ~ /dev/
looks for those three letters -- slashes surround
regular expressions in awk patterns -- and executes
action associated with a partition's initial report
quot delivers its partition identifier showing the device
pathname and then, in parentheses, the mount point pathname.
prefer to see both, use $0 in printf()'s second parameter.
However, assuming that most managers analyzing the report
care about the partition's device name, I extracted
the mount point
pathname from $0. To extract the mount point, set AWKCMD's
match() function to find all the characters within parentheses.
Those parentheses needed to be double-escaped -- using
not just "\(" -- because HP-UX's awk and SunOS's
nawk use differing regular expression rules. Doubling
escape character eliminated the difference.
AWKCMD's match() function loads the global variables
RSTART and RLENGTH. These hold the relative starting
position and the relative length of the matched string.
the AWKCMD's substr() function, those global variables
deliver the mount point pathname contained in quot's
line. Because the regular expression includes the parentheses
match, I add one to RSTART to skip the open parenthesis
subtract two from RLENGTH to ignore both the open and
parentheses. That extracted name is formatted into dusage's
showing remote system notation. The report skips a line
after this output.
Finding the remote system's total space, as reported
requires separate remote shell execution. Recall that
the remote shell
invocation of USECMD has already delivered its output
AWKCMD. Another RSHCMD won't interfere with that output.
Retrieving this new RSHCMD's output from the DFCMD
from inside the AWKCMD is a little tricky. It not only
the RSHCMD and the DFCMD, but also requires DFCMD's
output to pipe through egrep. egrep extracts the only
relevant line from DFCMD's output. To make AWKCMD
execute RSHCMD and retrieve the output, I pipe the command
AWKCMD's sprintf() function creates a string from
the several variables that form the remote shell DFCMD
the egrep pipe. This string is formatted and assigned
sysspace variable. Expanding the string in front of
to getline runs the command and delivers the output
Because of egrep the command only outputs one line.
assigns each word from the output line to AWKCMD's positional
parameters. Doing this replaces USECMD's input line,
program is finished with that. Total, used, and available
come from USECMD's output. dusage converts them from
DFCMD's kilobytes format into megabytes, calculates
space value for the usage percent figures, rounds these
two decimal places, and prints them as floating point
"M" characters to designate them as megabytes.
AWKCMD holds open any file associated with a pipe to
After printing the partition's space usage figures,
explicitly closes that file; otherewise, AWKCMD will
of file handles. AWKCMD may only have one pipe opened
time. Because a similar pipe gets opened for each partition
system, dusage closes each pipe when finished with it
next one will open.
Finally, next skips all remaining patterns and actions
this initial input line. Without executing next, AWKCMD
would try to apply the remaining patterns and actions,
and could produce
very strange output.
All secondary lines coming from USECMD are handled by
second pattern. In the first column of input, quot delivers
the number of 1K blocks used by every user on the target
Is that number, divided by 1024 to translate kilobytes
greater than the threshold? If so, this line identifies
a user holding
more space than the threshold. The script then formats
the usage amount in megabytes, along with the user's
name and the
dusage may point out surprising disk space usage figures.
It helps administrators and managers identify who needs
a nudge to
clean up files. It can also help them identify which
use the most space when they must reorganize file space
onto new servers.
About the Author
Larry Reznick has been programming professionally since
1978. He is currently
working on systems programming in UNIX, MS-DOS, and
He teaches C, C++, and UNIX language courses
at American River College and at the University of California,