Setting up File Systems and Partitions
You can treat file system maintenance as drudgery, or you can raise
it to a new level of skill by turning it into storage engineering.
Engineering requires making design tradeoffs based on
intended use, efficiency, reliability, and maintenance.
When you take
the OS vendor's defaults for file system partitioning, and fit
your backup strategy to what you've been given, you're doing maintenance.
When you allocate file systems to different disks to balance disk
activity, mount binary-only file systems read-only, and integrate
partition size decisions into your backup strategies, you're doing engineering.
This article looks at the structure of UNIX file systems from the
perspective of tuning for intended use, then examines the issues involved
in determining how many file systems to create, where to put them,
and how to create them with the optimum sizes. The article outlines
the repartitioning process, but the actual commands
must come from your own man pages. Finally, I present some measures
you can apply to gauge how well you've engineered your file systems.
File System and Inode Structure
Historically, UNIX files have not had names, and UNIX directories
have not contained files. Instead, UNIX relies on the inode.
The inode contains all system information about the file, except its
name. A directory is merely a particular kind of file. Like all files,
a directory starts with an inode; in this case, an inode whose data
blocks contain a flat list of filenames and inode numbers ("file serial
numbers" in POSIX terminology).
A file system has typically contained a boot block, a super block,
a number of contiguous blocks filled with inodes, and data blocks
in what was left over. The first available inode of the boot partition,
usually inode 2, becomes the inode whose name is "/", where the
file system begins. The data blocks pointed to by the root inode
list names and inodes for the other common directories:
usr, tmp, and so forth.
All partitions on each disk are laid out the same, except that the
boot block is only used on the partition the ROM says has the bootstrap
program.
The Fast File System
The Berkeley Fast File System (FFS) added some complexities to this
simple scheme to achieve a 10X improvement in disk throughput.
The FFS, or some variant of it, is now used by nearly
all UNIX vendors.
One of the FFS optimizations relevant to storage engineering is the
grouping of adjacent physical blocks into larger virtual blocks. Because
disk subsystems are most efficient when transferring large amounts
of data, the FFS forces the disk subsystem to transfer 8 (or 4, or
16, or more) physical blocks at a time -- whether or not
all the physical blocks are actually asked for. Hence, a block
is no longer really a block; the disk subsystem might transfer 8 rotationally-adjacent
blocks at a time, which makes much better use of the disk hardware
than would transferring just the one block that was requested.
This would be inefficient if disk accesses were truly random, but
they're not. Consider how grep, awk, sed,
etc., behave: they read files sequentially, which means they ask for
one block, process it, then in the next millisecond ask for the next
block. With the FFS, the next physical block is already in memory.
A fast file system is thus composed of inodes and "blocks,"
in which a "block" contains 1, 2, or 4 "fragments."
A fragment is the smallest unit of allocation; it may
contain 1 or
more physical blocks.
Suppose you set up a file system to contain 8 physical blocks per
"block," and 4 fragments per "block". Assuming a physical
block size of 512 bytes, a single "block" could then contain
4 very small files (files with 1024 or fewer bytes), with each file
taking up 1 fragment. Or, the 8 adjacent physical blocks could hold
4K adjacent bytes within one file, making for an efficient transfer.
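The arithmetic above is easy to sketch in shell. The layout shown here -- 512-byte physical blocks, 8 per "block", 4 fragments per "block" -- is the hypothetical configuration from this example, not any vendor's default:

```shell
# Hypothetical FFS-style layout from the example above:
# 512-byte physical blocks, 8 per "block", 4 fragments per "block".
phys=512 per_block=8 frags=4
block_size=$((phys * per_block))    # bytes moved per transfer: 4096
frag_size=$((block_size / frags))   # smallest allocation: 1024 bytes
echo "block=$block_size frag=$frag_size"

# A tiny file still occupies one whole fragment:
file=17
echo "wasted=$((frag_size - file))"
```

Change frags to 2 or 1 to see the minimum allocation (and the potential waste per small file) grow accordingly.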
An inefficiency occurs when a small, fragment-sized file grows. If
the "block" is full, the existing data has to be copied to
a "block" with 2 or more free fragments. In practice, this
inefficiency is small compared to the enormous increase in throughput
gained by transferring more physical blocks at a time.
It is common practice to mount "/", /usr, and
/home from different partitions. In fact, it's also possible
just to mount /dev/sd0c or /dev/dsk/c0t0d0s2 on "/"
-- UNIX can be run with just one physical disk partition holding the
whole file system.
At the opposite extreme, every directory is a potential mount point,
and no names are magic. Root becomes / only because it is the root
inode of the boot partition. Instead of mounting /usr and
/home from separate partitions, you could mount /usr/lib
and /home/george/news in separate partitions, with all other
files living in the root partition.
The description of the FFS above suggests one reason to mount
other directories than /: if all directories are mounted on one file
system, you've lost the ability to tune the file system to the way
the file system is used. Mounting multiple directories has other advantages:
-- directories that change infrequently can be backed up
less frequently; if you use a file-system-oriented backup utility
like dump or ufsdump, you back up by file system, not by directory;
-- file systems with critical files can be mounted read-only
to prevent accidental or intentional corruption;
-- if local software and application software are kept
on a separate file system from the operating system,
the local software can be unmounted during an upgrade and remounted
afterwards, eliminating the need for restoration from backup;
-- smaller "chunks" of directories can be moved
to new file systems when new disks are added to a system;
-- inodes per file system can be reduced when you create
a partition to be used for large files.
You can go beyond the traditional mount points (root,
swap, and home). Some good candidates are
-- /local (or /usr/local) for local software;
-- /tmp, so that accounting doesn't get turned off when
someone does a large compile;
-- spooling space (/usr/spool, /var/spool), for
the same reason;
-- third-party application installation directories;
-- an anonymous ftp directory, so that an unwelcome gift of files
in /pub/incoming doesn't prevent your users from doing real work;
-- /export/swap, swap space for diskless workstations,
which needs only one inode per swap file (plus one for the directory),
rather than the thousands of inodes that are the default.
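As a sketch of how such a layout might look, here is a hypothetical BSD-style /etc/fstab fragment; the device names, mount points, and field layout are illustrative only -- check your own fstab or vfstab man page for the exact format:

```
/dev/sd0a  /           ufs  rw  1 1
/dev/sd0g  /usr        ufs  ro  1 2    # vendor binaries, read-only
/dev/sd1a  /usr/local  ufs  rw  1 2    # local software, survives OS upgrades
/dev/sd1b  /var/spool  ufs  rw  1 2    # spooling can fill without harming /
/dev/sd2a  /home       ufs  rw  1 2
```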
You can make tradeoffs among 4 characteristics: space efficiency, data
transfer speed, ease of maintenance, and security. Figure 1 summarizes
these tradeoffs. To optimize for data transfer speed, set up a file
system with large blocks and few fragments. To optimize for space
efficiency (that is, minimum wasted space within a partition), use
smaller blocks and more fragments. To use read-only mounts to protect
critical files (like the executables the OS vendor supplies),
to increase flexibility in your backup schedule, and to set up firewalls
against errant processes, choose more, smaller file systems. See
the sidebar, "Many versus Few Partitions," for a more detailed
discussion.
The benefits of tailoring the size of a partition to
its use are clear.
The tough part is deciding what should go where, and
how much space
to leave for each chunk. Here are some rules of thumb:
Put /tmp and /var (or /usr/spool) into their own
partitions. If your OS puts /tmp into swap space (e.g.,
Solaris 2.x), you can leave it there or put it in a separate
partition. If /tmp is mounted on the swap partition, you lose
swap space as /tmp grows, and you lose temporary space as
swap usage grows. If you run large programs and need lots
of temp space, a separate partition for /tmp may be the safer choice.
Keep OS vendor binaries in /usr in their own partition,
separate from all local binaries and third-party applications.
The vendor binaries don't grow (at least, not till the next OS upgrade),
so you can size the partition very close to the actual use. You can
enhance security by mounting /usr read-only; not even root
can write into a read-only mount. Read-only mounts are a lot of trouble
when you have to make changes -- that's why only vendor binaries
belong in a read-only /usr.
Consider every other chunk of 50M bytes or more as
a candidate for a separate partition, where a "chunk" is
the du -s output for a directory and all its subdirectories.
Put related packages into the same partition, making
up your own relations
as you see fit.
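A quick way to hunt for such chunks is a du-based sweep. This sketch takes a starting directory and a threshold in k-bytes (51200K is roughly the 50M-byte figure above); both arguments are illustrative:

```shell
# List directories under $1 whose total size is at least $2 k-bytes.
# du -sk prints "size<TAB>path", one line per top-level directory.
chunks() {
    dir=${1:?directory}; limit=${2:-51200}
    du -sk "$dir"/* 2>/dev/null |
        awk -v min="$limit" '$1 >= min' |
        sort -rn
}
chunks /usr 51200    # candidates for their own partitions
```

The output, largest first, is a ready-made worksheet for deciding what deserves its own partition.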
Use a deep -- rather than a broad -- directory
structure when you install a package. Only directories can be mount
points, and every directory is a potential mount point, so you maximize
your mounting options when you arrange files in more subdirectories.
You should expect to repartition your disks at least occasionally:
file systems are dynamic, and disk needs change as patterns of use
change. With the price of disk storage falling monthly, you're
probably installing new disks more frequently than you may have done
five years ago. Unless you buy a separate disk for each package you
install, you'll need to move data from one partition to another when
you install a new disk, just to make good use of the new space.
You might consider repartitioning at your next OS upgrade; if you
can wait that long, this is an excellent time.
How to Repartition
As a first step, lay out the new partition sizes on paper (you might
use graph paper, letting 1 block stand for 1M byte, or 10M, or 20M).
Sketch in proportional sizes for each disk and each partition to help
you visualize your space needs. If you use ink for the disks
and pencil for the partitions, you can juggle size needs to disk capacities
with a minimum of redrawing.
Except for the root and /usr partitions, here's how to repartition:
1. Go to single-user mode.
2. Backup all affected partitions (level 0 dump).
3. Repartition according to OS instructions. The program
is usually called format, and it requires you to fill in the
starting and ending cylinders (or starting cylinder and length) of
each disk partition.
4. Reboot single-user.
5. Make new file systems on all partitions affected
by the changes in step 3. (See newfs or mkfs man pages,
and tuning comments below.)
6. Mount the new file systems. Edit /etc/vfstab
(/etc/fstab) to make the new mount points permanent.
7. Restore from backups. The kernel automatically routes
the files to the correct partitions.
8. Backup again (level 0). If you don't backup level
0 after a restore, restores from later incrementals won't work correctly.
9. Go to multi-user mode.
Though tedious and time-consuming, these steps are not difficult.
The only serious error you can make is to have two partitions overlap,
but judicious use of a calculator will prevent that.
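Since overlap is the one fatal mistake, it's worth checking the planned map mechanically rather than by eye. This sketch assumes a made-up "name start_cyl end_cyl" table; a real map comes from format or prtvtoc and will look different:

```shell
# Flag any two partitions whose cylinder ranges overlap.
# Reads "name start end" lines on stdin, sorted by starting cylinder.
check_overlap() {
    sort -n -k2 |
    awk '
        NR > 1 && $2 <= pe {
            printf "overlap: %s %d-%d and %s %d-%d\n", prev, ps, pe, $1, $2, $3
            bad = 1
        }
        { prev = $1; ps = $2; pe = $3 }
        END { exit bad }
    '
}

# Hypothetical map: b and d collide, so this prints a warning.
check_overlap <<'EOF' || echo "fix the map before repartitioning"
a 0 100
b 101 300
d 250 400
EOF
```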
Resizing Root and /usr
Many systems won't boot, even to single-user, without both root and
/usr partitions, so moving or changing the size of root or
/usr is more difficult. Resizing root also means reinstalling
the boot block after you've made a new file system.
To change the size of root, you must boot from a different device.
On some systems, you can use the OS distribution medium (CD-ROM or
tape) for this step. For other systems, you'll have to duplicate the
root partition on another disk, make that partition bootable (with
installboot or a similar program), reboot from the new root,
then backup-repartition-restore your first root.
For most systems, you must also run installboot after making
a new file system on root. Run installboot after restoring
the files but before making a new level 0 backup.
The steps required to move or enlarge /usr are similar. If
you can boot from the distribution medium, do so. Otherwise, duplicate
/usr on another partition, edit /etc/vfstab, reboot
single-user with the duplicate of /usr, backup-repartition-restore
your first /usr, re-edit vfstab, and reboot.
If you've never moved root or /usr before, you might want
to practice on a spare system first. If you're not blessed with a
spare system, take it step-by-step and have hardcopy of the man pages
next to you before you start.
Block and Fragment Tuning
With an understanding of file system structure and a knowledge
of how your file systems will be used, you can begin to see opportunities
for tuning. The easiest time to do this is when you make the file
systems.
As an example, imagine a user with hundreds of 2M-byte data files. You
can store the data in a separate file system set up to have, say,
16 physical blocks per "block", and maybe only 1 or 2 fragments
per "block." With a "block" of 16 physical blocks
and 2 fragments, the minimum allocation is 8 physical blocks -- 4K bytes.
That would be wasteful for a user with a 17-byte file (4096 - 17 =
4079 wasted bytes), but it means very efficient disk transfers if
most of the files in that file system are 2M bytes or larger.
So there's one obvious file system tuning rule: file systems with
large files benefit from large "blocks" and few
fragments. Databases are an obvious candidate for directories
on file systems with larger blocks.
The quantity of inodes allocated to a file system can also be adjusted
when the file system is created. For a given file system size, say
500M bytes, a file system with 1000 files will need fewer inodes than
a file system with 5000 files, because you need one inode for each
file you create, plus one for each directory, plus one more for the
file system's root.
Inode blocks do not store data, just kernel information about the
file -- they're file system overhead. In the first case in the
paragraph above, you need one inode per 500K bytes of data; in the
second, one inode per 100K bytes. If you can fit four inodes
into one physical block (128 bytes per inode -- see your man pages
for the exact figure), then you will need 1000 / 4 = 250 blocks
full of inodes (plus spares for directories) in the first case, but
1250 blocks full of inodes (plus spares) in the second -- an extra
1000 blocks, or about 500K bytes, of file system overhead.
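The arithmetic above is easy to script. The figures assumed here -- 128 bytes per inode and 512-byte physical blocks -- are the article's example values, not universal constants:

```shell
# Rough inode overhead for a planned file system:
# files -> inode blocks -> bytes, at 4 inodes per 512-byte block.
inode_overhead() {
    awk -v n="$1" 'BEGIN {
        per = int(512 / 128)                  # inodes per physical block
        blocks = int((n + per - 1) / per)     # round up to whole blocks
        printf "%d inodes -> %d blocks (%d bytes)\n", n, blocks, blocks * 512
    }'
}
inode_overhead 1000    # the 1000-file case above
inode_overhead 5000    # the 5000-file case
```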
In practice, the differences aren't quite so dramatic. See for yourself
on your own file systems: compare df -k to df -e
(or df to df -i on BSD systems) to see how many inodes
are unused on your file systems. Systems with BSD parentage
provide a newfs command that invokes mkfs with parameters
that let you specify how many inodes to allocate per k-bytes of disk space.
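The comparison looks something like this; df -i as shown is the common form on Linux and BSD descendants, while df -e and the option names vary by system:

```shell
# Space use vs. inode use for the root file system.
df -k / | tail -1    # k-bytes: total, used, available
df -i / | tail -1    # inodes: total, used, available
```

A file system showing most of its space used but almost none of its inodes (or vice versa) is a candidate for retuning at the next newfs.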
Measures of Quality
The quality of system administration is tough to measure. One
approach is to imagine what would happen if you did a perfect job.
So, if you could set up your file systems absolutely perfectly, here's
how they might look:
Each file system would have exactly the right number
of inodes, that is, there would be no inodes left if
the file system
were to reach 110 percent of capacity.
File systems containing mostly large files would have
large blocks and few fragments.
A file accidentally deleted by a user could be instantly
retrieved from backup tapes.
Disk activity would be equally divided among all disks.
Performance monitors would show all disks with the same rate of data
transfer.
Static file systems would be 100% full; see Table 1
for some other targets.
By applying your knowledge of the basic file system structure to the
needs of your users and to your own needs for system administration,
you can engineer the use of your disks to trade off speed for more space,
or security for simplicity. The tools to do so are already at hand.
You need only apply your own good judgment, then find the time to
do the job.
About the Author
John Caywood received B.S. and M.S. degrees in computer science from
Old Dominion University, Norfolk, Va., and he taught
there for three years. He is currently employed by InfiNet, an
Internet access provider in Norfolk, VA. He can be reached