When users think of a filesystem, they think of the
classic tree structure
composed of subdirectories and files. Everything begins
at the root
directory (/) and branches from there. Root is divided
into subdirectories,
which are further divided into additional subdirectories,
and so on.
Along the way are files within the subdirectories that
are stopping
points on the branch, much like leaves on a literal
tree.
UNIX was designed with ease of use in mind. This tree
representation
seemed the most efficient way to provide location and
access to files
and entities. It is no accident that users equate this
with the filesystem:
for most practical purposes, it is the filesystem. Administrators,
however, need to know that this tree is only the way
things look and
not the way they really are. They need to know that,
in reality, most
of the time there is a flatfile-style database that
keeps pointers
to the data, and that the pointers and data comprise
the real filesystem.
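
To make the "pointers behind the tree" idea concrete, here is a minimal Python sketch that lists one directory as a plain table of names and inode numbers. The function name name_to_inode is invented for illustration; the sketch builds its own scratch directory and assumes only a UNIX-like system with Python 3.

```python
import os
import tempfile


def name_to_inode(directory):
    """Return {entry name: inode number} for one directory."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


if __name__ == "__main__":
    # Build a tiny tree so the example does not depend on any real path.
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "subdir"))
        with open(os.path.join(top, "file.txt"), "w") as handle:
            handle.write("hello\n")
        # Each directory entry is little more than a name and a number.
        for name, inode in sorted(name_to_inode(top).items()):
            print(f"{inode:>10}  {name}")
```

The subdirectory and the file appear side by side in the same table; the tree shape comes only from following the numbers.
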
This article examines filesystems from an administrative
view and
explains what they are. In addition, it lists the tools
used to maintain
the filesystem and describes the most common types of
filesystems.
Simply put, a filesystem is a collection of data. Every
UNIX machine has a minimum of one filesystem on a hard drive.
That filesystem is the root filesystem, from which the machine
boots when it is turned on. It contains the UNIX kernel
(sometimes named /unix).
That
filesystem remains active for the entire time the machine
is up. (For
an overview of filesystems used by other operating systems,
see the
sidebar "How Do Other Operating Systems Treat the
Filesystem?")
If there are additional filesystems on that machine,
they may be mounted
and unmounted at will. For example, once the machine
is up, the root
user can mount(1M) another hard drive partition that
has database
files on it and can add new entries to a mailing list
database. After
adding the entries, the root user can unmount the partition
and mount
another with payroll information on it. Key points to
remember are:
Data blocks -- which store the data.
For every entry in a directory listing, there is a corresponding
inode(4)
holding information about the file. Inodes are pointers
to the data
blocks holding the files. Inodes have the following
components:
1. A unique inode number that increments by one with
every file or subdirectory created. Historically, inode
numbers zero
and one are set aside for special purposes. The root
inode
begins with two.
2. A two-digit octal code for the type of entry:
01 -- a named pipe.
02 -- a character device file.
04 -- a directory.
06 -- a block device file.
10 -- an ordinary file.
12 -- a symbolic link.
14 -- a socket.
3. The permissions on the entity. Permissions are
four octal digits. The first digit tells whether a special
mode is set (1=sticky bit, 2=SGID, 4=SUID). The remaining
three digits give the read, write, and execute permissions
for the owner, the group, and all other users, respectively.
4. The physical size of the file.
5. The number of links to the entry.
6. The owner of the file, often in numeric uid form
(the same information returned by the id command).
7. The group possessing the file -- again, often
in numeric form.
8. The timestamps for creation or change of the inode,
for last modification of the data, and for last access
of the data.
These are used in directory listings and by other commands,
such as
when find looks for files meeting certain time criteria.
9. The physical data block addresses where the file
resides.
These nine items constitute the inode, although their
order varies
from one UNIX vendor to another. To see your system's
order, look
for the file /usr/include/sys/inode.h, or for a file
in /usr/include/sys/fs/
with "ino" in its name. (See the sidebar "NFS
and RFS"
for a discussion of network filesystems.)
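
Most of these fields can be inspected from user space. The following Python sketch is one way to pull them apart, assuming a system whose st_mode uses the traditional octal layout described above; the helper names describe_mode and describe_inode are invented for this illustration. The physical block addresses (item 9) are not exposed by the stat interface, so they are absent here.

```python
import os
import stat


def describe_mode(mode):
    """Split a raw st_mode value into type code, special digit, permissions."""
    type_code = stat.S_IFMT(mode) >> 12   # file-type bits, e.g. 04 for a directory
    special = (mode >> 9) & 0o7           # 4=SUID, 2=SGID, 1=sticky bit
    permissions = mode & 0o777            # owner, group, other
    return f"{type_code:02o}", special, f"{permissions:03o}"


def describe_inode(path):
    """Collect the inode fields that lstat reports for one path."""
    info = os.lstat(path)  # lstat: do not follow a symbolic link
    type_code, special, permissions = describe_mode(info.st_mode)
    return {
        "inode": info.st_ino,
        "type": type_code,
        "special": special,
        "permissions": permissions,
        "size": info.st_size,
        "links": info.st_nlink,
        "uid": info.st_uid,
        "gid": info.st_gid,
        "ctime": info.st_ctime,   # last change of the inode
        "mtime": info.st_mtime,   # last modification of the data
        "atime": info.st_atime,   # last access of the data
    }


if __name__ == "__main__":
    for field, value in describe_inode(".").items():
        print(f"{field:12} {value}")
```
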
Hard-linked files share the same inode number. Hard
links cannot cross filesystems because each filesystem has
its own inode list, and each list starts its numbering at two.
The same number can therefore appear on two different
filesystems and point to two different files.
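
A short Python sketch can confirm this behavior. It assumes a filesystem that permits hard links; the file names and the helpers hard_link_demo and same_file are made up for the example. Because an inode number is unique only within one filesystem, same_file compares the device number along with the inode number.

```python
import os
import tempfile


def hard_link_demo(directory):
    """Create a file plus one hard link; return (inode_a, inode_b, link_count)."""
    original = os.path.join(directory, "original.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as handle:
        handle.write("payroll\n")
    os.link(original, alias)  # second directory entry, same inode
    first, second = os.stat(original), os.stat(alias)
    return first.st_ino, second.st_ino, first.st_nlink


def same_file(path_a, path_b):
    """True only if both paths name the same inode on the same filesystem."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(hard_link_demo(scratch))
```
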
Symbolic links, on the other hand, are nothing more
than concrete
pointers from one place to another. Symlinks have unique
inode numbers.
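As pointers to other destinations they can cross filesystems.
A symbolic link on the root filesystem can point to a particular
file on the database filesystem. The link itself remains in place
even when the database filesystem is unmounted, although it
cannot be followed until that filesystem is mounted again.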
(For more information on inodes and the utilities used
to maintain
them, see my article, "When Inodes Go Bad,"
in the July/August
1994 issue of Sys Admin.)
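
The following Python sketch illustrates the symbolic link behavior described above. It is only an approximation: removing the target file stands in for the target filesystem becoming unavailable, and the names symlink_demo, target.txt, and link.txt are invented for the example.

```python
import os
import tempfile


def symlink_demo(directory):
    """Return three booleans: (link has its own inode,
    link stores the target path, link dangles once the target is gone)."""
    target = os.path.join(directory, "target.txt")
    link = os.path.join(directory, "link.txt")
    with open(target, "w") as handle:
        handle.write("data\n")
    os.symlink(target, link)

    link_inode = os.lstat(link).st_ino    # the link itself
    target_inode = os.stat(link).st_ino   # stat follows the link to the target
    stored_path = os.readlink(link)       # the pointer is just a path string

    os.remove(target)                     # stand-in for the target going away
    dangling = os.path.islink(link) and not os.path.exists(link)
    return link_inode != target_inode, stored_path == target, dangling


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(symlink_demo(scratch))
```
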
Different Types
In the traditional UNIX world, there are two types of
filesystems.
Which one you used once depended on whether you had
chosen AT&T's
UNIX flavor, or had opted for the BSD versions. AT&T's
is known as
S5, for System V, reflecting the way versions are differentiated.
Berkeley's filesystem was called the Fast File System
(FFS) but is
now known as UFS. Today, both are usually available
on a new installation.
Knowing which to use can help you tailor your system
to meet your
site's specific needs. (See the sidebar "Other
Filesystems"
for a brief discussion of other types of filesystems.)
Virtually all UNIX filesystems today use the concept
of Read-Ahead,
Write-Behind to increase their efficiency. With Read-Ahead,
each time
you access the disk, more blocks are read into the buffers
than just
the one you are requesting. Most of the time one request
is followed
by another. Often the next requested data is sequentially
next to
the currently requested data. It makes sense to read
in more data
than that requested. When the next request is made,
that data is already
in memory, so there is no disk access time involved.
Write-Behind
entails maintaining changes within the buffers and not
on the disk.
When activity slows down or the buffers fill, the buffers
write to
the disk and prepare for the next operation. Given that
the slowest
component inside any computer is the hard drive, you
can see how these
operations combine to increase processing efficiency
over operating
systems that don't use Read-Ahead and Write-Behind.
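
The idea can be sketched with a toy buffer cache in Python. This is an illustration of the principle only, not how any particular UNIX kernel implements it: the class BufferCache, its parameters read_ahead and max_dirty, and the dictionary standing in for a disk are all assumptions made for the example.

```python
class BufferCache:
    """Toy buffer cache over a dict that stands in for a disk."""

    def __init__(self, disk, read_ahead=4, max_dirty=8):
        self.disk = disk                # block number -> bytes
        self.read_ahead = read_ahead    # extra blocks fetched on each miss
        self.max_dirty = max_dirty      # dirty buffers that trigger a flush
        self.buffers = {}
        self.dirty = set()
        self.disk_reads = 0             # simulated disk accesses
        self.disk_writes = 0

    def read(self, block):
        if block not in self.buffers:
            # Miss: one disk access brings in the block and its successors.
            self.disk_reads += 1
            for number in range(block, block + 1 + self.read_ahead):
                if number in self.disk and number not in self.buffers:
                    self.buffers[number] = self.disk[number]
        return self.buffers[block]

    def write(self, block, data):
        # Write-Behind: change the buffer only and mark it dirty.
        self.buffers[block] = data
        self.dirty.add(block)
        if len(self.dirty) >= self.max_dirty:
            self.flush()

    def flush(self):
        # One simulated disk access writes out every dirty buffer.
        if self.dirty:
            self.disk_writes += 1
            for number in sorted(self.dirty):
                self.disk[number] = self.buffers[number]
            self.dirty.clear()


if __name__ == "__main__":
    disk = {n: f"block {n}".encode() for n in range(16)}
    cache = BufferCache(disk)
    for n in range(10):
        cache.read(n)
    print("reads requested: 10, disk accesses:", cache.disk_reads)
```

Sequential requests are mostly satisfied from memory, and several writes are folded into a single trip to the disk. The price is that data sitting in dirty buffers is lost if the machine goes down before the flush.
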
S5
Figure 1 represents the physical structure of an S5
filesystem, which
is also referred to by SCO as S51K. There is but one
boot block, one
superblock, and one inode list for the entire filesystem
-- everything
else becomes data blocks. The boot block (logical block
0) contains
primary and secondary bootstrapping programs. If you
create a nonbootable
filesystem, the space is still set aside for the boot
block, but that
space is not used or accessible. While size can differ
per vendor,
usually sectors 0 to 15 constitute the boot block, and
sector 16 is
the beginning of the superblock. The data blocks can
be either 512, 1,024, or 2,048 bytes in size. Systems with
many small files can save hard drive space by using blocks
of 512 bytes, while systems with fewer, larger files should
have the block size set to the highest number possible.
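
The space trade-off is simple arithmetic: a file always occupies whole blocks, so the unused tail of its last block is wasted. The Python sketch below works this out for a handful of hypothetical file sizes. The function names and the sample sizes are invented for illustration, and the calculation ignores inode and indirect-block overhead.

```python
def allocated_bytes(file_size, block_size):
    """Bytes consumed on disk when a file is stored in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def wasted_bytes(file_sizes, block_size):
    """Total slack (allocated minus actual) across a set of files."""
    return sum(allocated_bytes(size, block_size) - size for size in file_sizes)


if __name__ == "__main__":
    small_files = [100, 300, 700, 1500]  # hypothetical file sizes in bytes
    for block_size in (512, 1024, 2048):
        print(block_size, wasted_bytes(small_files, block_size))
```

For these four small files the slack comes to 984 bytes with 512-byte blocks, 2,520 bytes with 1,024-byte blocks, and 5,592 bytes with 2,048-byte blocks.
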
UFS/FFS
Figure 2 shows a Berkeley Fast File System. UFS/FFS
uses the same
components as S5, but the components are not limited
to appearing
only once on the disk. Instead, the superblock and inode
list are
broken into smaller components and sprinkled throughout
the data blocks.
FFS was the default filesystem shipped with SVR4, and
the name has
now been changed to UFS for UNIX File System. Whereas
S5 allows for
structured blocks less than or equal to 2,048 bytes
in size, UFS supports
8,192-byte blocks. Because data is moved in blocks and
Read-Ahead
involves reading in additional blocks, the larger the
block, the more
efficient the operation. A side feature of UFS/FFS is
that it supports
255-byte names, as opposed to the 14-byte limit S5 imposes.
Its structure makes UFS much quicker and more efficient
at operations than S5. Some estimates put its disk
throughput as much as ten times higher.
Helpful Tools
The most crucial of the filesystem tools is mkfs(1M). This
utility creates the filesystem. fsck(1M) checks the filesystem
and performs cleanup operations, should data or pointers get
out of sync. Other utilities that are available depend on the
UNIX vendor. These utilities can include format or fmthard,
which are used to format or label the hard disk; dumpfs, to
obtain information about the existing filesystem; and newfs,
a friendlier front end to mkfs for building a new filesystem.
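
Some of the figures a tool such as dumpfs reports can also be read programmatically. The Python sketch below uses the statvfs interface, which is a loose analogue and not a replacement for any vendor utility; the function name filesystem_summary and the dictionary keys are invented for the example, and some filesystems report zero for the inode counts.

```python
import os


def filesystem_summary(path="/"):
    """Return a few figures for the filesystem that holds path."""
    info = os.statvfs(path)
    return {
        "block_size": info.f_frsize,          # fundamental block size in bytes
        "total_blocks": info.f_blocks,        # filesystem size in block_size units
        "free_blocks": info.f_bfree,
        "total_inodes": info.f_files,
        "free_inodes": info.f_ffree,
        "max_name_length": info.f_namemax,    # longest file name allowed
    }


if __name__ == "__main__":
    for field, value in filesystem_summary("/").items():
        print(f"{field:16} {value}")
```
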
Selecting Which Filesystem to Use
Determining which filesystem is right for your installation
can be
daunting. The answer is not always black and white,
nor may it be
universal for your entire system or network. You can
have multiple
partitions on your system, each configured with a different
filesystem.
The first question to decide is whether you need multiple
partitions at all and, if so, how many. A small number of
large partitions has its advantages, but so does a larger
number of small ones.
Smaller partitions, for example, are easier to defragment. Fragmentation
occurs due to constant growth of some files and removal
of others.
When it occurs, the only solution is to make a complete
backup of
the filesystem, delete all the original files, and then
restore them.
Restoring to an empty filesystem writes the files without
fragmentation.
Once you have decided on the number of partitions and
the purpose
each is to serve, the next step is to decide which filesystem
is most
appropriate for that partition. Mixing and matching
is not only possible
but recommended.
Think about what the partition will be used for, then
ascertain which
is most appropriate for that application. With the root
(boot)
partition, S5 works well, because the boot block will
be used, and
because the files within that partition do not change
or grow regularly.
With a partition containing database files, UFS makes
more sense.
For such a partition, read and write access must be
as quick as possible.
Moreover, multiple copies of the superblock provide a degree
of protection against corruption. Should corruption occur, it
is likely to damage only one copy, and the filesystem can be
repaired from another.
References
Goodheart, Benny, and James Cox. The Magic Garden
Explained. Englewood Cliffs, N.J.: Prentice-Hall, 1994.
Sheldon, Tom. LAN Times Encyclopedia of Networking.
Berkeley, CA: Osborne McGraw-Hill, 1994.
About the Author
Emmett Dulaney is a product developer for New Riders
Publishing, and an
associate professor of Continuing Education at Indiana-Purdue
University
of Fort Wayne. He can be reached on CompuServe at 74507,3713.