If you've ever booted your UNIX system and watched while
fsck checked
your filesystem, you've probably wondered just what
it is doing and why.
In this article I undertake to answer those questions
for you. I begin
with an in-depth look at how the filesystem is laid
out, then describe
the steps that fsck takes as it does its work. Finally,
I describe some
simple faults and show how fsck repairs the damage.
The filesystem
layout can vary with the version of UNIX you use. See
the sidebar for
details. For the purposes of this article I will use
the UFS
filesystem.
The information displayed when you use the ls command
comes from several
places, including the directory structure and the inode
structure.
Figure 1 shows the ls -l output for a simple directory
structure with a
subdirectory, a file, a file linked to another file,
and a symbolic
link. To visualize this structure a little better, look
at Figure 2,
which shows the relationship in a diagrammatic form.
Directories are allocated in units called "chunks."
Chunks are sized so
that each allocation can be transferred to disk in a
single operation,
then are broken up into variable length directory entries
to allow
filenames to be of nearly arbitrary length. Figure 3
shows the directory
structure. The first three fields of each directory
entry are fixed
length and contain the size of the entry, the length
of the filename,
and the inode number. The final field is the name. The
name is what the
user knows and works with; the inode number is what
UNIX knows and works
with.
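
To make the layout concrete, here is a minimal Python sketch that packs
and walks variable-length entries inside one chunk. The field widths and
order (32-bit inode number, 16-bit entry size, 16-bit name length), the
512-byte chunk size, and the names pack_entry and parse_chunk are
assumptions made for illustration; check your system's headers for the
real structure.

```python
import struct

# Assumed fixed-length header: inode number, entry size, name length.
HEADER = struct.Struct("<IHH")
CHUNK_SIZE = 512  # assumed chunk size for this sketch


def pack_entry(inode, name, reclen=None):
    """Build one directory entry, padded to a 4-byte boundary."""
    raw = name.encode("ascii")
    needed = (HEADER.size + len(raw) + 1 + 3) & ~3  # header + name + NUL
    reclen = needed if reclen is None else reclen
    if reclen < needed:
        raise ValueError("entry size too small for name")
    body = HEADER.pack(inode, reclen, len(raw)) + raw
    return body.ljust(reclen, b"\0")


def parse_chunk(chunk):
    """Walk a chunk, using each entry's size to find the next entry."""
    entries = []
    offset = 0
    while offset + HEADER.size <= len(chunk):
        inode, reclen, namlen = HEADER.unpack_from(chunk, offset)
        if reclen < HEADER.size + namlen or offset + reclen > len(chunk):
            raise ValueError("corrupt directory entry at offset %d" % offset)
        start = offset + HEADER.size
        name = chunk[start:start + namlen].decode("ascii")
        if inode != 0:  # inode 0 is treated here as an unused slot
            entries.append((inode, name))
        offset += reclen
    return entries


# The last entry's size is stretched so the entries fill the whole chunk.
chunk = pack_entry(2, ".") + pack_entry(2, "..")
chunk += pack_entry(11, "File1", reclen=CHUNK_SIZE - len(chunk))
```

Because each entry carries its own size, the walker never needs to know
how long a name is in advance, and a deleted entry's space can be
absorbed by stretching the size of the entry before it.
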
More than one directory entry can point to the same
inode, which means
there can be multiple names for the same file. A good
example is /bin/vi
and /bin/ex: the program may act differently depending
on which name you
use. Figure 6 shows how the "." and ".."
notations work: they serve as a
shorthand reference to the directory itself (".")
and to the parent
directory (".."). A directory thus always
shows a minimum of two
links: its name in the parent directory and its own "." entry. This
method of providing multiple names is called "hard linking"
and is only valid within a filesystem.
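
The effect of a hard link can be observed from user space. The sketch
below, which assumes a UNIX-like system and uses Python's standard os
module, creates two names for one file in a temporary directory and
compares their inode numbers. The function name hard_link_demo and the
file names are illustrative only.

```python
import os
import tempfile


def hard_link_demo():
    """Create File2, hard-link File3 to it, and inspect the shared inode."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "File2")
        alias = os.path.join(d, "File3")
        with open(original, "w") as f:
            f.write("shared data\n")
        os.link(original, alias)           # same effect as: ln File2 File3
        a, b = os.stat(original), os.stat(alias)
        same_inode = a.st_ino == b.st_ino  # both names lead to one inode
        link_count = a.st_nlink            # the inode now has two names
        os.remove(original)                # drops one name, not the data
        with open(alias) as f:
            survived = f.read()
        return same_inode, link_count, survived
```

Removing one name only decrements the link count; the data stays until
the last name is gone.
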
Looking again at the entries in the simple directory
structure in Figure 1,
and comparing them to the directory entries in
Figure 6, you can
see that File1 is a regular file; File2 is linked
to File3 via a hard
link created with the ln command; and File4 is a symbolic
link. For
a symbolic link the file name is replaced with a path name.
To get to the
inode, a process must follow the link to another directory
entry, then
follow that directory entry to the inode. Because this
is a two-step
operation, it is possible for the link to be present
even after the file
to which it referred has been deleted.
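
A short Python sketch shows how a symbolic link can outlive its target.
It assumes a UNIX-like system; the function name dangling_symlink_demo
and the file names are illustrative only.

```python
import os
import tempfile


def dangling_symlink_demo():
    """Create a symlink, delete its target, and see what remains."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "File1")
        link = os.path.join(d, "File4")
        open(target, "w").close()
        os.symlink(target, link)            # same effect as: ln -s File1 File4
        resolves_before = os.path.exists(link)
        os.remove(target)
        still_a_link = os.path.islink(link)     # the link itself is intact
        resolves_after = os.path.exists(link)   # but it leads nowhere
        stored_path = os.readlink(link) == target
        return resolves_before, still_a_link, resolves_after, stored_path
```

The link stores only a path name, so nothing tells it that the file it
named has gone.
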
Figure 4 shows the layout of an inode. The top section
of the structure
holds the information displayed by the ls command.
This information
includes:
The number of physical blocks used by the file (including
blocks used
to hold indirect pointers).
Beneath this section is a series of 15 pointers to the
data blocks used
by the file. The first 12 pointers each point to a single
data block,
which allows small files to be accessed very quickly.
On a filesystem
with a 2Kb block size, for example, the first 24Kb can
be accessed
directly. Since most files are quite small, this method
provides very
fast access.
The next pointer points to a single data block that
contains pointers to
other data blocks (see Figure 5). This is referred to
as single indirect
addressing. To keep the arithmetic simple, assume that a block
holds 64 pointers (a real 2Kb block of 32-bit pointers holds 512).
Using the single indirect pointer, a file can then reach
64 * 2Kb + 24Kb, or 152Kb. The next pointer is the double indirect
pointer, which
points to a block of pointers, each of which points
to a block of
pointers, each of which in turn points to a data block.
The double
indirect pointer addresses a further 64 * 64 * 2Kb, or 8Mb.
The last pointer is the triple indirect pointer, which
points to a block
of pointers, each of which points to a block of pointers,
each of which
in turn points to a block of pointers which point to
data blocks. It
addresses a further 64 * 64 * 64 * 2Kb, or 512Mb, which gives
a maximum file size of a little over 520Mb under these
assumptions. With a larger block size of, for example,
4Kb (and twice as many pointers per block), the
maximum file size rises to about 8Gb. The drawback to the
larger block
sizes is that more space is wasted, because the last block of
almost every file is only partly filled.
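
The arithmetic can be checked with a short Python sketch. The function
name addressable_bytes and its parameters are illustrative, not part of
any UNIX interface. Passing 64 pointers per block follows the simplified
count used in the text; passing block_size // 4 models 4-byte pointers
filling a whole block.

```python
def addressable_bytes(block_size, pointers_per_block, direct=12):
    """Return the bytes reachable through each level of block pointer."""
    levels = {
        "direct": direct * block_size,
        "single": pointers_per_block * block_size,
        "double": pointers_per_block ** 2 * block_size,
        "triple": pointers_per_block ** 3 * block_size,
    }
    levels["total"] = sum(levels.values())
    return levels


KB = 1024
simplified = addressable_bytes(2 * KB, 64)           # the text's figures
realistic = addressable_bytes(2 * KB, 2 * KB // 4)   # 4-byte pointers
```

With the simplified count, the direct and single indirect ranges add up
to 152Kb and the triple indirect range alone is 512Mb. With 4-byte
pointers the same block size reaches far further, which is why the
pointer width matters as much as the block size.
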
Given larger block sizes, the system sets the size of
a "fragment,"
which is the minimum amount of space it will allocate.
The fragment size
is specified when the filesystem is created. Each filesystem
block can
be broken down into two, four, or eight fragments, each
of which is
addressable. Consider a data file of 9Kb in a filesystem
with a 4Kb
block size and a fragment size of 1Kb. The file will
occupy two full
data blocks and the first fragment of a third block.
The remaining three
fragments are available for use by other files.
If the file grows by another 1Kb, the system will check
to see if
fragment 2 is available; if it is, then the space will
be allocated. If
not, the system searches for a two-fragment space elsewhere.
If such a
space can be found, the first fragment is copied to
the new location and
the second fragment is allocated. The original fragment
is freed and
marked as available. If no suitable space is available,
a new block is
allocated and the first two fragments are used for the
file. This
procedure helps minimize the fragmentation of a file.
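
The split between full blocks and trailing fragments can be worked out
with a few lines of Python. This is a sketch of the arithmetic only, not
of the allocator; the function name blocks_and_fragments is
illustrative, and the defaults match the 4Kb block and 1Kb fragment used
in the example.

```python
def blocks_and_fragments(file_size, block_size=4096, frag_size=1024):
    """Return (full blocks, trailing fragments) needed for file_size bytes."""
    if block_size % frag_size != 0:
        raise ValueError("block size must be a multiple of fragment size")
    frags_per_block = block_size // frag_size
    full_blocks, tail = divmod(file_size, block_size)
    tail_frags = -(-tail // frag_size)  # round the tail up to whole fragments
    if tail_frags == frags_per_block:   # a completely used tail is a full block
        full_blocks += 1
        tail_frags = 0
    return full_blocks, tail_frags
```

For the 9Kb file this gives two full blocks and one fragment; after the
file grows by 1Kb it gives two full blocks and two fragments.
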
What fsck Does
fsck is the command that checks the filesystem. By default
it performs
its checking in an interactive mode, telling the user
what
inconsistencies it finds and recommending ways to fix
them. fsck then
prompts the user for a decision about whether to repair
the problem.
fsck works through a number of phases.
Initialization
Sets up the internal tables it needs (blockmap, freemap,
etc.).
Opens any files it needs (/etc/fstab or /etc/checklist).
Phase 1 -- Check Blocks and Sizes
Checks the inodes (looks for valid inode types, correct
inode size, and
format).
Checks for bad or duplicate blocks (valid block numbers,
blocks claimed
by more than one inode).
If fsck finds problems with the data blocks, it then
runs:
Phase 1B -- Rescan for More Bad Dups
Rescans to locate the inode(s) which previously claimed
the duplicated
block(s) found during Phase 1.
Phase 2 -- Check Pathnames
Checks directory inodes of the filesystem looking for
inodes out of
range, directories with zero length, corrupted directories.
Removes directory entries pointing to error-conditioned
inodes.
Phase 3 -- Check Connectivity
Checks allocated inodes for un-referenced directories.
(Parent directory
doesn't exist or was removed by Phase 2.)
Places orphaned directories and files in lost+found.
Phase 4 -- Check Reference Counts
Checks for correct link count information.
Checks free inode information.
Adjusts the free inode and link count to actual values.
Phase 5 -- Check Cylinder Groups
Checks the free block maps.
Checks the summary information (free block, free inode,
frag counts held
in the superblock summary).
If problems are found in the cylinder groups, then the
following phase
is also run:
Phase 6 -- Salvage Cylinder Groups
Reconstructs the free block maps.
Clean-up
Produces a summary message showing the number of files,
the number of
fragments used, the percent of blocks used for fragments,
and the number
of free fragment-sized blocks.
Displays advisory messages warning of file system modification
and/or
directing the operator to reboot.
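
The bad and duplicate block checks of Phases 1 and 1B can be illustrated
on a toy in-memory model. The sketch below is not fsck's code: the
model (a dictionary mapping inode numbers to lists of block numbers) and
the name find_bad_and_duplicate_blocks are assumptions made for
illustration, and because it remembers the first claimant of every block
it needs only one pass where fsck rescans.

```python
def find_bad_and_duplicate_blocks(inodes, total_blocks):
    """Scan inode block lists for out-of-range and multiply claimed blocks."""
    owner = {}   # block number -> first inode that claimed it
    bad = []     # (inode, block) pairs with an invalid block number
    dups = {}    # block number -> every inode claiming it
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if not 0 <= blk < total_blocks:
                bad.append((ino, blk))
            elif blk in owner:
                dups.setdefault(blk, [owner[blk]]).append(ino)
            else:
                owner[blk] = ino
    return bad, dups


# Inodes 5 and 9 both claim block 102; inode 12 points off the disk.
toy_inodes = {5: [100, 101, 102], 9: [102, 103], 12: [999999]}
bad_blocks, dup_blocks = find_bad_and_duplicate_blocks(toy_inodes, 1000)
```

Here bad_blocks names inode 12 and dup_blocks shows block 102 claimed by
inodes 5 and 9, which is the information the later phases need to decide
which directory entries to remove.
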
A Sample fsck Run
To see fsck in action, refer to the simple directory
in Figure 1. Assume
that a machine crashed while the system was updating
the disk image, and
that an error occurred while File1 was being moved to
sub_dir/File5. The
result is that File1 and sub_dir/File5 both claim the
same data block.
The directory sub_dir has also been corrupted. Running
fsck on the
filesystem as the system comes back up would generate
something like the
following sequence of events.
Phase 1 -- Check Blocks and Sizes
At this point fsck will find that sub_dir/File5 is claiming
a data block
which another file has already claimed. The inode will
be marked as bad
and the directory entry as well.
Phase 1B -- Rescan for More Bad Dups
The system is now looking for the other file that claimed
the data
block. In the example, this is File1. In real life the
system will try
to allocate the block to one file or the other. For
present purposes,
assume that it cannot make up its mind and will truncate
File1 at this
block.
File1 has been truncated and the remaining blocks marked
as free.
sub_dir/File5 is an invalid inode with no data blocks.
Phase 2 -- Check Pathnames
The directory sub_dir will be checked at this point.
The entry for
sub_dir/File5 will be removed, since the directory structure
for sub_dir
also has errors. The entire subdirectory will be removed
as well.
To recap, File1 has been truncated and sub_dir/File5
has been removed,
as has sub_dir. sub_dir/File1 still exists, but has
lost its parent
directory. The counts of inodes and free blocks are
no longer valid.
Phase 3 -- Check Connectivity
The file sub_dir/File1 will be moved to the lost+found
directory and
given the name of its inode number. Its name has been
lost because the
directory was corrupt and has been deleted.
Phase 4 -- Check Reference Counts
The fact that the inode and free block counts are wrong
will be detected
and corrected. The inodes used by the deleted files
will be added to the
free inode count, and the released data blocks to the free
block count. The process
is just about
complete.
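
The reference count check amounts to counting the directory entries that
name each inode and comparing the result with the count stored in the
inode. A minimal Python sketch on a toy model follows; the data layout
and the name check_link_counts are assumptions for illustration, not
fsck's own structures.

```python
from collections import Counter


def check_link_counts(directories, stored_counts):
    """Return {inode: (stored, actual)} for every inode whose count is wrong."""
    actual = Counter()
    for entries in directories.values():
        for _name, ino in entries:
            actual[ino] += 1
    return {
        ino: (stored, actual[ino])
        for ino, stored in stored_counts.items()
        if actual[ino] != stored
    }


# Root directory (inode 2) with File2 and File3 sharing inode 12.
toy_dirs = {
    2: [(".", 2), ("..", 2), ("File1", 11), ("File2", 12), ("File3", 12)],
}
toy_counts = {2: 2, 11: 1, 12: 1}   # inode 12 should say 2, not 1
link_fixes = check_link_counts(toy_dirs, toy_counts)
```

An inode whose actual count comes out as zero has no name left at all,
which is the case that sends a file to lost+found.
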
Phase 5 -- Check Cylinder Groups
Each cylinder group is scanned in turn and the bitmaps
stored on the
disk compared with the information held by fsck. After
the changes made
by fsck, the bitmaps no longer match so phase 6 will
be triggered.
Phase 6 -- Salvage Cylinder Groups
Each cylinder group is scanned in detail and the bitmaps are
rebuilt to
reflect what is currently on the disk.
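
Rebuilding a free block map is simple once the inodes are trusted: every
block claimed by some inode is in use, and everything else is free. The
Python sketch below shows the idea on a toy eight-block disk. The
function names and the list-of-booleans map are assumptions for
illustration; a real filesystem keeps one bitmap per cylinder group and
also reserves blocks for its own metadata.

```python
def rebuild_free_map(inodes, total_blocks):
    """Return a list where True marks a block no inode claims."""
    used = [False] * total_blocks
    for blocks in inodes.values():
        for blk in blocks:
            used[blk] = True
    return [not in_use for in_use in used]


def map_mismatches(on_disk_free, rebuilt_free):
    """Return the block numbers on which the two maps disagree."""
    return [
        blk
        for blk, (old, new) in enumerate(zip(on_disk_free, rebuilt_free))
        if old != new
    ]


# Blocks 2 and 3 belonged to a removed file but are still marked in use.
surviving_inodes = {11: [0, 1], 12: [4]}
stale_map = [False, False, False, False, False, True, True, True]
fresh_map = rebuild_free_map(surviving_inodes, 8)
stale_blocks = map_mismatches(stale_map, fresh_map)
```

The mismatch list is what triggers the salvage, and the rebuilt map is
what gets written back.
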
When fsck is finished, File1 has been truncated, sub_dir
has been lost,
the file sub_dir/File1 has ended up in lost+found, and
the file being
written to sub_dir/File5 has been removed. At this point
you can reach
for the backup tape and, using the list of inodes you
made while you
watched fsck work, recover the affected files.
Conclusion
A basic understanding of the filesystem layout can help
a system
administrator recover from disk errors by using the
backup copies of the
superblock. It can also help administrators understand
what fsck is
doing and what has to be done to recover the files and
inodes it has
affected.
About the Author
John Woodgate holds a BSc in Computer Science and has
worked in the
industry since 1977. He has worked as system administrator
on large
scale systems linked into international networks. He
is currently
working as a consultant and may be contacted at
john@meertech.demon.co.uk.