Article

Using fsck to Repair a Filesystem

John Woodgate

If you've ever booted your UNIX system and watched while fsck checked your filesystem, you've probably wondered just what it is doing and why. In this article I undertake to answer those questions for you. I begin with an in-depth look at how the filesystem is laid out, then describe the steps that fsck takes as it does its work. Finally, I describe some simple faults and show how fsck repairs the damage. The filesystem layout can vary with the version of UNIX you use. See the sidebar for details. For the purposes of this article I will use the UFS file system.

Directory Structures

The information displayed when you use the ls command comes from several places, including the directory structure and the inode structure. Figure 1 shows the ls -l output for a simple directory structure with a subdirectory, a file, a file linked to another file, and a symbolic link. To visualize this structure a little better, look at Figure 2, which shows the relationship in a diagrammatic form.

Directories are allocated in units called "chunks." Chunks are sized so that each allocation can be transferred to disk in a single operation. Each chunk is then broken up into variable-length directory entries, which allows filenames to be of nearly arbitrary length. Figure 3 shows the directory structure. The first three fields of each directory entry are fixed length and contain the size of the entry, the length of the filename, and the inode number. The final field is the name. The name is what the user knows and works with; the inode number is what UNIX knows and works with.
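The entry layout described above can be sketched in a few lines of Python. This is an illustration only: the field widths, byte order, 4-byte padding, and inode numbers are assumptions chosen for the example, and real on-disk formats differ in detail.

```python
import struct

# Hypothetical fixed header, in the field order described above:
# entry size (2 bytes), name length (2 bytes), inode number (4 bytes).
HEADER = struct.Struct("<HHI")

def pack_entry(name, inode):
    """Build one variable-length entry, padded to a 4-byte boundary."""
    raw = name.encode("ascii")
    size = HEADER.size + len(raw)
    size += -size % 4  # round up to a multiple of 4
    padded_name = raw.ljust(size - HEADER.size, b"\0")
    return HEADER.pack(size, len(raw), inode) + padded_name

def parse_chunk(chunk):
    """Walk a chunk entry by entry, returning (name, inode) pairs."""
    entries, offset = [], 0
    while offset < len(chunk):
        size, name_len, inode = HEADER.unpack_from(chunk, offset)
        if size == 0:  # avoid looping forever on a corrupt entry
            break
        start = offset + HEADER.size
        name = chunk[start:start + name_len].decode("ascii")
        entries.append((name, inode))
        offset += size  # the entry size locates the next entry
    return entries

# Made-up inode numbers for a directory like the one in Figure 1.
chunk = b"".join(pack_entry(n, i) for n, i in
                 [(".", 2), ("..", 2), ("File1", 11), ("sub_dir", 12)])
print(parse_chunk(chunk))
```

Because each entry carries its own size, a reader can step through the chunk without knowing the name lengths in advance, which is what makes variable-length names practical.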

More than one directory entry can point to the same inode, which means there can be multiple names for the same file. A good example is /bin/vi and /bin/ex: the program may act differently depending on which name you use. This method of providing multiple names is called "hard linking" and is only valid within a filesystem. Figure 6 shows how the "." and ".." notations work: they serve as shorthand references to the directory itself (".") and to the parent directory (".."). A directory thus always shows a minimum of two links.

Looking again at the entries in the simple directory structure in Figure 1 and comparing them to the directory entries in Figure 6, you can see that File1 is the regular file, File2 is linked to File3 via a hard link created with the ln command, and File4 is a symbolic link. In this case the file name is replaced with a path name. To get to the inode, a process must follow the link to another directory entry, then follow that directory entry to the inode. Because this is a two-step operation, it is possible for the link to be present even after the file to which it referred has been deleted.
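The difference between the two kinds of link is easy to observe from a script. The sketch below recreates File2, File3, and File4 in a temporary directory; pointing File4 at File2 is an assumption made for the example, since the target of the symbolic link is not stated here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    file2 = os.path.join(d, "File2")
    file3 = os.path.join(d, "File3")
    file4 = os.path.join(d, "File4")

    with open(file2, "w") as f:
        f.write("data\n")

    os.link(file2, file3)     # hard link: a second name for the same inode
    os.symlink(file2, file4)  # symbolic link: stores a path name instead

    same_inode = os.stat(file2).st_ino == os.stat(file3).st_ino
    link_count = os.stat(file2).st_nlink

    os.remove(file2)          # remove one name; the inode survives via File3
    file3_readable = os.path.exists(file3)
    # The symbolic link itself still exists, but its target is gone.
    link_dangling = os.path.islink(file4) and not os.path.exists(file4)

print(same_inode, link_count, file3_readable, link_dangling)
```

Removing File2 leaves the data reachable through File3, while File4 is left pointing at a name that no longer exists.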

Inode Structure

Figure 4 shows the layout of an inode. The top section of the structure holds the information displayed by the ls -l command. This information includes:

The type and access mode for the file.

The file's owner.

The group access identifier.

The number of references to the file.

The time the file was last read from and written to.

The time the inode was last updated by the system.

The size of the file in bytes.

The number of physical blocks used by the file (including blocks used to hold indirect pointers).
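Most of these fields can be read back through the stat interface. The following is a small sketch using Python's os.stat on a scratch file; the dictionary keys are just labels for this example, and st_blocks is reported in units that are typically 512 bytes rather than filesystem blocks.

```python
import os
import stat
import tempfile
import time

with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 3000)
    f.flush()
    st = os.stat(f.name)

info = {
    "type and mode": stat.filemode(st.st_mode),  # e.g. -rw-------
    "owner (uid)": st.st_uid,
    "group (gid)": st.st_gid,
    "links": st.st_nlink,
    "last read": time.ctime(st.st_atime),
    "last written": time.ctime(st.st_mtime),
    "inode updated": time.ctime(st.st_ctime),
    "size in bytes": st.st_size,
    "blocks": st.st_blocks,  # typically counted in 512-byte units
}

for label, value in info.items():
    print("%-14s %s" % (label, value))
```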

Beneath this section is a series of 15 pointers to the data blocks used by the file. The first 12 pointers each point to a single data block, which allows small files to be accessed very quickly. On a filesystem with a 2Kb block size, for example, the first 24Kb can be accessed directly. Since most files are quite small, this method provides very fast access.

The next pointer points to a single data block that contains pointers to other data blocks (see Figure 5). This is referred to as single indirect addressing. The number of pointers a block can hold depends on the block size and the pointer size; to keep the arithmetic simple, assume here that a block holds 64 pointers (a 2Kb block of 32-bit pointers would actually hold 512). Using single indirect pointers, you can address 64 * 2Kb + 24Kb, or 152Kb. The next pointer is the double indirect pointer, which points to a block of pointers, each of which points to a block of pointers, each of which in turn points to a data block. Using double indirect pointers one can address roughly 64 * 64 * 2Kb + 24Kb, or 8.02Mb.

The last pointer is the triple indirect pointer, which points to a block of pointers, each of which points to a block of pointers, each of which in turn points to a block of pointers which point to a data block. This gives a maximum file size of roughly 64 * 64 * 64 * 2Kb + 24Kb, or 512.02Mb. With a larger block size of, for example, 4Kb, and twice as many pointers per block, the maximum file size rises to about 8.19Gb. The drawback to the larger block sizes is that more space is wasted.
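The limits quoted above can be checked with a short calculation. This is a minimal sketch, not filesystem code: the function name and parameters are made up for the example, and the figure of 64 pointers per block is the one used in the text. Note that the totals in the text add only the direct blocks to each indirect level; the final line sums every level.

```python
def reach_kb(block_kb, ptrs_per_block, direct=12):
    """Return the Kb reachable through each level of pointers."""
    return {
        "direct": direct * block_kb,
        "single": ptrs_per_block * block_kb,
        "double": ptrs_per_block ** 2 * block_kb,
        "triple": ptrs_per_block ** 3 * block_kb,
    }

article = reach_kb(block_kb=2, ptrs_per_block=64)
print(article["direct"] + article["single"])  # 152 Kb
print(article["direct"] + article["double"])  # 8216 Kb, about 8.02Mb
print(article["direct"] + article["triple"])  # 524312 Kb, about 512.02Mb
print(sum(article.values()))                  # 532632 Kb with all levels

# With 4-byte pointers a 2Kb block holds 2048 // 4 = 512 pointers,
# which pushes the limits far higher.
realistic = reach_kb(block_kb=2, ptrs_per_block=2048 // 4)
print(sum(realistic.values()))
```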

Given larger block sizes, the system sets the size of a "fragment," which is the minimum amount of space it will allocate. The fragment size is specified when the filesystem is created. Each filesystem block can be broken down into two, four, or eight fragments, each of which is addressable. Consider a data file of 9Kb in a filesystem with a 4Kb block size and a fragment size of 1Kb. The file will occupy two full data blocks and the first fragment of a third block. The remaining three fragments are available for use by other files.

If the file grows by another 1Kb, the system will check to see if fragment 2 is available; if it is, then the space will be allocated. If not, the system searches for a two-fragment space elsewhere. If such a space can be found, the first fragment is copied to the new location and the second fragment is allocated. The original fragment is freed and marked as available. If no suitable space is available, a new block is allocated and the first two fragments are used for the file. This procedure helps minimize the fragmentation of a file.
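The 9Kb example can be worked through in code. The function below is a hypothetical helper, not part of any filesystem; it only does the arithmetic for how a file of a given size divides into full blocks and trailing fragments.

```python
def layout(file_kb, block_kb=4, frag_kb=1):
    """Return (full blocks, fragments in the last block, fragments left free)."""
    frags_per_block = block_kb // frag_kb
    full_blocks, tail_kb = divmod(file_kb, block_kb)
    tail_frags = -(-tail_kb // frag_kb)  # ceiling division
    free = frags_per_block - tail_frags if tail_frags else 0
    return full_blocks, tail_frags, free

print(layout(9))   # (2, 1, 3): two blocks, one fragment, three left over
print(layout(10))  # (2, 2, 2): after the file grows by 1Kb
```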

What fsck Does

fsck is the command that checks the filesystem. By default it performs its checking in an interactive mode, telling the user what inconsistencies it finds and recommending ways to fix them. fsck then prompts the user for a decision about whether to repair the problem. fsck works through a number of phases.

Initialization

Sets up the internal tables it needs (blockmap, freemap, etc.).

Opens any files it needs (/etc/fstab or /etc/checklist).

Phase 1 -- Check Blocks and Sizes

Checks the inodes (looks for valid inode types, correct inode size, and format).

Checks for bad or duplicate blocks (valid block numbers, blocks claimed by more than one inode).

If fsck finds problems with the data blocks, it then runs:

Phase 1B -- Rescan for More Bad Dups

Rescans to locate the inode(s) which previously claimed the duplicated block(s) found during Phase 1.
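The reason for the second scan can be illustrated with a toy model. The sketch below assumes, for the sake of the example, that the first pass records only which blocks are in use and not who owns them, so that on meeting a duplicate it knows the block but not the earlier claimant. The function names, inode numbers, and block numbers are invented.

```python
def phase1(inodes):
    """First pass: record claimed blocks and note any claimed twice."""
    claimed, dup_blocks = set(), set()
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if blk in claimed:
                dup_blocks.add(blk)  # someone got here first
            else:
                claimed.add(blk)
    return claimed, dup_blocks

def phase1b(inodes, dup_blocks):
    """Second pass: find every inode that claims a duplicated block."""
    claimants = {blk: [] for blk in dup_blocks}
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if blk in claimants and ino not in claimants[blk]:
                claimants[blk].append(ino)
    return claimants

# inode number -> list of data blocks it claims
inodes = {11: [100, 101, 102], 12: [200], 13: [102, 300]}
claimed, dups = phase1(inodes)
print(dups)                   # {102}
print(phase1b(inodes, dups))  # {102: [11, 13]}
```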

Phase 2 -- Check Pathnames

Checks directory inodes of the filesystem looking for inodes out of range, directories with zero length, corrupted directories.

Removes directory entries pointing to error-conditioned inodes.

Phase 3 -- Check Connectivity

Checks allocated inodes for un-referenced directories. (Parent directory doesn't exist or was removed by Phase 2.)

Places orphaned directories and files in lost+found.
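A simplified picture of the connectivity check: any allocated inode that no directory entry refers to is an orphan and is reconnected under lost+found. The sketch below is an assumption-laden model with invented inode numbers, and it names each recovered file after its inode number.

```python
def reconnect_orphans(allocated, directory_entries):
    """Return lost+found entries for inodes that no directory references."""
    referenced = {ino for _name, ino in directory_entries}
    orphans = sorted(allocated - referenced)
    return [("lost+found/%d" % ino, ino) for ino in orphans]

allocated = {2, 11, 12, 14}
entries = [(".", 2), ("File1", 11), ("sub_dir", 12)]
print(reconnect_orphans(allocated, entries))  # [('lost+found/14', 14)]
```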

Phase 4 -- Check Reference Counts

Checks for correct link count information.

Checks free inode information.

Adjusts the free inode and link count to actual values.
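The link count check amounts to counting the directory entries that refer to each inode and comparing the result with the count stored in the inode. A minimal sketch, with hypothetical names and data:

```python
from collections import Counter

def check_link_counts(directory_entries, inode_links):
    """Return {inode: (recorded, actual)} for every mismatched link count."""
    actual = Counter(ino for _name, ino in directory_entries)
    fixes = {}
    for ino, recorded in inode_links.items():
        if actual[ino] != recorded:
            fixes[ino] = (recorded, actual[ino])
    return fixes

entries = [("File1", 11), ("File2", 12), ("File3", 12)]
recorded = {11: 1, 12: 1}  # inode 12 should show two links
print(check_link_counts(entries, recorded))  # {12: (1, 2)}
```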

Phase 5 -- Check Cylinder Groups

Checks the free block maps.

Checks the summary information (free block, free inode, frag counts held in the superblock summary).

If problems are found in the cylinder groups, then the following phase is also run:

Phase 6 -- Salvage Cylinder Groups

Reconstructs the free block maps.
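The last two phases can be modeled as a comparison followed by a rebuild: the free map stored on disk is checked against the set of blocks that the surviving inodes actually claim, and is regenerated if the two disagree. The sketch below uses a plain list of booleans as a stand-in for the bitmap; the names and data are invented.

```python
def rebuild_free_map(total_blocks, inodes):
    """Return a list where True means the block is free."""
    used = {blk for blocks in inodes.values() for blk in blocks}
    return [blk not in used for blk in range(total_blocks)]

# Stale map as stored on disk (True = free).
on_disk = [False, False, False, True, True, True, True, True]
inodes = {11: [0, 1], 12: [4]}  # inode number -> blocks claimed

rebuilt = rebuild_free_map(8, inodes)
mismatch = [b for b in range(8) if on_disk[b] != rebuilt[b]]
print(mismatch)  # [2, 4]: block 2 is really free, block 4 is really in use
```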

Clean-up

Produces a summary message showing the number of files, the number of fragments used, the percent of blocks used for fragments, and the number of free fragment-sized blocks.

Displays advisory messages warning of file system modification and/or directing the operator to reboot.

A Sample fsck Run

To see fsck in action, refer to the simple directory in Figure 1. Assume that a machine crashed while the system was updating the disk image, and that an error occurred while File1 was being moved to sub_dir/File5. The result is that File1 and sub_dir/File5 both claim the same data block. The directory sub_dir has also been corrupted. Running fsck on the filesystem as the system comes back up would generate something like the following sequence of events.

Phase 1 -- Check Blocks and Sizes

At this point fsck will find that sub_dir/File5 is claiming a data block which another file has already claimed. The inode will be marked as bad and the directory entry as well.

Phase 1B -- Rescan for More Bad Dups

The system is now looking for the other file that claimed the data block. In the example, this is File1. In real life the system will try to allocate the block to one file or the other. For present purposes, assume that it cannot make up its mind and will truncate File1 at this block.

File1 has been truncated and the remaining blocks marked as free. sub_dir/File5 is an invalid inode with no data blocks.

Phase 2 -- Check Pathnames

The directory sub_dir will be checked at this point. The entry for sub_dir/File5 will be removed. Since the directory structure for sub_dir also has errors, the entire subdirectory will be removed as well.

To recap, File1 has been truncated and sub_dir/File5 has been removed, as has sub_dir. sub_dir/File1 still exists, but has lost its parent directory. The counts of inodes and free blocks are no longer valid.

Phase 3 -- Check Connectivity

The file sub_dir/File1 will be moved to the lost+found directory and given the name of its inode number. Its name has been lost because the directory was corrupt and has been deleted.

Phase 4 -- Check Reference Counts

The fact that the inode and free block counts are wrong will be detected and corrected. The inodes used by the deleted files will be added to the free inode count, and the released data blocks to the free block count. The process is just about complete.

Phase 5 -- Check Cylinder Groups

Each cylinder group is scanned in turn and the bitmaps stored on the disk are compared with the information held by fsck. After the changes made by fsck, the bitmaps no longer match, so Phase 6 will be triggered.

Phase 6 -- Salvage Cylinder Groups

Each cylinder group is scanned in detail and the bitmaps are rebuilt to reflect what is currently on the disk.

When fsck is finished, File1 has been truncated, sub_dir has been lost, the file sub_dir/File1 has ended up in lost+found, and the file being written to sub_dir/File5 has been removed. At this point you can reach for the backup tape and, using the list of inodes you made while you watched fsck work, recover the affected files.

Conclusion

A basic understanding of the filesystem layout can help a system administrator recover from disk errors by using the backup copies of the superblock. It can also help administrators understand what fsck is doing and what has to be done to recover the files and inodes it has affected.

About the Author

John Woodgate holds a BSc in Computer Science and has worked in the industry since 1977. He has worked as system administrator on large scale systems linked into international networks. He is currently working as a consultant and may be contacted at john@meertech.demon.co.uk.