Article

Using fsck to Check ufs File Systems

Tom Clark

Introduction

A file system is a mechanism for locating files on a storage device recording medium, the device implementing a compatible internal data storage structure (e.g., random access disks with fixed block sizes and sequential access tapes with variable block sizes). The accessible storage can be considered as an organized "pool" of data storage resources. There are several common types of UNIX file systems [1-2], but this article discusses only the UNIX ufs file system. The concepts presented are applicable to all UNIX file systems being checked using fsck.

File systems are generally logically independent of the required underlying physical storage devices, although their operation is directly affected by these storage devices (for example, errors affecting the operation of the devices commonly affect the file system structure or the files maintained by the file system). However, not all file system errors can be related to an error reported by the underlying storage device (some other storage device, or source of error, may be the root cause).

For simplicity, my discussion will be limited to storage devices attached to the host computer via the SCSI (Small Computer System Interface) bus. These devices support an addressable linear array of identical logical blocks, each block containing 512 bytes of data (other common block sizes are 1024 and 2048). Focusing on SCSI systems dictates to some extent problem determination and error recovery/restart procedures, since these procedures are designed to support the selected storage devices. In conjuction with the discussion of using fsck to check ufs file systems, I present a system administrator's script for checking multiple file systems.

SCSI Disk Error Model

A very high percentage of SCSI disk errors occur when data is retrieved from the medium. Errors that occur when storing data are usually not detected until the data is retrieved, although if the device is unable to locate the proper block on a write operation it can immediately signal a write error.

Even though the majority of accesses to a disk are data retrieval operations, complete media testing entails testing both storage and retrieval operations. The device that supports defect management enables linking around defective blocks. Typically, this service is performed either in response to the execution of a disk maintenance command or at the direction of a disk device driver in response to a "hard error" reported by the device.

A "hard error" exists: 1) where the device signals that it is unable to recover data (e.g., too many errors); or 2) where successive unsuccessful attempts to correctly retrieve data (errors reported along with the data) are halted by the device driver.

The disk driver may have previously requested that the device automatically reassign a "bad" block during an initialization phase. If not, it may issue a separate command to the device to reassign the known "bad" block. It may also choose to do nothing, leaving a hole in the linear array and presenting an opportunity for another utility to attempt a patch of the linear array.

Successive unsuccessful attempts to retrieve data followed by a single successful data retrieval can occur before the device driver halts the data retrieval operations. Each such unsuccessful attempt is termed a "soft error." Excessive "soft errors" are indicative of future problems and should be corrected during preventive maintenance.

The reassignment of a data block may result in data loss even though performed without additional errors. This usually happens when the device is unable to correctly recover data from the "bad" block and transfer it to the patch block.

A device driver typically stores whatever data has been retrieved from a "bad" block and writes it to the patch block after completion of the reassignment. Where no data is retrieved, a block of ZEROs is written to the patch block. Little else can be done to recover data directly; instead, restoring files from recent backups should be considered.

Device maintenance commands such as format can be used to "patch" the linear array of logical blocks. you can perform surface analysis to detect and reassign "bad" blocks and should do this periodically since the recording media degrades with time.

ufs File System

A ufs file system can be represented simply as a large data structure composed of sequences of one or more smaller data structures (referred to as a cylinder group) each composed of the following[1]:

-- Offset

-- Super-block

-- Cylinder Group Map

-- Inodes

-- Storage Blocks

The "super-block" contains information on the size and status of the file system, the label (obtained from block 0 of a SCSI disk), and the cylinder group. Multiple "super-blocks" are created and used to repair ones that are bad.

Inodes contain all information about a file except its name (kept in a directory). Typically, one inode is created for every 2048 bytes of available storage (this can be altered when the file system is built; refer to the mkfs user command). An inode contains information on the:

-- file type (regular, directory, block, character, symbolic link or FIFO/pipe),

-- file permissions,

-- number of hard links,

-- user-id and group-id,

-- number of bytes,

-- first 12 disk block addresses,

-- three indirect pointers to additional disk block addresses, and

-- file time-related data.

The majority of blocks in a cylinder group (1-to-32 cylinders/group) are allocated to storage blocks. The UNIX user command fsck is used to quickly check the super-blocks and the inodes for file system inconsistencies:

1) during the "install" phase where the operating system is loaded onto the system disk and configured for normal operation,

2) during a "bring-up" of the operating system,

3) when adding a new disk to the system,

4) when analyzing problems associated with a file system supported by a specific physical disk,

5) during repair procedures prior to returning a disk device to normal service, and

6) during preventive and predictive maintenance procedures (such as a disk probing tool that looks for file system-related inconsistencies).

fsck Check Sequence

fsck sequences through the following phases:

Check Blocks and Sizes (file system inode list),

Check Pathnames (directory entries),

Check Connectivity (one directory for each inode and multiple links make sense),

Check Reference Counts, and Cylinder Groups (link count and alterations made previously), and

Check the Free List (blocks are allocated to an inode or the free block list).

(Refer to Thomas and Farrow [5] for a more detailed description of the check sequence. The information presented here has been condensed from the references listed on page 88.)

An underlying presumption regarding the use of fsck is that files with corrupted inodes should be replaced from backup copies and that the user is able to keep track of the files needing such action. The following key factors also affect the use of fsck:

-- mounted (active) file systems may change while being examined by fsck,

-- any change that occurs in a file system while running fsck can produce inconsistencies,

-- inconsistencies may be minor enough to result in an automatic repair action by fsck,

-- major inconsistencies require user intervention, and

-- false inconsistencies are treated as though they were actual inconsistencies.

To avoid related problems, fsck does not work on mounted file systems, the exception being the root (/) file system. fsck can be run on root while in single-user mode.

However, there are two interfaces to a block storage device (block and character, or raw), and it is possible to run fsck on a mounted file system through the raw interface. Doing so makes the check vulnerable to the problems that could arise if fsck were run on an "active" or mounted file system. All sources strongly recommend that fsck be run on unmounted file systems, and on the root file system when in single-user mode.

The "-y" Option

The "-y" option allows fsck to assume yes as a response to all queries about repair actions. This option should not be used on a file system that contains important user data. As I explain later (in the "Repairs" section), there are cases where fsck should not be allowed to perform a repair action.

Install Phase

The install phase yields a fully operational, configured operating system upon successful completion. It requires the identification of at least one operable disk (the system disk) upon which the necessary number of supported, operable file systems (determined using fsck) can be built. This in turn requires that the host computer and hardware supporting the system disk be operable (this can often be checked at a lower level of functionality than the install phase, e.g., in cases a PROM monitor supports communication with attached devices).

At all points in the install phase, system disk-related problems can result in failures and aborts that require user analysis to determine proper responses. Since install processes rarely attempt a detailed verification of the underlying hardware, the user must often rely upon past history to select an appropriate error recovery procedure (i.e., the presumption is that the install would be performed only on a fully operational system).

A successful install process does not guarantee that the system disk is completely operational. The install process transfers files to the system disk and prepares it for normal operation, a procedure which requires many storage operations that are not followed by the retrieval operations that would normally detect disk-related errors. Errors encountered after a successful install may actually relate back to the time of the install.

Since fsck may be the only diagnostic tool to used by the install process to check system disk operability, errors reported in an install should result in a more detailed test of the host computer and attached storage devices (some computer system vendors provide very detailed system operability test packages). You should not, under such circumstances, attempt to continue the install process.

A full destructive surface analysis of the media before installing an operating system on a SCSI disk or after subsequent disk-related errors have been encountered will detect most latent media-related errors that could cause "install" phase and future errors. This will also require that the user re-install the operating system.

The following are useful guidelines for the install phase:

If an error is reported by the system disk or any other hardware component, analyze the problem to see if a service call is necessary (can the error be corrected via simple user action?).

If the install has been corrupted by the reporting of an error, try re-starting the install process.

•If fsck has reported errors, the likely suspect is the underlying storage device, since it is allegedly operable, a new file system has just been built on it, and little time has passed for data loss or corruption to occur.

If the install has not completed or initial checks have not performed successfully, take the system down to single-user mode and use fsck as a diagnostic probe.

Consider any errors that occur during an install phase to be indicative of an abnormal condition.

Bring-up Phase

Presuming the successful completion of an install, the system will at some time be booted and a "bring-up" phase initiated. During this phase the function of fsck is again a very quick check of file system inconsistencies. If minor problems are found, fsck may be able to correct them and continue. If not, the "bring-up" process may require that fsck be run independently, i.e., some repair actions will be necessary.

fsck is important here because the system may have been shut down incorrectly, may have encountered a "panic" condition, or may have shut down due to errors or equipment modifications. Earlier statements regarding fsck apply here as well, but there is one notable additional factor -- recorded past history in the system log.

Problems reported by fsck during a bring-up phase should prompt the user to scan the system log for device-related errors.

Both "soft" and "hard" errors should be analyzed, since soft errors can evolve into hard errors.

Where difficult errors are present, especially those affecting the system disk, a suggested approach is to boot an operating system image over the net or from a local CDROM device, or integrate into the host computer a known good disk containing an operating system, and use it to perform further checking and data recovery.

Problem Determination

fsck is not a particularly useful tool for problem determination. Errors reported by fsck require a good deal of interpretation based upon intimate knowledge of the file system and the underlying storage device. Such errors should always be analyzed in conjunction with the system log, past history, experience, and a good backup/restore facility. The proper next step is rarely obvious from the query/responses provided by fsck.

Maintenance

Preventive and predictive maintenance procedures should use fsck to probe for file system inconsistencies. Such procedures should first perform full media "non-destructive" testing to ensure that no blocks are accessible that could cause a future "hard" or "soft" error. "Destructive" media testing should be undertaken only after full data recovery has been completed.

It is possible for maintenance procedures themselves to encounter errors (even system "panic" conditions) that have no relation to a recorded storage device error (e.g., data corruption has occurred and the maintenance procedure is processing garbage). This should not dissuade you from developing and performing them, but should help you to see the importance of executing them in sufficiently secure environments (e.g., full backups performed).

Repairs

fsck has a repair feature that can correct minor problems but can also, if used inappropriately, do major damage to a file system. fsck is presumed to be an expert on the file system and to have the proper repair action selected before requesting user permission to proceed. The "-y" option, for those who have complete faith in its ability, causes it to automatically repair any system-related errors it encounters.

In a number of cases, bowever, fsck's suggested repair is not appropriate. In most of these cases, fsck will be asking for permission to remove a file or clear an inode; authorizing the repairs without investigating can result in data loss. The following cases have been identified empirically.

Case 1

SORRY: NO lost+found DIRECTORY or NO SPACE in lost+found DIRECTORY

Clearly there is a problem with the lost+found directory that requires immediate attention. fsck should be terminated and other actions taken to resolve problems.

If no such directory exists, create one and then rerun fsck. If there is no space left in the directory, and the number of files in the directory is large, rerunning fsck is inappropriate (instead, find out why so many files were placed in this directory). Where large files exist in the directory, attempt to determine if the large files are valid and are not file fragments. An example would be a copy operation that could not complete successfully, with the result that only a portion of a file remains in the target directory.

Case 2

DUP TABLE OVERFLOW

The DUP table stores a list of inodes with duplicate blocks. This message occurs when the table runs out of space. Such table overflow should put the user on notice that unusual conditions have occurred and the potential for data loss is high. The recommended action here is to "write down the inode numbers of inodes with duplicate blocks found after this point, and don't REMOVE any of the filenames connected to the inodes or CLEAR these inodes." [5].

An alternate suggestion, given that the overflow represents an abnormal condition, is to stop at this point and attempt to recover whatever files are accessible before proceeding. The recovery action could simply be a raw disk copy (using the dd user command).

Case 3

Read, Write or Seek Errors

These messages indicate that a "hard" error has occurred, and fsck is asking if it should continue in the presence of such errors. Only if you are very familiar with the attached physical devices and with what fsck is attempting to accomplish should you attempt to continue. A possible alternative is to perform a "non-destructive" surface analysis of the attached storage device.

The surface analysis should reassign a "bad" block if it encounters a "hard" error (refer to the "SCSI Disk Error Model" section earlier). Since data may be lost in the procedure, you should record all reassigned blocks , rerun fsck, and mark as suspect all files affected by the reassignment. Upon completion, you should consider restoring files from recent backups.

Case 4

PARTIALLY ALLOCATED INODE I = 14

CLEAR?

Legal inode types are given in reference [5]. A partially allocated inode is one that has a type of 0, but some information appears in the mode word. This often indicates a block containing garbage. The occurrence of many of these suggests that the file system is likely to have widespread damage, including corrupted files.

A good practice here is to record the inode numbers so that, if they become needed in phase 2, the filenames linked to the inodes can be found. Data recovery after completion, as well as close scrutiny of the file system during preventive and predictive maintenance procedures is also recommended. If actual data corruption is found, the storage device should be re-evaluated and the file system rebuilt before returning the storage device to service.

Case 5

LINK COUNT TABLE OVERFLOW,

CONTINUE?

This message indicates that there is no more room to store inodes that have a zero link count and will recur for all subsequent inodes with zero link counts. If this is the only error reported, you can allow fsck to continue, but if multiple errors are reported, fsck should be terminated. Upon completion or termination, a file recovery procedure should be performed.

Case 6

EXCESSIVE BAD BLOCKS I=13

CONTINUE?

Ten bad blocks have been detected while checking this inode's blocks. Something is seriously wrong, and it is time to do something other than continue with fsck.

Initialization

Initialization errors are worth mentioning because they can be triggered by recent problems with devices that have been performing normally over some period of time (storage devices can fail suddenly). When fsck is initiated, perhaps in response to an error message, the following error can be quite surprising:

Cannot stat <device name>

Here fsck cannot obtain information on the file system supported by <device name>. It is possible that the file system does not exist, cannot be opened due to permissions, or has been removed from the device tree by the device driver (e.g., the device no longer responds to commands). The appropriate response is to immediately initiate a problem determination procedure on the underlying storage device.

Repair Summary

Minor problems automatically detected and corrected by fsck rarely result in data loss or corruption. However, the potential for data loss or corruption increases up to a certainty if the user chooses to continue running fsck in the presence of clearly dangerous major errors. By placing complete faith in the ability of fsck to detect and correct errors (the "-y" option), you lose control over the repair process and may remain unaware of the presence of major problems that require immediate attention, e.g., widespread data corruption.

Administrator's Bourne Shell Script

When a system administrator has to check multiple file systems simultaneously, a tool for performing the checks and highlighting the problems becomes handy. The mfsck script (Listing 1) allows you to specify a large number of disks and partitions/slices to check simultaneously. It's important that you avoid overloading the OS when doing this by initiating too many processes.

mfsck supports a very simple interface which is displayed if no arguments are provided:

mfsck <task file>

where,

<task file>  is the name of a file containing the disks and the file systems to check.

The syntax of the file <task file> is

#         comment line
<logical device> <partition> ... <partition>

as in

#
c0t5d0 0 1 6
#

The <task file> below contains a single entry and points to a nonexistent disk:

# cat task.file
c8t0d0        3
#

Executing the script with this <task file> yields the output in Figure 1.

Using a single-entry <task file> that contains a valid disk looks like

# cat task.file
c1t2d0
#

and yields results like those in Figure 2.

The fsck operations are performed in the background and the script checks for completion. The script can be broken up into smaller files, as originally designed, and further simplified. The present form is intended to be simple and straightforward, facilitating modifications by users.

Multiple <task file> specifications can be built in anticipation of future requirements. Log files generated by the script can be removed or saved at the user's option. Where errors are encountered, the log files should be scanned to determine where the error(s) occurred.

Conclusion

fsck is a tool for checking file systems that requires a high level of skill and experience to use effectively. fsck can be considered, in most cases, as an expert in selecting the best "next step" in a repair process. However, it has a very limited, and in some cases confusing, user interface. Moreover, it requires in some cases that the user record data to be used at a later stage.

fsck also has a high potential for data loss and corruption, since checking is limited to the file system. It can be used effectively as an indicator of possible current damage, but provides no assistance in the development of appropriate error recovery procedures.

Even given its limitations, however, when combined with a detailed understanding of the underlying storage devices and an understanding of the file system, fsck remains a very useful tool in the system administrator's toolbox.

References

1. UNIX Software Operation. UNIX System V Release 4 System Administrator's Guide. Englewood Cliffs, N.J.: Prentice-Hall, 1990.

2. Bach, Maurice J. The Design of the UNIX Operating System. Englewood Cliffs, N.J.: Prentice-Hall, 1986.

3. Nemeth, Evi, Garth Snyder, and Scott Seebass. UNIX System Administration Handbook. Englewood Cliffs, N.J.: Prentice-Hall, 1989.

4. Fiedler, David, and Bruce H. Hunter. UNIX System Administration. Indianapolis, IN: Hayden Books, 1987.

5. Thomas, Rebecca, and Rik Farrow. UNIX Administration Guide for System V. Englewood Cliffs, N.J.: Prentice-Hall, 1989.

About the Author

Tom Clark has been working with UNIX since 1984 as an applications developer. He is currently working as a system software architect for Sun Microsystems SMCC System Software Quality Assurance group in Mountain View, Ca.. Tom has a B.S.E.E. from the University of New Mexico, an M.S.E.E. from Wichita State University, an M.S.C.S. and Engineering degree from the University of Southern California, and a B.S.L. and J.D. from Peninsula University. He can be reached at Thomas.Clark@sun.eng.COM.