Cover V03, I04
Article
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7

jul94.tar


When Inodes Go Bad

Emmett Dulaney

A user complains that the system is acting sluggish, or data corruption has occurred. You fire off fsck and the first thing that comes up in Phase 1 is a line similar to:

POSSIBLE FILE SIZE ERROR I = 317

Besides knowing that messages in all capital letters are considered shouting, it's hard to know what to make of this "possible" error.

Phases 2 and 3 pass relatively uneventfully, then Phase 4 (checking connectivity counts) shouts that there is an unreferenced file, and gives the inode number of that file. The message further asks if you would like to remove the file. After you answer, a message informs you that the free inode count is now wrong in the superblock.

Before answering the very first question of whether or not to remove the file, it is important to understand the concept of inodes and the superblock. The purpose of this article is to explain each of these items and describe the utilities used to reference them.

Superblocks and Inodes

Filesystems are groups of files which exist within a partition on a disk -- the partition may be physical or logical. On every filesystem, there is an initial sector on the disk where all files logically map to. This is known as the superblock.

The superblock, of which there is only one per filesystem, contains information about the filesystem, and the physical data comprising it. In the PC world of DOS, the superblock is known as the File Allocation Table. Both serve the same purpose of mapping file locations to physical addresses.

The superblock, which is block 0 of the filesystem, is broken into smaller components, or individual entries for the filesystem, known as inodes. The physical location of the inode list is directly after that of the superblock. For every entry that shows up in a directory listing, there is a corresponding inode providing that information. Inodes, or "index nodes," consist of between 64 and 200 bytes (depending upon the vendor), and are the focus of all file\directory activity in UNIX; they function as pointers to the physical data.

Inodes can be thought of as address books giving information about the files. Just as one address can apply to two or more people, one file can be referenced by several names (links). Two linked files will share the same inode number; thus when you execute a link, the original set of data is pointed to and called.

While the exact ordering of information occasionally differs by vendor, each inode entry consists of the following information about a file or directory:

1. A unique inode number.

2. Type of entry. There are seven types, and each type is assigned a two-digit value:

01 -- a named pipe 02 -- a character device file 04 -- a directory 06 -- a block device file 10 -- an ordinary file 12 -- a symbolic link 14 -- a socket

3. Permissions on the entity. Permissions are four digits, with the first indicating whether a special mode is set (1=sticky bit, 2=SGID, 4=SUID), and the remaining three telling read, write, and execute permissions.

4. Actual physical size of the file.

5. Number of links to the entry.

6. Owner of the file (often in numeric form -- the information returned by the id command).

7. The group possessing the file -- again, quite often, in numeric form.

8. Times of creation, modification, and access. This is referenced in directory listings, as well as when the find command looks for files meeting certain criteria.

9. The actual physical addresses where the file resides.

These are the components of inode entries, but the ordering may be different than as shown here. To check the ordering on your system, look for the file /usr/include/sys/inode.h.

Inode Functionality

Directories consists of inode entries that pair them with files. Structuring the filesystem this way, makes it incredibly easy to move entire directories in UNIX. In reality, all that changes is the name associated with the inode.

When files are linked, multiple references to the same inode occur. Deleting files removes only the references, not the physical data, so long as other files still point to it. Figure 1 shows an example of linking a file to multiple names. During the process, the directory listing makes it appear as if more files are being created, but the inode numbers tell differently. The link numbers associated with the files tell differently as well. Notice how the numbers increment with the links.

Figure 2 shows the same set of files, but now the files are being deleted. The original file goes first, then the first link. The second link is now no longer a link, but it still contains the same information -- by virtue of still referencing that same inode.

Inode numbers must be unique for physical data. The way this uniqueness is traditionally achieved is by incrementing the number for each new entry. Thus the first data placed on the filesystem has the lowest inode numbers. Figure 3 shows the numbers associated with files on a sample system. As a good rule of thumb, anytime fsck finds a problem with an inode number that is small, there is a serious problem with the machine.

The amount of space allocated for the inode table is determined when you make the filesystem (mkfs). If too much space is allocated, you are wasting room that could be used elsewhere. If not enough space is allocated, you will fill the table. The table contains entries -- one for every piece of data out there -- so it doesn't matter if the hard drive has plenty of space left on it. If there are thousands of tiny files, those thousands of entries can fill the inode table. Once an inode table is full, there is nothing to do but rebuild the filesystem.

Tools to Use

A directory listing that includes inode numbers can be performed by using the "-i" option with ls. To get a full listing of the file, complete with inode numbers, use ls -li (as shown in Figure 1).

Most standard UNIX packages include several vendor utilities intended for use when dealing with inodes. The first is /etc/ncheck. When used with the "-a" option, as in ncheck -a, it generates a filesystem list similar to that created by du, but including inode numbers in place of used data. Figure 3 shows a file list generated using this option.

The "-i" option is used to find only the filenames matching a specified inode number. The fsck operation at the beginning of this article showed a problem with inode 317.

ncheck -i 317

would find the file, or files, owning that inode number. In order for ncheck to work properly, the name of the filesystem must be contained within a text file, /etc/checklist.

Another useful utility is /etc/clri, which clears an inode entry by writing zeros on the space occupied by the entry. After clri executes, any blocks that were previously occupied will show up as being unaccountable in a fsck operation. This utility should be used with extreme caution: it is recommended only for removing entries that show up as being occupied, yet point to files within no directories (this can create a loop where a new file gets assigned the inode, yet it points to the old garbage).

The last utility is /etc/mknod. As its name implies, it is useful for making directory entries, and the corresponding inodes, for special files. Special files may be, among other things, block devices, character devices, or named pipes.

Troubleshooting

The next four examples will document problems that can occur with inodes, how the problems show up in fsck listings, and how to correct the problems.

The first example, shown in Figure 4, is that of a "possible" file size error. Neither owner nor group is reported, and the only reliable information is the inode number. ncheck -i suggests that the culprit is a database file -- with a link -- that is giving erroneous information. The most likely scenario here is that someone failed to exit the database completely and corrupted the inode. The solution is to remove the link (because both point to the same inode, it doesn't matter which one is the true file and which one is the link -- removing either suffices), then copy the file over to a new file. The causes the data to be read and a new inode to be created. Last, move the new file over the old one and recreate the link. With a new inode pointing to the file, all is well -- as shown by rerunning fsck.

In the second example, illustrated in Figure 5, an unallocated inode is hanging out in cyberspace. The inode number is given, as well as the owner. The mode indicates the file type and permissions. Since what is being referred to is a "non-file", the mode is zero. The size is also zero, but the time of modification and name of the entry that once resided there are shown. Since the size of the file is zero, fsck reports that it has taken action and removed the file. The dichotomy is that it cannot remove something that is not there -- "(EMPTY)". What it is trying to do is remove the reference. What actually happened on this machine was that a hard drive crashed and all data went south -- thus explaining the references to files that are no longer there after an attempt to restore the drive.

After the last phase, fsck it reports that there is a Bad Free List, and asks whether it should be salvaged. In a real-world situation, you always try to salvage what you can and should answer yes. I answered no only for purposes of this illustration. The message comes up to boot the operating system without performing a sync operation. Notice, however, the number of files, the blocks used, and the blocks free (41,402).

I booted the operating system again, made no changes whatsoever, and ran fsck again. The first five phases flew by with no problems. The Bad Free List still remains, however; this time, I answered yes and it was salvaged. Notice now the number of blocks free (61480). Even though the unallocated inode entry was corrected on the first fsck pass, the space was not returned to the system until the free list was salvaged.

Figure 6 shows the third example. Here a file is unreferenced, meaning the inode table shows it as being in use while there is no corresponding entry in any directory. The owner of the file is root (user 0). If it were user karen, it would show up as her ID number (from the /etc/passwd file). The mode here is important -- the first two characters indicate that it is a regular file, and the last three digits show its permissions as being read and write for the owner, and read for everyone else (the third digit, by being zero, shows the sticky bit is not set). It is safe to say that this is not an executable file. A good guess as to what happened here is that a file was deleted, and processing was interrupted in the midst of the deletion. With a file size of zero, it is safe to clear the file and correct the inode table.

In the last example, in Figure 7, an inode shows up as unreferenced and fsck prompts for removal. Though I answer yes to the prompt, the inside remains there, as subsequent runnings of fsck show. It turns out that the inode is linked to the console. On a UNIX machine, the console basically has permissions over everything else. The file /dev/console is a special, character-based file, referencing the only terminal allowed to login during sensitive maintenance operations. The example shown here is a real one, taken from a system that was dropped from the back of a truck as it was being moved from one site to another. The unreferenced file was keeping the system from changing to multiple user state.

After managing to get another terminal logged into the machine, fsck still failed to clear the inode. Attempts were made to remove the file (and thus clean up the entries in the inode table), but they failed, and it become necessary to use clri. After the mode was removed, the character file was rebuilt with mknod, and relinked to the aliases it uses. With the operation done, fsck was able to fly through its checks without uncovering any problems, and the system was able to go to multi-user state without further delay.

Summary

The inode table stores information on every physical piece of data stored on a UNIX filesystem. Within this listing is kept information on disc block locations, size, ownership, permissions, and so forth. If an inode degrades, the associated data can be damaged, or completely lost. This article has discussed some of the utilities of use in looking at inodes, as well as in clearing and recreating them.

About the Author

Emmett Dulaney has contributed to several books on UNIX, and is currently a product developer for New Riders Publishing. He can be reached on the Internet as Edulaney@Newrider.mhs.compuserve.com.