Article

Understanding Filesystems

Emmett Dulaney

When users think of a filesystem, they think of the classic tree structure composed of subdirectories and files. Everything begins at the root directory (/) and branches from there. Root is divided into subdirectories, which are further divided into additional subdirectories, and so on. Along the way are files within the subdirectories that are stopping points on the branch, much like leaves on a literal tree.

UNIX was designed with ease of use in mind. This tree representation seemed the most efficient way to provide location and access to files and entities. It is no accident that users equate this with the filesystem: for most practical purposes, it is the filesystem. Administrators, however, need to know that this tree is only the way things look and not the way they really are. They need to know that, in reality, most of the time there is a flatfile-style database that keeps pointers to the data, and that the pointers and data comprise the real filesystem.

This article examines filesystems from an administrative view and explains what they are. In addition, it lists the tools used to maintain the filesystem and describes the most common types of filesystems.

What Is a Filesystem?

Simply, a filesystem is a collection of data. Every existing UNIX machine has a minimum of one filesystem on a hard drive. That filesystem is the root system, which boots when the machine is turned on. It contains the UNIX kernel (sometimes named /unix). That filesystem remains active for the entire time the machine is up. (For an overview of filesystems used by other operating systems, see the sidebar "How Do Other Operating Systems Treat the Filesystem?")

If there are additional filesystems on that machine, they may be mounted and unmounted at will. For example, once the machine is up, the root user can mount(1M) another hard drive partition that has database files on it and can add new entries to a mailing list database. After adding the entries, the root user can unmount the partition and mount another with payroll information on it. Key points to remember are:

You can't unmount the root filesystem.

You can't unmount a filesystem while it is in use.

You can't mount a corrupted filesystem.

When you install UNIX you decide which type of filesystem you will use. There are several different types to choose from. This article examines the most popular ones. All of them, however, consist of three components:

Superblock -- which contains information about the filesystem and the physical data comprising it.

Inode list -- which indexes the individual entries.

Data blocks -- which store the data.

For every entry in a directory listing, there is a corresponding inode(4) holding information about the file. Inodes are pointers to the data blocks holding the files. Inodes have the following components:

1. A unique inode number that increments by one with every file or subdirectory created. Historically, inode numbers zero and one are set aside for special purposes. The root inode begins with two.

2. A two-digit number for the type of entry:

01 -- a named pipe. 02 -- a character device file. 04 -- a directory. 06 -- a block device file. 10 -- an ordinary file. 12 -- a symbolic link. 14 -- a socket.

3. The permissions on the entity. Permissions are four digits. The first digit tells whether a special mode is set (1=sticky bit, 2=SGID, 4=SUID). The remaining three digits tell read, write, and execute permissions.

4. The physical size of the file.

5. The number of links to the entry.

6. The owner of the file, often in uid numeric form -- the same information returned by the id command).

7. The group possessing the file -- again, often in numeric form.

8. The timestamps for creation or change of the inode, for last modification of the data, and for last access of the data. These are used in directory listings and by other commands, such as when find looks for files meeting certain time criteria.

9. The physical data block addresses where the file resides.

These nine items constitute the inode, although their order varies from one UNIX vendor to another. To see your system's order, look for the file /usr/include/sys/inode.h, or for a file in /usr/include/sys/fs/ with "ino" in its name. (See the sidebar "NFS and RFS" for a discussion of network filesystems.)

Hard-linked files share the same inode number. Hard links cannot cross filesystems because each filesystem has its own inode list. All inode numbering starts at two and increments for the unique inode number. The same number can appear on two different filesystems and point to two different files.

Symbolic links, on the other hand, are nothing more than concrete pointers from one place to another. Symlinks have unique inode numbers. As pointers to other destinations they can cross filesystems. A symbolic link on the root filesystem can point to a particular file on the database filesystem even when the database filesystem is unmounted. (For more information on inodes and the utilities used to maintain them, see my article, "When Inodes Go Bad," in the July/August 1994 issue of Sys Admin.)

Different Types

In the traditional UNIX world, there are two types of filesystems. Which one you used once depended on whether you had chosen AT&T's UNIX flavor, or had opted for the BSD versions. AT&T's is known as S5, for System 5, reflecting the way versions are differentiated. Berkeley's filesystem was called the Fast File System (FFS) but is now known as UFS. Today, both are usually available on a new installation. Knowing which to use can help you tailor your system to meet your site's specific needs. (See the sidebar "Other Filesystems" for a brief discussion of other types of filesystems.)

Virtually all UNIX filesystems today use the concept of Read-Ahead, Write-Behind to increase their efficiency. With Read-Ahead, each time you access the disk, more blocks are read into the buffers than just the one you are requesting. Most of the time one request is followed by another. Often the next requested data is sequentially next to the currently requested data. It makes sense to read in more data than that requested. When the next request is made, that data is already in memory, so there is no disk access time involved. Write-Behind entails maintaining changes within the buffers and not on the disk. When activity slows down or the buffers fill, the buffers write to the disk and prepare for the next operation. Given that the slowest component inside any computer is the hard drive, you can see how these operations combine to increase processing efficiency over operating systems that don't use Read-Ahead and Write-Behind.

Figure 1 represents the physical structure of an S5 filesystem, which is also referred to by SCO as S51K. There is but one boot block, one superblock, and one inode list for the entire filesystem -- everything else becomes data blocks. The boot block (logical block 0) contains primary and secondary bootstrapping programs. If you create a nonbootable filesystem, the space is still set aside for the boot block, but that space is not used or accessible. While size can differ per vendor, usually sectors 0 to 15 constitute the boot block, and sector 16 is the beginning of the superblock. The data blocks can be either 512, 1024, or 2,048 bytes in size. Systems with many small files can save hard drive space by using blocks of 512 bytes, while systems with fewer files, larger in size, should have the block size set to the highest number possible.

UFS/FFS

Figure 2 shows a Berkeley Fast File System. UFS/FFS uses the same components as S5, but the components are not limited to appearing only once on the disk. Instead, the superblock and inode list are broken into smaller components and sprinkled throughout the data blocks. FFS was the default filesystem shipped with SVR4, and the name has now been changed to UFS for UNIX File System. Whereas S5 allows for structured blocks less than or equal to 2,048 bytes in size, UFS supports 8,192-byte blocks. Because data is moved in blocks and Read-Ahead involves reading in additional blocks, the larger the block, the more efficient the operation. A side feature of UFS/FFS is that it supports 255-byte names, as opposed to the 14-byte limit S5 imposes.

Its structure makes UFS much quicker and more efficient at operations than S5. Some estimates range as high as ten times quicker in disk throughput.

Helpful Tools

The most crucial of the filesystem tools is mkfs(1M). This utility creates the filesystem. fsck(1M) performs cleanup operations, should data or pointers get out of sync. Other utilities that are available depend on the UNIX vendor. These utilities can include format or fmthard, both of which are used to format the hard disk; dumpfs, to obtain information about the existing filesystem; and newfs, to rebuild a filesystem without starting from scratch.

Selecting Which Filesystem to Use

Determining which filesystem is right for your installation can be daunting. The answer is not always black and white, nor may it be universal for your entire system or network. You can have multiple partitions on your system, each configured with a different filesystem. The first question to decide is whether you need many or any multiple partitions. There are a number of advantages to having a small number of large partitions:

The fewer the partitions, the easier it is to back the system up and restore it after a full system crash.

The fewer the partitions, the less likely it is that you will run out of space in one of them.

With many small partitions, it is hard to allow for growth in any one of them. Balanced against this are a significant number of advantages for multiple, smaller partitions:

Small partitions make it easier to fine tune the inode-to-disk-space ratio. With many small files, you may easily run out of free inodes while there is still data block space remaining on the partition -- thus you cannot create and save new files. The only solution is to remake the filesystem anew.

Small partitions allow for quick backup and restoring of selected partitions.

If the operating system is stored in a small partition by itself, upgrading is much easier.

Smaller partitions are easier to defragment. Fragmentation occurs due to constant growth of some files and removal of others. When it occurs, the only solution is to make a complete backup of the filesystem, delete all the original files, and then restore them. Restoring to an empty filesystem writes the files without fragmentation.

Once you have decided on the number of partitions and the purpose each is to serve, the next step is to decide which filesystem is most appropriate for that partition. Mixing and matching is not only possible but recommended.

Think about what the partition will be used for, then ascertain which is most appropriate for that application. With the root (boot) partition, S5 works well, because the boot block will be used, and because the files within that partition do not change or grow regularly. With a partition containing database files, UFS makes more sense. For such a partition, read and write access must be as quick as possible. Moreover, multiple superblocks provide a degree of protection against corruption. Should corruption occur, it will affect only a percentage of the filesystem's superblock rather than the whole.

References

Goodheart, Benny, and James Cox. The Magic Garden Explained. Englewood Cliffs, N.J.: Prentice-Hall, 1994.

Sheldon, Tom. LAN Times Encyclopedia of Networking. Berkeley, CA: Osborne McGraw-Hill, 1994.

About the Author

Emmett Dulaney is a product developer for New Riders Publishing, and an associate professor of Continuing Education at Indiana-Purdue University of Fort Wayne. He can be reached on CompuServe at 74507,3713.