Cover V05, I08

Hidden Dangers of NFS Mounting Foreign Filesystems

Doug Morris

NFS (Network File System) facilitates file access transparency but does not assure semantic transparency. If the local and remote filesystems do not share the same characteristics and behaviors (i.e., the remote UNIX filesystem is foreign), the resulting difference in filesystem semantics (semantic gap) can be a breeding ground for system problems.

These semantic differences (see sidebar "A Note on Terminology") are generally more subtle than differences due to machine architecture (e.g., word size and addressing, or the representation of characters and of data types such as int, float, long, and real). NFS uses a protocol called XDR (External Data Representation) to put data types into standard representations that are encoded and decoded on the client and the server. XDR encoding can be performed only when the data types are known, which is generally not the case for file contents. Unless directed otherwise, NFS treats file contents as an uninterpreted stream of bytes and applies no additional encoding or decoding. XDR is used primarily for passing parameters in the internal NFS remote procedure calls.
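To make the XDR point concrete, here is a minimal sketch of how a 32-bit integer parameter is put into XDR's standard form, which is four bytes in big-endian order. The function names xdr_put_u32 and xdr_get_u32 are hypothetical and are not the routines of any RPC library; they only illustrate why a typed parameter survives a change of architecture while untyped file contents are passed through unchanged.

```c
#include <stdint.h>

/* Encode a 32-bit unsigned value in XDR wire order (big-endian, 4 bytes).
 * Hypothetical helper for illustration only. */
void xdr_put_u32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Decode 4 XDR bytes back into the host's native integer representation. */
uint32_t xdr_get_u32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the shifts operate on values rather than on memory layout, the same code produces the same four bytes on a big-endian or a little-endian host.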

What Causes the Semantic Gap?

NFS is a client/server protocol for transparent file access. It translates client-side I/O requests into NFS remote procedure calls that are interpreted on the NFS server. When the client and server have identical operating and filesystem environments, the local I/O requests are satisfied against the remote filesystem (server) in a manner transparent to the local operating environment. Unfortunately, when differences exist, a client-side I/O request can be satisfied against the remote server in a way completely unexpected by the local filesystem. When this happens, a semantic gap, or difference in meaning, exists between the local and remote filesystems.

Common Problems

The most commonly encountered problems relate to differences in permissions, support (or lack thereof) for sparse files, and differences in compilation or execution environments.

POSIX chown-restricted/unrestricted behavior. If the client-side operating environment restricts chown to root (POSIX chown restricted) and the server side does not (POSIX chown unrestricted), then the client-side restriction is lost, and remote files exhibit POSIX chown-unrestricted behavior.
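A program can ask which behavior applies to a given path with pathconf and the _PC_CHOWN_RESTRICTED variable. The following is a minimal sketch; the function name chown_is_restricted is hypothetical. Note the assumption: whether the answer for an NFS-mounted path reflects the server's configuration or only the client's depends on the implementation, so treat the result as a hint rather than a guarantee.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 1 if chown is restricted for the filesystem holding `path`,
 * 0 if it is not, and -1 if pathconf reported an error.
 * Hypothetical helper; over NFS the value may describe the client only. */
int chown_is_restricted(const char *path)
{
    long v;

    errno = 0;
    v = pathconf(path, _PC_CHOWN_RESTRICTED);
    if (v == -1)
        return errno != 0 ? -1 : 0;  /* -1 with errno unchanged: option not in effect */
    return v != 0 ? 1 : 0;
}
```

A utility that changes ownership could call chown_is_restricted on the target directory before deciding whether to attempt the chown at all.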

Sparse files. If the client-side operating environment supports sparse files (i.e., files that contain holes) and the server side does not, then file holes will be zero-filled and read performance could suffer significantly.

Holes are created by writes preceded by lseeks beyond the end of file. A filesystem that supports holes will extend the file logically but not allocate disk space for the hole. A file with a size of 1 GB may actually occupy only 4 KB of disk space. When a hole is read, binary zeros are returned without actual disk reads being performed. Sparse files can be very useful in random access applications using hashed keys. (Hashing is a file access technique in which a key is transformed into a file offset that can be directly read.)
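The lseek-then-write sequence described above can be sketched as follows. The function names make_sparse_file and file_usage are hypothetical, and the code assumes that st_blocks counts 512-byte units, which is common but not universal. Comparing the logical size with the allocated size on a local filesystem and again on an NFS mount shows whether the server preserves holes.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with a hole: seek `hole` bytes from the start, then write
 * a single byte. Returns 0 on success, -1 on failure. */
int make_sparse_file(const char *path, off_t hole)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (lseek(fd, hole, SEEK_SET) == (off_t)-1 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Report the logical size and the space actually allocated.
 * Assumes st_blocks is in 512-byte units. Returns 0 on success. */
int file_usage(const char *path, off_t *logical, off_t *allocated)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *logical = st.st_size;
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

On a filesystem that supports holes, the allocated figure is typically far smaller than the logical size; where holes are zero-filled, the two are roughly equal.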

Program misunderstandings. It is common for a program to incorporate system configuration parameter values at compile/link time. Assuming the program is compiled/linked on the client side (or on a similarly configured system), the parameter values will be valid on the client but may not be valid on the server. When this misunderstanding occurs, the program may act strangely or abort when accessing remote files.

Buffer sizes. Program read/write buffers are commonly sized to the MAX-BUFFER-SIZE kernel parameter. If the filesystem block size on the remote system (NFS server) is larger than MAX-BUFFER-SIZE on the client, then a block read (that is, a read of BLKSZ bytes, where BLKSZ is the block size returned by the stat system call) will return more bytes than the client buffer will hold. The buffer overflow will overwrite program or data areas, resulting in an error or incorrect results.
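One defensive pattern, sketched below under stated assumptions, is to size the buffer at run time from the block size that stat reports and never to request more bytes than the buffer holds. The names alloc_io_buffer, read_block, and FALLBACK_BUFSZ are hypothetical; the fallback value of 8192 is only an illustrative default.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define FALLBACK_BUFSZ 8192  /* hypothetical compile-time default */

/* Allocate an I/O buffer sized to the block size reported for the open
 * file, falling back to FALLBACK_BUFSZ if fstat gives nothing usable.
 * The chosen size is stored in *size_out; the caller frees the buffer. */
char *alloc_io_buffer(int fd, size_t *size_out)
{
    struct stat st;
    size_t n = FALLBACK_BUFSZ;

    if (fstat(fd, &st) == 0 && st.st_blksize > 0)
        n = (size_t)st.st_blksize;
    *size_out = n;
    return malloc(n);
}

/* Read up to `want` bytes, but never more than the buffer can hold. */
ssize_t read_block(int fd, char *buf, size_t bufsz, size_t want)
{
    return read(fd, buf, want < bufsz ? want : bufsz);
}
```

The clamp in read_block is the important part: even if the reported block size is larger than expected, the read cannot overrun the buffer.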

Path/filename length limits. The server side could return a path or filename longer than the client-side program expects. A simple non-UNIX example would be an MS-DOS program receiving a UNIX filename when accessing a UNIX directory using PCNFS.
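A program can guard against over-long names by asking pathconf for the filename limit of the directory it is reading and by checking lengths before copying. This is a minimal sketch; name_max_for and copy_name_checked are hypothetical helper names, and the fallback value is whatever the caller considers safe.

```c
#include <string.h>
#include <unistd.h>

/* Return the maximum filename length for directory `dir`, or `fallback`
 * if pathconf reports no limit or cannot determine one. */
long name_max_for(const char *dir, long fallback)
{
    long v = pathconf(dir, _PC_NAME_MAX);

    return v > 0 ? v : fallback;
}

/* Copy `name` into `buf` only if it fits, terminating NUL included.
 * Returns 0 on success, -1 if the name is too long for the buffer. */
int copy_name_checked(char *buf, size_t bufsz, const char *name)
{
    size_t len = strlen(name);

    if (len >= bufsz)
        return -1;
    memcpy(buf, name, len + 1);
    return 0;
}
```

Rejecting a name that does not fit is usually safer than truncating it, because a truncated name may silently refer to a different file.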

Other sysconf/pathconf limits. UNIX defines two system calls to retrieve system and pathname limits, sysconf and pathconf. Sysconf takes an integer argument corresponding to an assigned system variable name and returns the currently configured value. The X/Open specification for sysconf lists more than 30 variables. Pathconf takes a pathname and an integer corresponding to an assigned variable name as arguments and returns the configured value for the filesystem containing the pathname. The X/Open specification for pathconf lists nine variables.

A difference between the client side and the server side in any key sysconf or pathconf variable is a potential source of problems. These key variables are summarized in Table 1.
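One way to look for such differences is to collect a few limits on each host in a fixed format and compare the output. The sketch below is illustrative only: the structure fs_limits and the functions collect_limits and format_limits are hypothetical, and the four variables chosen are an assumption, not a reproduction of Table 1.

```c
#include <stdio.h>
#include <unistd.h>

/* A handful of limits worth comparing between client and server.
 * The selection is illustrative, not the article's Table 1. */
struct fs_limits {
    long open_max;  /* sysconf(_SC_OPEN_MAX) */
    long arg_max;   /* sysconf(_SC_ARG_MAX) */
    long path_max;  /* pathconf(path, _PC_PATH_MAX) */
    long link_max;  /* pathconf(path, _PC_LINK_MAX) */
};

/* Fill `out`; a field of -1 means no limit or a value that could not
 * be determined. */
void collect_limits(const char *path, struct fs_limits *out)
{
    out->open_max = sysconf(_SC_OPEN_MAX);
    out->arg_max  = sysconf(_SC_ARG_MAX);
    out->path_max = pathconf(path, _PC_PATH_MAX);
    out->link_max = pathconf(path, _PC_LINK_MAX);
}

/* Format the limits on one line so that the output from two hosts can
 * be compared. Returns the value returned by snprintf. */
int format_limits(char *buf, size_t n, const struct fs_limits *l)
{
    return snprintf(buf, n,
                    "OPEN_MAX=%ld ARG_MAX=%ld PATH_MAX=%ld LINK_MAX=%ld",
                    l->open_max, l->arg_max, l->path_max, l->link_max);
}
```

Running the same collection against a local directory and against a directory on the NFS mount, on both machines, gives a quick first look at where the two environments disagree.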

True NFS Horror Stories

I know of one computing installation in which the staff used an SGI as a Sun fileserver. The files contained on the SGI were Sun files -- Sun object files and data written by Sun programs. This solution seemed very safe to the installation staff. Unfortunately, this misperception resulted in many problems.

The MAX-BLOCK-SIZE on the Sun was 8192, and the SGI was configured with a filesystem block size of 32768. A popular UNIX project management tool on the Sun that ran fine against native Sun filesystems failed when run against Sun files on the SGI. Unfortunately, this program did a block read based on the block size returned by stat. stat returned the block size of the remote filesystem (32768), which was much larger than the program's buffer, sized to the 8192 MAX-BLOCK-SIZE of the Sun.

The Sun was configured to the default POSIX chown-restricted behavior, and the SGI was configured to the default POSIX chown-unrestricted behavior. The Sun chown shell command checked for a uid of root before calling the chown system call; it aborted when any non-root user tried to change ownership of a file. Because it checked for root before calling the chown system call, the shell command preserved POSIX chown-restricted behavior. However, for any program calling chown directly, the behavior was different. The Sun system call on the remote filesystem was translated by NFS into an SGI call. The SGI was configured POSIX chown unrestricted, so any non-root user could change ownership of a file as long as that user owned it.

At first, this inconsistency may not appear to be a significant problem, but most system utilities that need to change ownerships use the system call, not the chown command. tar (which on the Sun by default attempts to set ownership of extracted files to the archive owner) suddenly failed for all non-root users attempting to extract files created under another uid. When the files were native to the Sun, the chown system call failed with a return code that signaled tar to extract the files without preserving ownership. tar completion resulted in all extracted files being owned by the executing user. Over NFS, the chown call succeeded, and the high-level directories were created with ownership assigned to the archive owner. Subsequent file extractions to these directories failed because the user was not the owner, and the users were left with empty high-level directories they did not own.
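The pattern that tar relied on can be sketched as follows. This is not tar's source code; restore_owner is a hypothetical function that only illustrates the logic described above, in which a permission failure from chown is treated as a signal to leave the file owned by the executing user.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to give `path` to the archive owner.
 * Returns 1 if ownership was changed, 0 if the change was refused with
 * EPERM (the file stays owned by the caller), and -1 on any other error.
 * Hypothetical sketch of the fallback logic, not tar's implementation. */
int restore_owner(const char *path, uid_t uid, gid_t gid)
{
    if (chown(path, uid, gid) == 0)
        return 1;
    if (errno == EPERM)
        return 0;
    return -1;
}
```

On a chown-restricted filesystem a non-root caller gets the 0 result and extraction continues normally. On a chown-unrestricted server the call returns 1 for a directory, after which the caller no longer owns that directory and later writes into it can fail.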

SunOS supports sparse files. The filesystem native to SGI IRIX does not. Sparse files that occupied little space on the native Sun filesystem were much larger on the SGI, where the holes had to occupy real disk space.

Many other potential problems existed, although they apparently were not experienced. The Sun and SGI differ in a number of sysconf and pathconf variables. Each of these variables could have been the source of problems and inconsistent behavior.

Summary

NFS is a reliable client/server protocol for transparent access to remote files. Usually NFS access presents few transparency problems, but only if the local and remote filesystems are semantically similar. When semantic differences exist, NFS becomes a potential source for many problems, including security inconsistencies and strange program behaviors.

Semantic similarity does not necessarily mean operating environment similarity. It simply means that the client and server interpret NFS-related system calls in the same way. The client and server may run different operating systems and have very different hardware architectures.

In general, you should avoid remote mounting filesystems whose characteristics and behaviors differ from those of the local filesystem. When this cannot be avoided, proceed with caution, and seek the advice of your software vendors. Software vendors may not support such usage; most expect their programs to be run against local filesystems or identically configured remote filesystems.

References

X/Open CAE Specification, Protocols for X/Open Interworking. XNFS. 1992, Issue 4. X/Open Company Ltd., UK.

X/Open CAE Specification, System Interfaces and Headers. XNFS. 1992, Issue 4. X/Open Company Ltd., UK.

About the Author

Doug Morris has a B.S. in Mathematics and an M.B.A. in Management Information Systems. He is a UNIX enthusiast and long-time Open Systems advocate. He has been an active participant in X/Open, including holding an elected office on the X/Open User Council Executive. He has held various positions in technology evaluation and management at several Fortune 500 companies, including his current position as a Systems Specialist/Software Engineer at a major international oil company. He can be reached at his personal email address of damorri.msn.com.