apr2002.tar

Using Lib C and I/O and Performance

Henry Newman

I will be writing a new column for Sys Admin magazine on storage and I/O-related topics. To start, I will discuss how I/O works from the application's point of view. Applications generally create and access files in at least two ways under the POSIX standards. In the first, the file is created or opened with the open(2) system call, and I/O goes directly to the system (raw device, volume manager, or file system) through read(2), write(2), pread(2), pwrite(2), or the POSIX asynchronous I/O routines aio_read(3RT), aio_write(3RT), and lio_listio(3RT). In the second, the file is created or opened with fopen(3), and I/O is performed through the standard C library package (fread/fwrite/fprintf).

The fopen(3) method is more common than using direct system calls because it has fewer restrictions. This article describes how an application accesses data through fopen(3) and the standard C library, and shows how you can improve the performance of that application by increasing the efficiency of its I/O.
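The two approaches described above can be sketched side by side. This is only an illustration: the function names write_direct and write_buffered are hypothetical, and a production version of the system-call path would loop on short writes.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Path 1: system calls. The single write(2) below is one request to the
   kernel, of exactly the size the application asked for. */
long write_direct(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);   /* may be short; not retried here */
    if (close(fd) != 0)
        return -1;
    return (long)n;
}

/* Path 2: standard C library. fwrite(3) copies the data into the library
   buffer; the library decides when to pass it on to the system. */
long write_buffered(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    size_t n = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0)                /* fclose flushes the library buffer */
        return -1;
    return (long)n;
}
```

Both functions return the number of bytes accepted, or -1 on failure. The application-level result is the same; what differs is who decides the size and timing of the requests that reach the system.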

Using the C Library

C library functions that read or write data to a file opened through fopen(3), such as fread(3), fwrite(3), or fprintf(3), make requests to the operating system in a size that is usually defined in the include file /usr/include/stdio.h. The table below lists the default sizes for various operating systems. The size of the request made by the fread(3), fwrite(3), or fprintf(3) call itself is set by the size of the buffer passed to the call.

In the C library I/O environment, however, this request size does not affect the way data is actually read from or written to the disk. The C library package (libc.a) keeps a library buffer for each file opened with fopen(3) and moves data between the user data space and that buffer. The default size of this library buffer is:

OS         Size in Bytes
Linux      8192
Solaris    8192
AIX        4096
SGI        4096

By default, all I/O requests to the system will be in the sizes listed in the table above, no matter what request size you make from the fread(3), fwrite(3), or fprintf(3) call. This scheme is often called buffered I/O. The library functions read and write their requests to the library buffer, and data is read from or written to the system in blocks corresponding to the library buffer size.
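As a rough illustration of what this means for the number of requests reaching the system, the following sketch counts them for a given amount of data. It assumes purely sequential I/O and application requests no larger than the library buffer; the function name system_requests is hypothetical, and real C libraries may behave differently in edge cases.

```c
#include <assert.h>
#include <stddef.h>

/* Number of requests that reach the system when total_bytes are moved
   sequentially through a library buffer of buf_size bytes. The size of
   the individual fread/fwrite calls does not appear in the formula:
   under the stated assumptions only the buffer size matters. */
size_t system_requests(size_t total_bytes, size_t buf_size)
{
    assert(buf_size > 0);
    return (total_bytes + buf_size - 1) / buf_size;   /* round up */
}
```

For 1 MB of data (1048576 bytes), system_requests gives 128 requests with an 8192-byte buffer, 256 with a 4096-byte buffer, and 4 with a 256-KB buffer.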

You can change the library buffer size with a call to the function setvbuf(3). Making the library buffer bigger improves performance for sequential I/O (although making the buffer size over 128 KB does not always work because some operating systems and configurations do not support larger I/O requests).

One important benefit of using the C library buffer is that all I/O is completed on 512-byte boundaries. 512 bytes is the base block size for most UNIX systems and is generally the smallest physical amount of data that can be written to a disk, as it is the hardware sector size.

As you will see in future columns, the amount of data moved does not depend on the request size from the application or the library buffer, but depends on the file system.

Why This Matters

All of these issues are important because they affect the system CPU overhead and the performance of the underlying I/O system. During the 22 years that I have been in the business, system CPU performance has improved more than 10,000,000 times, but I/O performance has lagged far behind. Over that same period, the seek and latency time of the fastest disks has dropped only from 24 milliseconds to 5.6 milliseconds, and the transfer rate per device has increased at most 133 times (from 3 Mbytes/sec half duplex in 1981 to 200 Mbytes/sec full duplex in 2002). Without question, storage will be the bottleneck for the foreseeable future, because increases in system CPU performance will far exceed those of disk and RAID storage.

Figure 2 shows the device efficiency of I/O read and write requests when each I/O request is followed by an average seek and average latency, for both 10K and 15K RPM disk drives. In the case of C library-based I/O, the size of the "request" corresponds to the size of the library buffer. As the figure clearly shows, the larger the request, the more efficiently the disks are used, but even at a 512-KB request size the 15K RPM disk is still less than 50% efficient. Keep in mind that this is the absolute worst-case scenario, but it is indicative of the problem.

Today disks have fairly large caches on each drive (4 MB is common and 16 MB is available, which is larger than the memory of the Cray-1A -- the first machine I worked on). Additionally, both RAID devices and disks use sort algorithms to reduce the number of head seeks. All of this increases the performance over this worst-case chart, but does not eliminate the real issues with the hardware. You still must seek, and you will miss disk revolutions.

The bottom line is that you must make large I/O requests to use the hardware efficiently, use many disk drives to absorb the many average seeks and average latencies, or both.
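The worst-case model behind this argument, in which every request pays one average seek and one average latency, can be written as a small function. This is a sketch under stated assumptions: the name device_efficiency and the sample drive parameters in the note below are hypothetical and are not taken from the figure.

```c
#include <assert.h>

/* Fraction of device time spent actually transferring data when every
   request is preceded by one average seek and one average rotational
   latency (the worst case described in the text). */
double device_efficiency(double request_bytes, double seek_ms,
                         double latency_ms, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    double transfer_ms = request_bytes / bytes_per_sec * 1000.0;
    return transfer_ms / (seek_ms + latency_ms + transfer_ms);
}
```

With an assumed 3.6 ms seek, 2.0 ms latency (half a revolution at 15K RPM), and an assumed 100 Mbytes/sec transfer rate, an 8-KB request comes out under 2% efficient and a 512-KB request just under 50%. Different drive parameters shift the numbers, but the shape of the curve stays the same: efficiency rises only as the transfer time grows relative to the fixed seek and latency cost.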

Making Larger Requests

As mentioned previously, if you are using the C library calls, you can change the size of your library buffer by calling the setvbuf(3) function after the file is opened with fopen(3), but before reading from or writing to the file.

The library buffer size should be exactly 8 bytes more than a multiple of 512 bytes. The additional 8 bytes are required for the hash table for each library buffer; omitting them will significantly reduce your I/O performance and increase your system overhead (see Figure 1). For example, if you want to set your library buffer size to 64 KB, you would set it to:

	buffer_size=(64*1024)+8;

The following example shows how to use setvbuf(3):

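#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (262144 + 8)   /* multiple of 512 bytes, plus 8 */

int main(void)
{
   char    *buf;
   FILE    *fp;

   fp = fopen("data.fil", "r");
   if (fp == NULL)
      return 1;
   buf = malloc(BUF_SIZE);      /* must match the size given to setvbuf */
   if (buf != NULL)
      setvbuf(fp, buf, _IOFBF, BUF_SIZE);   /* before any I/O on fp */
   /* ... read from fp here ... */
   fclose(fp);
   free(buf);                   /* only after the stream is closed */
   return 0;
}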

As mentioned, if the 8 bytes are not added, it will cause poor performance. For each write request the system will have to read-modify-write. Read-modify-write happens when the requests are not on 512-byte boundaries. Each time data is written, the system will have to read the data into the system buffers, and then the system will write the data from the user space to the system buffers such that it is written on 512-byte boundaries, and then data will be written from the system buffers to the device. This will happen for each write. This process increases system overhead significantly and reduces I/O performance dramatically.
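The alignment condition described above can be expressed as a simple check on a request as the system sees it. This is a minimal sketch: the function name and the SECTOR_SIZE constant are hypothetical, and the 512-byte figure is the sector size assumed throughout this article.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u   /* base block size assumed in the article */

/* True when a write of `length` bytes at byte `offset` does not start
   and end on sector boundaries, so the system must first read the
   partially covered sectors, merge in the new data, and write them back. */
bool needs_read_modify_write(size_t offset, size_t length)
{
    return (offset % SECTOR_SIZE) != 0 || (length % SECTOR_SIZE) != 0;
}
```

A 65536-byte write at offset 0 passes the check, while a 1000-byte write at offset 0, or a 512-byte write at offset 100, would each force a read-modify-write.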

So, for programs using fopen/fread/fwrite, making larger I/O requests is easy if you have access to the source code. The real question is how to determine the library buffer size. My rule of thumb for sequential I/O is to make the library buffer size at least 4 times the size of the fread(3) or fwrite(3) request. If you can afford the memory usage, make the library buffer a much larger multiple of the request size, in the range of 512 KB to 16 MB. Determining the correct size has a great deal to do with the rest of the I/O path, including the operating system, file system, volume manager, and storage hardware. Determining the exact optimal value will have to wait a few months as we progress with this discussion, but making the buffer large will immediately improve performance over using the default. Of course, this is only worthwhile for files on which you are doing a great deal of I/O.

If you have an application that does random I/O, making the buffer larger than the I/O request will hurt performance because you will read data into the buffer that you will not use. The only time making the buffer larger than the request helps is when you can fit the whole file in the library buffer. Older applications, written when memory was at a premium, sometimes used files where the data could now simply be held in memory.
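The rule of thumb for sequential I/O, combined with the 512-byte and 8-byte requirements given earlier, could be captured in a helper like the one below. It is a sketch only: the name suggest_buffer_size is hypothetical, and the multiple is left as a parameter because the best value depends on the rest of the I/O path.

```c
#include <assert.h>
#include <stddef.h>

/* Library buffer size for sequential I/O: `multiple` times the
   fread/fwrite request size (the article suggests at least 4), rounded
   up to a multiple of 512 bytes, plus the 8 extra bytes the article
   calls for. */
size_t suggest_buffer_size(size_t request_size, size_t multiple)
{
    assert(request_size > 0 && multiple > 0);
    size_t raw = request_size * multiple;
    size_t rounded = (raw + 511) / 512 * 512;   /* next multiple of 512 */
    return rounded + 8;
}
```

For a 16-KB request and a multiple of 4, this gives 65544 bytes (64 KB plus 8). For a 1000-byte request it gives 4104 bytes, which is 4000 rounded up to 4096, plus 8. The result is the value to pass both to malloc(3) and to setvbuf(3).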

Conclusion

The real issue is that you will need to match the I/O efficiency of the application with the amount of storage hardware that you will need. If you have an application requirement of 90 MB/sec of reading and writing and the application makes 1-KB random I/O requests, the amount of hardware needed to support that requirement is much greater (likely 10x greater) than if the application were making 512-KB sequential I/O requests.

In the next few columns, I plan to address the whole I/O path and the issues with performance and tuning for the server hardware, operating system, file system, HBA, FC devices (such as tape and disk), and applications (such as databases).

Henry Newman has worked in the IT industry for more than 20 years. Originally at Cray Research and now with a consulting organization, he has provided expertise in systems architecture and performance analysis to customers in government, scientific research, and industry around the world. His focus is high-performance computing, storage and networking for UNIX systems, and he previously authored a monthly column about storage for Server/Workstation Expert magazine. He may be reached at: hsn@hsnewman.com.