 Using 
              Lib C and I/O and Performance
 Henry Newman
              I will be writing a new column for Sys Admin magazine on 
              storage and I/O related topics. To start, I will discuss how I/O 
              works from the application. Applications generally make requests 
              to create files in at least two ways via POSIX standards. One way 
              is that the file is created/opened with the open(2) system 
              call, and I/O is done directly to the system via calls to the raw 
              device, volume manager, and file system (read(2), write(2), 
              pread(2), pwrite(2), or the POSIX standard asynchronous 
              I/O routines aio_read(3RT), aio_write(3RT), and 
              lio_listio(3RT)). With the other method, the file is created/opened 
              with fopen(3), and I/O is performed via the standard C library 
              package (fread/fwrite/fprintf).
              The fopen(3) method is more common than using direct system 
              calls because it has fewer restrictions. This article describes how 
              an application accesses data through fopen() and the standard 
              C library and shows how you can improve the performance of that 
              application by increasing the efficiency of I/O.
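As a minimal sketch of the two approaches described above, the following C fragment writes the same message once through open(2)/write(2) and once through fopen(3)/fwrite(3). The function names are hypothetical and exist only for illustration; error handling is kept to the bare minimum.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Method 1: open(2) + write(2). Each write() is a direct system call. */
int write_with_syscalls(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, msg, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Method 2: fopen(3) + fwrite(3). Data passes through a library buffer. */
int write_with_stdio(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    size_t n = fwrite(msg, 1, len, fp);
    /* fclose() flushes whatever is still in the library buffer. */
    if (fclose(fp) != 0)
        return -1;
    return n == len ? 0 : -1;
}
```

Both functions return 0 on success. They leave the same bytes in the file; the difference is in how and when the data reaches the operating system.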
              Using the C Library
              C library functions that read or write data to a file opened through 
              fopen(3), such as fread(3), fwrite(3), or fprintf(3), 
              make a request to the operating system in a size usually defined 
              in the include file /usr/include/stdio.h. 
              The table below lists the default request sizes for various operating 
              systems. The request size from the fread(3)/fwrite(3)/fprintf(3) 
              call is set by the size of the buffer being passed.
              In the C library I/O environment, however, this request size does 
              not affect the way data is actually read from or written to the 
              disk. fopen(3), which uses the C library package (libc.a), 
              moves the data from the user data space to a library buffer for 
              each opened file. The size of this library buffer is by default:
              
              OS         Size in Bytes
              Linux      8192
              Solaris    8192
              AIX        4096
              SGI        4096
              
              By default, all I/O requests to the system will be in the sizes 
              listed in the table above, no matter what request size you pass 
              to an fread(3), fwrite(3), or fprintf(3) call. 
              This scheme is often called buffered I/O. The library functions 
              read and write their requests to the library buffer, and data is 
              read or written to the system in blocks corresponding to the library 
              buffer size.
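One way to observe buffered I/O is to compare the file size the operating system reports before and after the library buffer is flushed. The sketch below is illustrative only: the function names are hypothetical, the 4 KB buffer is an assumed value, and the behavior described in the note is what common implementations such as glibc do rather than something the C standard guarantees.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as the operating system currently sees it. */
static long system_file_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/*
 * Write up to 1024 bytes to a fully buffered temporary file and report
 * the size seen by the system before and after fflush().
 * Returns 0 on success, -1 on error.
 */
int buffered_write_demo(size_t nbytes, long *before, long *after)
{
    static char lib_buf[4096];        /* library buffer (assumed 4 KB) */
    char data[1024] = {0};
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (nbytes > sizeof data)
        nbytes = sizeof data;         /* keep the request below the buffer size */
    if (setvbuf(fp, lib_buf, _IOFBF, sizeof lib_buf) != 0 ||
        fwrite(data, 1, nbytes, fp) != nbytes) {
        fclose(fp);
        return -1;
    }
    *before = system_file_size(fp);   /* data is still in the library buffer */
    fflush(fp);                       /* push the buffer to the system */
    *after = system_file_size(fp);
    fclose(fp);
    return 0;
}
```

With a 100-byte request, `before` is typically 0 and `after` is 100: the fwrite(3) call only copied data into the library buffer, and nothing reached the system until the buffer was flushed.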
              You can change the library buffer size with a call to the function 
              setvbuf(3). Making the library buffer bigger improves performance 
              for sequential I/O (although making the buffer size over 128 KB 
              does not always work because some operating systems and configurations 
              do not support larger I/O requests).
              One important benefit of the C library buffer is that all I/O 
              is completed on 512-byte boundaries. 512 bytes is 
              the base block size for most UNIX systems and is generally the smallest 
              physical amount of data that can be written to a disk, as it is 
              the hardware sector size.
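The boundary arithmetic can be sketched in a few lines of C. The helper names below are hypothetical, and the 512-byte sector size is the value quoted in the text, not something queried from a device.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SIZE 512   /* base block size cited in the text */

/* Nonzero if a request starts and ends on sector boundaries. */
int is_sector_aligned(size_t offset, size_t length)
{
    return offset % SECTOR_SIZE == 0 && length % SECTOR_SIZE == 0;
}

/* Number of sectors the device must transfer to cover the request. */
size_t sectors_touched(size_t offset, size_t length)
{
    assert(length > 0);
    size_t first = offset / SECTOR_SIZE;
    size_t last = (offset + length - 1) / SECTOR_SIZE;
    return last - first + 1;
}
```

For example, an 8192-byte request at offset 0 is aligned and covers exactly 16 sectors, while a 100-byte request at offset 500 touches two sectors and fills neither of them.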
              As you will see in future columns, the amount of data moved does 
              not depend on the request size from the application or the library 
              buffer, but depends on the file system.
              Why This Matters
              All of these issues are important because they affect the system 
              CPU overhead and the performance of the underlying I/O system. During 
              the 22 years that I have been in the business, system CPU 
              performance has improved more than 10,000,000 times, but I/O performance 
              is severely lacking. Over that same time period, the seek and latency 
              time of the fastest disks has changed only from 24 milliseconds to 
              5.6 milliseconds, and transfer rate per device has 
              increased at most 133 times (from 3 Mbytes/sec half duplex in 1981 
              to 200 Mbytes/sec full duplex in 2002). Without question, storage 
              is the bottleneck for the foreseeable future, as increases 
              in system CPU performance will far exceed increases in the performance 
              of disk and/or RAID storage.
              Figure 2 shows device efficiency of I/O read and write requests 
              if each I/O request is followed by an average seek and average latency. 
              This is shown for both 10K and 15K RPM disk drives. In the case 
              of C library-based I/O, the size of the "request" corresponds 
              to the size of the library buffer. As the figure shows, the larger 
              the request, the more efficiently the disks are used, but even at 
              512-KB request sizes the 15K RPM disk is still less than 50% efficient. 
              Keep in mind that this is an absolute worst-case scenario, but it is 
              indicative of the problem.
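The worst-case model behind such a chart can be sketched as follows: every request pays one average seek plus one average rotational latency (half a revolution) before any data moves. The function names are hypothetical, and the seek time and transfer rate passed in are assumptions supplied by the caller, not the values used to produce the figure.

```c
#include <assert.h>

/* Average rotational latency in milliseconds: half a revolution. */
double avg_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    return 0.5 * 60000.0 / rpm;
}

/*
 * Fraction of device time spent transferring data when each request is
 * preceded by an average seek and an average rotational latency.
 */
double device_efficiency(double request_bytes, double seek_ms,
                         double rpm, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    double transfer_ms = request_bytes / bytes_per_sec * 1000.0;
    double overhead_ms = seek_ms + avg_latency_ms(rpm);
    return transfer_ms / (transfer_ms + overhead_ms);
}
```

A 15K RPM drive has an average latency of 2 ms and a 10K RPM drive 3 ms. Efficiency reaches 50% only when the transfer time equals the seek plus latency time, which is why small requests use the device so poorly in this model.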
              Today disks have fairly large caches on each drive (4 MB is common 
              and 16 MB is available, which is larger than the memory of the Cray-1A 
              -- the first machine I worked on). Additionally, both RAID devices 
              and disks use sort algorithms to reduce the number of head seeks. 
              All of this increases the performance over this worst-case chart, 
              but does not eliminate the real issues with the hardware. You still 
              must seek, and you will miss disk revolutions.
              The bottom line is that you must make large I/O requests to 
              use the hardware efficiently, use many disk drives to absorb 
              the many average seeks and average latencies, or both.
              Making Larger Requests
              As mentioned previously, if you are using the C library calls, 
              you can change the size of your library buffer by calling the setvbuf(3) 
              function after the file is opened with the fopen(3) call, 
              but before reading from or writing to the file.
              The library buffer size should be exactly 8 bytes more than a 
              multiple of 512 bytes. The additional 8 bytes are required for the 
              hash table for each library buffer and, if not used, will significantly 
              reduce your I/O performance and increase your system overhead (see 
              Figure 1). For example, if you want to set your library buffer size 
              to 64 KB, you would set it to:
              
             
	buffer_size=(64*1024)+8;
The following example shows how to use setvbuf(3):  
             
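#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (262144 + 8)    /* 256 KB plus the extra 8 bytes */

int main(void)
{
   char    *buf;
   FILE    *fp;

   fp = fopen("data.fil", "r");
   if (fp == NULL)
      return 1;
   buf = malloc(BUF_SIZE);       /* allocate the full size given to setvbuf */
   if (buf != NULL)
      setvbuf(fp, buf, _IOFBF, BUF_SIZE);   /* before any I/O on fp */
   fclose(fp);
   free(buf);                    /* only after the stream is closed */
   return 0;
}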
As mentioned, omitting the 8 bytes causes poor performance, because 
            the system must perform a read-modify-write for each write request. 
            Read-modify-write happens when the requests are not on 512-byte boundaries. 
            Each time data is written, the system must read the existing data into 
            the system buffers, copy the data from user space into the system 
            buffers so that it lands on 512-byte boundaries, and then write the 
            data from the system buffers to the device. This happens for each 
            write, which increases system overhead significantly and reduces 
            I/O performance dramatically.
              So, for programs using fopen/fread/fwrite, 
              making larger I/O requests is easy if you have access to the source 
              code. The real question is how to determine the library buffer size. 
              My rule of thumb for sequential I/O is to make the library buffer 
              size at least 4 times the size of the fread(3) or fwrite(3) 
              request size. If you can afford the memory usage, make the library 
              buffer a much larger multiple of the request size, in the range 
              of 512 KB to 16 MB. Determining the correct size has a great 
              deal to do with the rest of the I/O path including the operating 
              system, file system, volume manager, and storage hardware. Determining 
              the exact optimal value will have to wait a few months as we progress 
              with this discussion, but making it large will immediately improve 
              performance over using the default. Of course, this only works for 
              files for which you are doing a great deal of I/O.
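The sizing rules above can be combined into a small helper. This is one possible reading of the rule of thumb, not a definitive formula: the function name and the `memory_to_spare` flag are hypothetical, and the sketch only applies the 512 KB lower end of the suggested range, leaving the upper end to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define LARGE_BUFFER (512UL * 1024UL)   /* low end of the 512 KB to 16 MB range */

/*
 * Suggest a library buffer size for sequential I/O:
 *   - at least 4 times the fread/fwrite request size,
 *   - at least 512 KB when memory is plentiful,
 *   - rounded up to a multiple of 512 bytes, plus the extra 8 bytes.
 */
size_t sequential_buffer_size(size_t request_size, int memory_to_spare)
{
    assert(request_size > 0);
    size_t size = 4 * request_size;
    if (memory_to_spare && size < LARGE_BUFFER)
        size = LARGE_BUFFER;
    size = (size + 511) / 512 * 512;    /* round up to a 512-byte multiple */
    return size + 8;
}
```

For an 8192-byte request this gives 32776 bytes, or 524296 bytes when memory is plentiful. The result is the value to pass to both malloc(3) and setvbuf(3).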
              If you have an application that does random I/O, making the buffer 
              larger than the I/O request will hurt performance because you will 
              read data into the buffer that you will not use. The only time making 
              the buffer larger than the request helps is when you can fit the 
              whole file in the library buffer. Some older applications, written 
              when memory was at a premium, use files for data that could now 
              be held in memory.
              Conclusion
              The real issue is that you will need to match the application 
              I/O efficiency with the amount of storage that you will need. If 
              you have an application requirement of 90 MB/sec of reading and 
              writing and the application makes 1-KB random I/O requests, the amount 
              of hardware needed to support that requirement is much greater (likely 
              10x greater) than if the application were making 512-KB sequential 
              I/O requests.
              In the next few columns, I plan to address the whole I/O path 
              and the issues with performance and tuning for the server hardware, 
              operating system, file system, HBA, FC devices (such as tape and 
              disk), and applications (such as databases).
              Henry Newman has worked in the IT industry for more than 20 
              years. Originally at Cray Research and now with a consulting organization, 
              he has provided expertise in systems architecture and performance 
              analysis to customers in government, scientific research, and industry 
              around the world. His focus is high-performance computing, storage 
              and networking for UNIX systems, and he previously authored a monthly 
              column about storage for Server/Workstation Expert magazine. 
              He may be reached at: hsn@hsnewman.com.