Lib C and I/O and Performance
I will be writing a new column for Sys Admin magazine on
storage and I/O related topics. To start, I will discuss how I/O
works from the application. Applications generally make requests
to create files in at least two ways via POSIX standards. One way
is that the file is created/opened with the open(2) system
call, and I/O is done directly to the system via calls to the raw
device, volume manager, and file system (read(2), write(2),
pread(2), pwrite(2), or the POSIX standard asynchronous
I/O routines aio_read(3RT), aio_write(3RT), and
lio_listio(3RT)). With the other method, the file is created/opened
with fopen(3), and I/O is performed via the standard C library.
The fopen(3) method is more common than using direct system
calls because it has fewer restrictions. This article describes how
an application accesses data through fopen() and the standard
C library and shows how you can improve the performance of that
application by increasing the efficiency of I/O.
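To make the two access methods concrete, here is a minimal sketch that counts the bytes in a file both ways. The function names and the 4096-byte chunk size are illustrative assumptions, not anything the standards prescribe.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Method 1: direct system calls, open(2) and read(2).
 * Every read() here is a request to the operating system. */
long count_bytes_syscall(const char *path)
{
    char chunk[4096];               /* illustrative request size */
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, chunk, sizeof chunk)) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}

/* Method 2: the standard C library, fopen(3) and fread(3).
 * fread() is served from the library buffer attached to the stream. */
long count_bytes_stdio(const char *path)
{
    char chunk[4096];
    long total = 0;
    size_t n;
    int err;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        total += (long)n;
    err = ferror(fp);
    fclose(fp);
    return err ? -1 : total;
}
```

Both functions return the same count for the same file. They differ in who decides the size of the requests that reach the operating system.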
Using the C Library
C library functions that read or write data to a file opened through
fopen(3), such as fread(3), fwrite(3), or fprintf(3),
make requests to the operating system in a size that is usually
defined in the include file /usr/include/stdio.h.
The table below lists the default request sizes for various operating
systems. The request size of the fread(3)/fwrite(3)/fprintf(3)
call itself is set by the size of the buffer being passed.
In the C library I/O environment, however, this request size does
not affect the way data is actually read from or written to the
disk. fopen(3), which uses the C library package (libc.a),
moves the data from the user data space to a library buffer for
each opened file. The size of this library buffer is by default:
[Table: OS, Size in Bytes]
By default, all I/O requests to the system will be in the sizes
listed in the above table, no matter what request size you pass
to fread(3), fwrite(3), or fprintf(3).
This scheme is often called buffered I/O. The library functions
read and write their requests to the library buffer, and data is
read or written to the system in blocks corresponding to the library buffer size.
You can change the library buffer size with a call to the function
setvbuf(3). Making the library buffer bigger improves performance
for sequential I/O (although making the buffer size over 128 KB
does not always work because some operating systems and configurations
do not support larger I/O requests).
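As a rough illustration of buffered I/O, the sketch below issues one fwrite(3) call per 16-byte record and leaves the stream's default buffering alone. The function name and the record size are assumptions chosen for the example.

```c
#include <stdio.h>
#include <string.h>

/* Write `count` 16-byte records, one fwrite(3) call per record.
 * Returns 0 on success, -1 on failure. */
int write_small_records(const char *path, size_t count)
{
    char record[16];
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    memset(record, 'r', sizeof record);
    for (size_t i = 0; i < count; i++) {
        /* Each call copies 16 bytes into the library buffer; the
         * library normally writes to the system only when that
         * buffer fills. */
        if (fwrite(record, sizeof record, 1, fp) != 1) {
            fclose(fp);
            return -1;
        }
    }
    /* fclose() flushes whatever is still in the library buffer. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Running a program that calls this under a system call tracer such as truss on Solaris or strace on Linux should show far fewer write(2) calls than fwrite(3) calls, each in the size of the library buffer rather than 16 bytes.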
One important benefit of the C library buffer is that all I/O is
performed on 512-byte boundaries. 512 bytes is
the base block size for most UNIX systems and is generally the smallest
physical amount of data that can be written to a disk, as it is
the hardware sector size.
As you will see in future columns, the amount of data moved does
not depend on the request size from the application or the library
buffer, but depends on the file system.
Why This Matters
All of these issues are important because they affect the system
CPU overhead and the performance of the underlying I/O system. During
the past 22 years that I have been in the business, system CPU
performance has improved by a factor of more than 10,000,000, but I/O
performance has lagged severely. Over that same period, the combined
seek and latency time of the fastest available disk has dropped only
from 24 milliseconds to 5.6 milliseconds, and the transfer rate per
device has increased at most 133 times (3 Mbytes/sec half duplex in 1981
to 200 Mbytes/sec full duplex in 2002). Without question, storage
will be the bottleneck for the foreseeable future, because gains in
system CPU performance will continue to far exceed gains in the
performance of disk and RAID storage.
Figure 2 shows device efficiency of I/O read and write requests
if each I/O request is followed by an average seek and average latency.
This is shown for both 10K and 15K RPM disk drives. In the case
of C library-based I/O, the size of the "request" corresponds
to the size of the library buffer. As the figure shows, the larger
the request, the more efficiently the disks are used, but even at 512-KB
request sizes the 15K RPM disk is still less than 50% efficient.
Keep in mind that this is the absolute worst-case scenario, but it is
indicative of the problem.
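The worst-case model described here can be written as a small formula: efficiency is the time spent transferring data divided by the total of average seek, average rotational latency, and transfer time. The sketch below assumes that model. The function name is hypothetical, and the seek time and transfer rate are inputs you must supply, since the values behind the figure are not given in the text.

```c
/* Fraction of each request's service time spent moving data, when
 * every request pays one average seek plus one average rotational
 * latency. All drive parameters are caller-supplied assumptions. */
double device_efficiency(double request_bytes, double bytes_per_sec,
                         double avg_seek_ms, double rpm)
{
    /* Average rotational latency is half of one revolution. */
    double latency_ms = 0.5 * 60000.0 / rpm;
    double transfer_ms = 1000.0 * request_bytes / bytes_per_sec;

    return transfer_ms / (avg_seek_ms + latency_ms + transfer_ms);
}
```

For example, with an assumed 3 ms average seek, a 15K RPM drive (2 ms average latency), and a transfer rate of 1,000,000 bytes/sec, a 5,000-byte request spends 5 ms transferring out of 10 ms in total, so the function returns 0.5.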
Today disks have fairly large caches on each drive (4 MB is common
and 16 MB is available, which is larger than the memory of the Cray-1A
-- the first machine I worked on). Additionally, both RAID devices
and disks use sort algorithms to reduce the number of head seeks.
All of this increases the performance over this worst-case chart,
but does not eliminate the real issues with the hardware. You still
must seek, and you will miss disk revolutions.
The bottom line is that you must either make large I/O requests to
use the hardware efficiently or use many disk drives so that many
average seeks and average latencies can proceed in parallel.
Making Larger Requests
As mentioned previously, if you are using the C library calls,
you can change the size of your library buffer by calling the setvbuf(3)
function after the file is opened with fopen(3),
but before reading from or writing to the file.
The library buffer size should be exactly 8 bytes more than a
multiple of 512 bytes. The additional 8 bytes are required for the
hash table for each library buffer; omitting them will significantly
reduce your I/O performance and increase your system overhead (see
Figure 1). For example, if you want to set your library buffer size
to 64 KB, you would set it to 65536 + 8 = 65544 bytes.
The following example shows how to use setvbuf(3):
fp = fopen ("data.fil", "r");
/* the buffer must be at least as large as the size passed to setvbuf() */
buf = malloc (262144+8);
setvbuf (fp, buf, _IOFBF, 262144+8);
As mentioned, omitting the 8 bytes causes poor performance, because
each write request forces the system to perform a read-modify-write.
Read-modify-write happens when requests do not fall on 512-byte boundaries.
Each time data is written, the system must first read the affected data into
the system buffers, then copy the new data from user space into the
system buffers so that it is aligned on 512-byte boundaries, and finally
write the data from the system buffers to the device. This happens for
each write, which increases system overhead significantly and reduces
I/O performance dramatically.
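A slightly fuller sketch of the sizing rule adds error checking and computes the size instead of hard-coding it. The helper names are hypothetical. The 8 extra bytes follow this article's recommendation and are not required by the C standard, so treat that part as an assumption about the systems discussed here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Round `want` up to a multiple of 512 bytes, then add the 8 extra
 * bytes this article recommends. */
size_t library_buffer_size(size_t want)
{
    size_t blocks = (want + 511) / 512;

    if (blocks == 0)
        blocks = 1;
    return blocks * 512 + 8;
}

/* Open `path` and attach a caller-owned library buffer of about
 * `want` bytes. On success, returns the stream and stores the buffer
 * in *buf_out; the caller must fclose() the stream and only then
 * free() the buffer. Returns NULL on failure. */
FILE *open_with_buffer(const char *path, const char *mode,
                       size_t want, char **buf_out)
{
    size_t size = library_buffer_size(want);
    char *buf = malloc(size);
    FILE *fp;

    if (buf == NULL)
        return NULL;
    fp = fopen(path, mode);
    if (fp == NULL) {
        free(buf);
        return NULL;
    }
    /* setvbuf() must come before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, size) != 0) {
        fclose(fp);
        free(buf);
        return NULL;
    }
    *buf_out = buf;
    return fp;
}
```

With this helper, library_buffer_size(65536) yields 65544. The buffer must stay allocated for as long as the stream is open, which is why it is handed back to the caller instead of being freed inside the function.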
So, for programs using fopen/fread/fwrite,
making larger I/O requests is easy if you have access to the source
code. The real question is how to determine the library buffer size.
My rule of thumb for sequential I/O is to make the library buffer
size at least 4 times the size of the fread(3) or fwrite(3)
request size. If you can afford the memory usage, make the library
buffer a much larger multiple of the request size, in the range of
512 KB to 16 MB. Determining the correct size has a great
deal to do with the rest of the I/O path including the operating
system, file system, volume manager, and storage hardware. Determining
the exact optimal value will have to wait a few months as we progress
with this discussion, but making it large will immediately improve
performance over using the default. Of course, this only works for
files for which you are doing a great deal of I/O.
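Because the best size depends on the whole I/O path, one practical approach is to measure. The following is a minimal timing sketch, not a rigorous benchmark. The function name is hypothetical, and it assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Write `total` bytes to `path` using fwrite(3) calls of `request`
 * bytes through a library buffer of `bufsize` bytes. Returns the
 * elapsed wall-clock time in seconds, or -1.0 on error. */
double time_sequential_write(const char *path, size_t total,
                             size_t request, size_t bufsize)
{
    struct timespec t0, t1;
    char *buf = malloc(bufsize);
    char *data = malloc(request);
    FILE *fp = fopen(path, "wb");
    double secs = -1.0;
    int ok = buf && data && fp && request > 0
             && setvbuf(fp, buf, _IOFBF, bufsize) == 0;

    if (ok) {
        memset(data, 'd', request);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t done = 0; ok && done < total; done += request) {
            size_t n = total - done < request ? total - done : request;
            ok = fwrite(data, 1, n, fp) == n;
        }
    }
    /* Close inside the timed region so the final flush is counted. */
    if (fp != NULL && fclose(fp) != 0)
        ok = 0;
    if (ok) {
        clock_gettime(CLOCK_MONOTONIC, &t1);
        secs = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }
    free(data);
    free(buf);              /* safe only after fclose() */
    return secs;
}
```

Calling it with the same request size and several buffer sizes (for example 4 times the request, then 512 KB + 8) gives a first comparison. Note that this times the hand-off to the operating system, which may still hold the data in its own cache, so the file should be much larger than memory, or you would likely need fsync(2) before closing, for the numbers to reflect the storage hardware.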
If you have an application that does random I/O, making the buffer
larger than the I/O request will hurt performance because you will
read data into the buffer that you will not use. The only time making
the buffer larger than the request helps is when you can fit the
whole file in the library buffer. Older applications, written when
memory was at a premium, sometimes used files for data that could
now simply be held in memory.
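Where the whole file fits in memory, one option is to read it once and serve the random accesses from the in-memory copy. This is a minimal sketch with a hypothetical function name. It assumes a regular file whose size fits in a long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc()ed array. On success, returns
 * the data and stores its length in *len_out; the caller frees it.
 * Returns NULL on failure. */
unsigned char *load_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *data = NULL;
    long size;

    if (fp == NULL)
        return NULL;
    /* Find the file size by seeking to the end, then rewind. */
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0
        && fseek(fp, 0, SEEK_SET) == 0) {
        data = malloc(size > 0 ? (size_t)size : 1);
        if (data != NULL
            && fread(data, 1, (size_t)size, fp) != (size_t)size) {
            free(data);
            data = NULL;
        }
        if (data != NULL)
            *len_out = (size_t)size;
    }
    fclose(fp);
    return data;
}
```

After the call, a random "read" of record i is just an array index into the returned data, with no seek and no system call.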
The real issue is that you will need to match the application
I/O efficiency with the amount of storage hardware that you will need. If
you have an application requirement of 90 MB/sec of reading and
writing and the application makes 1-KB random I/O requests, the amount
of hardware needed to support that requirement is much greater (likely
10x greater) than if the application were making 512-KB sequential
requests.
In the next few columns, I plan to address the whole I/O path
and the issues with performance and tuning for the server hardware,
operating system, file system, HBA, FC devices (such as tape and
disk), and applications (such as databases).
Henry Newman has worked in the IT industry for more than 20
years. Originally at Cray Research and now with a consulting organization,
he has provided expertise in systems architecture and performance
analysis to customers in government, scientific research, and industry
around the world. His focus is high-performance computing, storage
and networking for UNIX systems, and he previously authored a monthly
column about storage for Server/Workstation Expert magazine.
He may be reached at: firstname.lastname@example.org.