RAM RAID: Improving Web Access
Bo Adler
Using a file cache, such as Squid, is a familiar strategy for increasing
the throughput of a Web site. It eliminates the overhead of disk
access by keeping static HTML content in memory, but it fails to address
the issue of CGI scripts that need to write data back to disk. Given
that written data is often small and infrequent (compared to reads),
the OS file-buffering strategy is sufficient to accommodate this
load. However, in the case where written data becomes a serious
issue, the traditional solution is to implement some form of RAID
array to increase the bandwidth of disk access.
The development group at a company I work with constructed a Web
application that stored files within thousands of directories. Their
analysis of the application performance led them to conclude that
writes were happening one tenth of the time, and the nature of the
accesses was such that they were routinely getting cache misses
from the file cache for both reads and writes. The ultimate solution
would be a way to lock the important directory tree into the file
cache so that it would never experience a cache miss.
I explored using the new ramdisk implementation, which is available
in Linux 2.4 kernels, and frequently mirroring the data to disk.
This turned out to be impractical because it took rsync (and similar
tree-walking programs) several hours to run through the 50,000 directories,
leaving the data at risk if a reboot occurred mid-sync. All told, the filesystem
containing these directories could fit into 900 MB, but the sheer
size of the directory listing was more than most programs could
handle. Clearly, any mirroring solution would have to operate one
disk block at a time, rather than traverse the directory tree.
There are several RAID levels that perform mirroring and redundancy
at the disk block level. One of the simplest of these is RAID-1
with two disk drives. RAID-1 is a straight mirroring approach --
the array is treated as a single "storage device", and
any data written to the RAID array is written to both disk drives.
This slows down the writing throughput to the RAID array because
twice as much data is being written and accounted for. The benefit
of RAID-1 is that the two drives can be independently fetching data
for separate requests, thereby increasing the throughput for read
operations over that of a single drive. It occurred to me that if
one of the disk drives in a RAID-1 array could be replaced with
a ramdisk, perhaps the speed-up would be comparable to that of just
a standalone ramdisk, while still retaining a physical copy of the
data in case of reboots.
I set out to test this theory with a series of tests using a Pentium
II 390 MHz running a stock Red Hat 7.1 installation. I chose to
use two benchmarking programs: dt, which can measure the raw sequential
block access to a device, and bonnie++, which measures filesystem
accesses. Typically, the benchmarking process requires data
sizes of about four times the size of physical memory, to minimize
the effect of the various caches. Since one of my "drives"
would be a ramdisk, this wasn't possible because the whole
test had to fit inside the ramdisk. After some tests of various
configurations, I determined that the best way to minimize any kernel
caches and buffers would be to allocate as much memory as possible
to the ramdisk. This would leave very little memory for the kernel
(especially the file cache), and allow me to use fairly small data
sets for testing.
Creating the RAID-1 Array
By default, the Linux kernel is compiled to create 16-MB ramdisk
buffers (named /dev/ram0, /dev/ram1, etc.). To increase
the size of the buffer, it is necessary to pass an additional argument
to the kernel at boot time. Because my machine had 128 MB of RAM,
I created a 100-MB ramdisk buffer by editing the image section of
my /etc/lilo.conf file to read as shown in Listing 1. I rebooted
after running /sbin/lilo, but a check via /usr/bin/free
revealed that the ramdisk hadn't actually been allocated yet,
so I allocated it with:
bash# dd if=/dev/zero of=/dev/ram bs=1024k count=100
100+0 records in
100+0 records out
bash# free -t
total used free shared buffers cached
Mem: 126648 124708 1940 0 102748 5072
-/+ buffers/cache: 16888 109760
Swap: 102776 0 102776
Total: 229424 124708 104716
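As a rough sketch of the kind of image section Listing 1 describes -- the
kernel path and root partition here are assumptions -- the lilo.conf entry
looks something like this, with ramdisk_size given in kilobytes
(102400 KB = 100 MB):

image=/boot/vmlinuz-2.4.2-2
    label=linux
    read-only
    root=/dev/sda2
    append="ramdisk_size=102400"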
Creating a software RAID array is straightforward as well, given the
guidance in the Software RAID-HOWTO. My /etc/raidtab file came
straight from the howto, with minor modifications (Listing 2). The
RAID-1 array can then be created via:
bash# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sda1, 48163kB, raid superblock at 48064kB
disk 1: /dev/ram, 51200kB, raid superblock at 51136kB
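As a rough sketch of the kind of raidtab Listing 2 contains, following the
format in the Software RAID-HOWTO (the device names match the mkraid output
above, and the chunk-size is an arbitrary choice):

raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda1
    raid-disk               0
    device                  /dev/ram
    raid-disk               1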
This configuration is all that's necessary for simple testing.
In a production environment, the /etc/raidtab file must be
modified to mark the /dev/ram device as a failed disk, so that
the kernel does not try to use it at boot time. The ramdisk can be
added back into the array by using dd to allocate the ramdisk
buffer, and then using raidhotadd to initiate a reconstruction
of /dev/ram based on the data from the hard disk.
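A sketch of that production procedure, using the failed-disk directive and
the raidhotadd command from the raidtools package (sizes and device names
follow the earlier examples):

In /etc/raidtab, describe the ramdisk as a failed device:

    device          /dev/ram
    failed-disk     1

Then, after each reboot:

bash# dd if=/dev/zero of=/dev/ram bs=1024k count=100
bash# raidhotadd /dev/md0 /dev/ram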
Caveats
The "reconstruction" from hard drive to ramdisk was
very slow, proceeding at only 100-KB/sec. I did not find the same
to be true for reconstruction from one disk to another -- disk-to-disk
recovery proceeded at approximately the bandwidth supported by the
hard drives. It's been suggested to me that this reconstruction
time could be evaded by not using a persistent superblock. In that
case, you could just dd the partition from the hard drive
to the ramdisk.
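In sketch form, that shortcut amounts to copying the partition image
directly, using the same device names as the earlier examples:

bash# dd if=/dev/sda1 of=/dev/ram bs=1024k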
More troubling than the slow speed, I found the RAID array to
be very touchy during reconstruction. The Software RAID-HOWTO says
that the RAID array is available for use right away, even during
reconstruction. I found that this was true if the array was used
sparingly, but any significant usage caused the system to lock up.
I was able to consistently cause a lockup by using dt to
write to the whole array, if I did so before reconstruction was
complete. Furthermore, at least once during my testing I ended up
in a state where a benchmark of a standalone SCSI drive was producing
values half as large as normal; even removing the RAID modules from
memory did not correct the problem. Thus, I recommend caution when
employing the software RAID modules for valuable data.
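Until those issues are understood, a simple precaution is to hold off on
heavy writes until /proc/mdstat no longer reports a rebuild in progress.
A crude sketch (the status line reads either "resync" or "recovery",
depending on how the rebuild was started):

bash# cat /proc/mdstat
bash# while grep -qE 'resync|recovery' /proc/mdstat; do sleep 10; done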
According to the kernel sources, the memory allocated to a ramdisk
can be freed by sending a BLKFLSBUF ioctl to the
appropriate /dev/ram device. Be warned that programs like
/usr/bin/free won't show the memory as available until
it is actually needed by another program.
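One way to issue that ioctl without writing any code is the blockdev
utility from util-linux, whose --flushbufs option sends BLKFLSBUF. Assuming
the array has been stopped first, freeing the ramdisk looks something like:

bash# raidstop /dev/md0
bash# blockdev --flushbufs /dev/ram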
Benchmarking
Once the array was in place, I ran a series of tests similar to the
following:
bash# ./dt of=/dev/md0 bs=8k limit=30m
[...]
bash# mkdir /mnt/test
bash# mkfs -t ext2 /dev/md0
[...]
bash# mount /dev/md0 /mnt/test
bash# ./bonnie++ -d /mnt/test -s 30 -n 9:9000:10:999 -r 0 -u root
[...]
I chose four configurations, which I thought would offer a suitable
comparison between options: standalone ramdisk, standalone SCSI disk,
RAID-1 array including both a SCSI disk and a ramdisk, and a RAID-1
array using two identical SCSI disks.
When analyzing Figure 1, we see that the performance of various
configurations is generally as expected. The ramdisk takes the lead
in both the writing and reading tests. Since a RAID-1 array has the
overhead of having to write data to both devices, it makes sense
that a single SCSI drive would perform faster than both RAID arrays
(which included the same drive). A RAID array with two disks should
naturally be slower than a RAID array made up of one disk and one
ramdisk, so that result makes sense as well.
The read test shows a different result, in that a RAID array made
up of a SCSI disk and a ramdisk outperformed a standalone SCSI disk.
Again, this makes sense when you consider that a RAID-1 implementation
will sometimes read from the hard drive and sometimes read from
the ramdisk. Each time the RAID implementation chooses the ramdisk
to answer a request, a speed-up over a regular disk is realized.
(The Linux implementation of RAID-1 tries to balance requests between
the devices in the array, so that no single device receives too
many requests in a row.)
The only real surprise in the results of the dt test is
how little improvement is imparted by using the ramdisk as part
of a RAID array. It should be noted, for both the dt and bonnie++
tests, that a non-threaded benchmark (one that does not generate
simultaneous requests) does not show a RAID array at its best.
While a hard drive can only answer requests sequentially, a RAID
array can parallelize requests by distributing them over multiple
devices. In a non-threaded program, requests are always issued sequentially
and thus never exercise this advantage of RAID arrays. (See the
References section at the end of this article for more information
on the dt and bonnie++ test applications.)
Despite the limitations of sequential benchmarks, the bonnie++
tests in Figure 2 indicate a nice showing for the RAID array that
included the ramdisk. Only the standalone ramdisk performed better
on each test. (Please note the use of a logarithmic scale on the
bonnie++ graphs.) Performing the same benchmarks with memory available
for kernel buffers produced the results found in Figure 3. I include
it here because I noticed a change in the relative performance
of the various configurations. The RAID array that included the
ramdisk saw improvement in only a few of the tests, but the standalone
SCSI disk saw several significant improvements, to the point where
it outperformed the two RAID configurations.
The superior performance of a standalone SCSI disk in the presence
of plentiful RAM would seem to poke a hole in the technique of
using a ramdisk as part of a RAID-1 array -- any serious
enterprise situation would be sure to have lots of RAM. One potential
explanation was the reduced RAM available for kernel buffering
(because 50 MB was devoted to the ramdisk), but the graph shows that
even a normal RAID array made up of two hard drives performed relatively
poorly compared to a single hard drive. This leads me to
guess that perhaps the CPU speed is somehow a limiting factor, but
I don't know why this should become relevant in the face of
kernel buffering.
To clear up this question, I ran some additional tests on a
dual-processor 700-MHz Pentium III installation of Red Hat 7.1, with
512 MB of memory. Under these conditions, many of the tests had
immeasurably high results (shown as values of 100,000 in Figure 4),
but it is worth noting that the RAM RAID configuration performed
as well as or better than a standalone SCSI disk, thus dispelling
my fears.
Conclusion
These tests show the viability of incorporating a ramdisk into
a RAID-1 array. Under benchmarking conditions, there is a measurable
advantage over a standalone disk or a software implementation of
a RAID-1 array of two disks. While not as fast as a pure ramdisk,
such a configuration confers the property of data persistence across
reboots without the troublesome (and sometimes impractical) problem
of running a data-synchronizing process. I also encountered two
issues that merit further investigation: the slow rate of reconstruction
to the ramdisk, and the lockups that occurred when writing to the RAID
array during reconstruction.
References
Software RAID-HOWTO: http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html
Data Test (dt) program: http://www.bit-net.com/~rmiller/dt.html
bonnie++ program: http://www.coker.com.au/bonnie++/
Bo Adler is a freelance consultant specializing in network
programming and security. He can be contacted at: thumper@alumni.caltech.edu.