RAM RAID: Improving Web Access

Bo Adler

Using a file cache, such as Squid, is a familiar strategy for increasing the throughput of a Web site. It eliminates the overhead of disk access by keeping static HTML content in memory, but it fails to address CGI scripts that need to write data back to disk. Because written data is usually small and infrequent (compared to reads), the OS file-buffering strategy is sufficient to accommodate that load. When write traffic does become a serious issue, however, the traditional solution is to implement some form of RAID array to increase the bandwidth of disk access.

The development group at a company I work with constructed a Web application that stored files within thousands of directories. Their analysis of the application's performance led them to conclude that writes happened about one tenth of the time, and the nature of the accesses was such that they were routinely getting cache misses from the file cache for both reads and writes. The ideal solution would be a way to lock the important directory tree into the file cache so that it would never experience a cache miss.

I explored using the new ramdisk implementation, which is available in Linux 2.4 kernels, and frequently mirroring the data to disk. This turned out to be impractical because it took rsync (and similar tree-walking programs) several hours to run through the 50,000 directories, jeopardizing the data if a reboot were to occur. All told, the filesystem containing these directories could fit into 900 MB, but the sheer size of the directory listing was more than most programs could handle. Clearly, any mirroring solution would have to operate a disk block at a time, rather than try to traverse the directory tree.
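
To give a concrete picture, the mirroring approach amounted to periodically running something along these lines (the paths are hypothetical); with roughly 50,000 directories, each pass took hours:

bash# rsync -a --delete /mnt/ramdisk/webdata/ /mirror/webdata/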

There are several RAID levels that perform mirroring and redundancy at the disk block level. One of the simplest of these is RAID-1 with two disk drives. RAID-1 is a straight mirroring approach -- the array is treated as a single "storage device", and any data written to the RAID array is written to both disk drives. This slows down the writing throughput to the RAID array because twice as much data is being written and accounted for. The benefit of RAID-1 is that the two drives can be independently fetching data for separate requests, thereby increasing the throughput for read operations over that of a single drive. It occurred to me that if one of the disk drives in a RAID-1 array could be replaced with a ramdisk, perhaps the speed-up would be comparable to that of just a standalone ramdisk, while still retaining a physical copy of the data in case of reboots.

I set out to test this theory with a series of tests using a 390-MHz Pentium II running a stock Red Hat 7.1 installation. I chose two benchmarking programs: dt, which measures raw sequential block access to a device, and bonnie++, which measures filesystem accesses. Typically, benchmarking requires data sizes to be about four times the size of physical memory in order to minimize the effect of the various caches. Since one of my "drives" would be a ramdisk, this wasn't possible because the whole test had to fit inside the ramdisk. After trying various configurations, I determined that the best way to minimize the influence of kernel caches and buffers was to allocate as much memory as possible to the ramdisk. This left very little memory for the kernel (especially the file cache) and allowed me to use fairly small data sets for testing.

Creating the RAID-1 Array

By default, the Linux kernel is compiled to create 16-MB ramdisk buffers (named /dev/ram0, /dev/ram1, etc.). To increase the size of the buffer, it is necessary to pass an additional argument to the kernel at boot time. Because my machine had 128 MB of RAM, I created a 100-MB ramdisk buffer by editing the image section of my /etc/lilo.conf file to read as shown in Listing 1. I rebooted after running /sbin/lilo, but a check via /usr/bin/free revealed that the ramdisk hadn't actually been allocated yet, so I allocated it with:

bash# dd if=/dev/zero of=/dev/ram bs=1024k count=100
100+0 records in
100+0 records out
bash# free -t
             total     used     free      shared  buffers   cached
Mem:        126648   124708     1940        0     102748     5072
-/+ buffers/cache:    16888     109760
Swap:       102776        0     102776
Total:      229424   124708     104716
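For reference, the lilo.conf change described above as Listing 1 comes down to passing a ramdisk_size parameter (in 1-KB blocks) to the kernel. A minimal sketch of the image section -- the kernel image, label, and root device are placeholders for your own values:

# /etc/lilo.conf image section (sketch of Listing 1)
image=/boot/vmlinuz-2.4.2-2
    label=linux
    read-only
    root=/dev/sda2
    append="ramdisk_size=102400"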
Creating a software RAID array is straightforward as well, given the guidance in the Software RAID-HOWTO. My /etc/raidtab file came straight from the HOWTO, with minor modifications (Listing 2; a sketch appears after the mkraid output below). The RAID-1 array can then be created via:

bash# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sda1, 48163kB, raid superblock at 48064kB
disk 1: /dev/ram, 51200kB, raid superblock at 51136kB
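For reference, a raidtab along the lines of Listing 2, using the same partition and ramdisk device that appear in the mkraid output above (substitute your own hard-disk partition):

# /etc/raidtab (sketch of Listing 2)
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    chunk-size            4
    persistent-superblock 1
    device                /dev/sda1
    raid-disk             0
    device                /dev/ram
    raid-disk             1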
This configuration is all that's necessary for simple testing. In a production environment, the /etc/raidtab file must be modified to mark the /dev/ram device as a failed disk, so that the kernel does not try to use it at boot time. The ramdisk can be added back into the array by using dd to allocate the ramdisk buffer, and then using raidhotadd to initiate a reconstruction of /dev/ram based on the data from the hard disk.
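
A sketch of that recovery sequence, using the device names from above: in /etc/raidtab, change the ramdisk's raid-disk entry to failed-disk, and then after each reboot run:

bash# dd if=/dev/zero of=/dev/ram bs=1024k count=100
bash# raidhotadd /dev/md0 /dev/ram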

Caveats

The "reconstruction" from hard drive to ramdisk was very slow, proceeding at only 100-KB/sec. I did not find the same to be true for reconstruction from one disk to another -- disk-to-disk recovery proceeded at approximately the bandwidth supported by the hard drives. It's been suggested to me that this reconstruction time could be evaded by not using a persistent superblock. In that case, you could just dd the partition from the hard drive to the ramdisk.

More troubling than the lack of speed, I found the RAID array to be very touchy during reconstruction. The Software RAID-HOWTO says that the RAID array is available for use right away, even during reconstruction. I found this to be true if the array was used sparingly, but any significant usage caused lockups in the system. I could consistently cause a lockup by using dt to write to the whole array before reconstruction was complete. Furthermore, at least once during my testing I ended up in a state where a benchmark of a standalone SCSI drive produced values half as large as normal; even removing the RAID modules from memory did not correct the problem. Thus, I recommend caution when entrusting valuable data to the software RAID modules.

According to the kernel sources, the memory allocated to a ramdisk can be freed by sending a BLKFLSBUF ioctl to the appropriate /dev/ram device. Be warned that programs like /usr/bin/free won't show the memory as available until it is actually needed by another program.
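
One way to send that ioctl from the shell is the blockdev utility from util-linux; a minimal sketch, assuming the ramdisk is no longer part of an active array:

bash# blockdev --flushbufs /dev/ram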

Benchmarking

When I reached the point where testing could be performed, I ran a series of tests similar to the following:

bash# ./dt of=/dev/md0 bs=8k limit=30m
[...]
bash# mkdir /mnt/test
bash# mkfs -t ext2 /dev/md0
[...]
bash# mount /dev/md0 /mnt/test
bash# ./bonnie++ -d /mnt/test -s 30 -n 9:9000:10:999 -r 0 -u root
[...]
I chose four configurations that I thought would offer a suitable comparison: a standalone ramdisk, a standalone SCSI disk, a RAID-1 array combining a SCSI disk and a ramdisk, and a RAID-1 array using two identical SCSI disks.

Analyzing Figure 1, we see that the performance of the various configurations is generally as expected. The ramdisk takes the lead in both the writing and reading tests. Since a RAID-1 array has the overhead of writing data to both devices, it makes sense that a single SCSI drive performs faster than either RAID array (both of which included that same drive). Likewise, a RAID array made up of two disks should naturally be slower than one made up of a disk and a ramdisk, so that result makes sense as well.

The read test shows a different result, in that a RAID array made up of a SCSI disk and a ramdisk outperformed a standalone SCSI disk. Again, this makes sense when you consider that a RAID-1 implementation will sometimes read from the hard drive and sometimes read from the ramdisk. Each time the RAID implementation chooses the ramdisk to answer a request, a speed-up over a regular disk is realized. (The Linux implementation of RAID-1 tries to balance requests between the devices in the array, so that no single device receives too many requests in a row.)

The only real surprise in the dt results is how little improvement comes from using the ramdisk as part of a RAID array. It should be noted, for both the dt and bonnie++ tests, that a non-threaded benchmark (one that does not generate simultaneous requests) does not show a RAID array at its best. While a hard drive can only answer requests sequentially, a RAID array can parallelize requests by distributing them over multiple devices. A non-threaded program always issues its requests sequentially and thus never exercises this advantage. (See the References section at the end of this article for more information on the dt and bonnie++ test applications.)

Despite the limitations of sequential benchmarks, the bonnie++ tests in Figure 2 show a nice result for the RAID array that included the ramdisk. Only the standalone ramdisk performed better on each test. (Please note the use of a logarithmic scale on the bonnie++ graphs.) Performing the same benchmarks with memory left available for kernel buffers produced the results in Figure 3. I include this figure because I noticed a change in the relative performance of the various configurations. The RAID array that included the ramdisk saw improvement in only a few of the tests, while the standalone SCSI disk saw several significant improvements, to the point where it outperformed both RAID configurations.

The superior performance of a standalone SCSI disk in the presence of plentiful RAM would seem to shoot a hole in the technique of using a ramdisk as part of a RAID-1 array -- any serious enterprise deployment is sure to have lots of RAM. One potential explanation was the reduced RAM available for kernel buffering (because 50 MB was devoted to the ramdisk), but the graph shows that even a normal RAID array made up of two hard drives performed relatively poorly compared to a single hard drive. This leads me to guess that CPU speed is somehow a limiting factor, though I don't know why that should become relevant once kernel buffering is in play.

To clear up this question, I ran some additional tests on a dual-processor 700-MHz Pentium III running Red Hat 7.1 with 512 MB of memory. Under these conditions, many of the tests had immeasurably high results (shown as values of 100,000 in Figure 4), but it is worth noting that the RAM RAID configuration performed as well as or better than a standalone SCSI disk, dispelling my fears.

Conclusion

These tests show the viability of incorporating a ramdisk into a RAID-1 array. Under benchmarking conditions, there is a measurable advantage over a standalone disk or a software RAID-1 array of two disks. While not as fast as a pure ramdisk, such a configuration provides data persistence across reboots without the troublesome (and sometimes impractical) burden of running a separate data-synchronizing process. I also encountered two issues that merit further investigation: the slow rate of reconstruction to the ramdisk, and the lockups that occurred when writing to the array during reconstruction.

References

Software RAID-HOWTO: http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html

Data Test (dt) program: http://www.bit-net.com/~rmiller/dt.html

bonnie++ program: http://www.coker.com.au/bonnie++/

Bo Adler is a freelance consultant specializing in network programming and security. He can be contacted at: thumper@alumni.caltech.edu.