Cover V11, I04

CD Backups with Easy File Access

Bryan Smith

CD drives are common components, which makes CD media ideal both for backups and restores that users perform themselves and for point backups made by systems administrators. But what is the optimal CD backup approach? In this article, I will introduce the back2cd script, which maximizes CD capacity while preserving the CD's inherent random-access advantage.

CD Recorder Optional

Creating a CD of files does not require a CD recorder on the same system because CD backup is a two-step process. The first and most involved step is mastering (i.e., creating) the CD image. The CD image is then used in the second step, recording (i.e., burning) to CD media. The back2cd script (Listing 1) handles the first step (mastering) whether or not the system has a CD recorder. (All Sys Admin magazine listings are available at: http://www.sysadminmag.com/code/.)

Mastering files into a single CD image file is analogous to archiving files into a single tar, cpio, or zip file. Archiving stuffs files and their attributes (meta-data) into a single file (archive). As with tar or cpio, there is a defined standard format for CD images. The ISO9660 standard defines the filesystem written to a CD-ROM data track (the data track itself is specified by the "Yellow Book"), and the result is commonly stored as a CD image file with a .iso extension. back2cd is built around the mkisofs (make ISO filesystem) program, included or available for most UNIX systems (and standard in Cygwin).

Recording a CD image is then analogous to un-archiving. The CD recorder software un-archives the files from the single .iso file onto CD media. Thus, the resulting CD is a replica of the files, their hierarchy, and their meta-data (assuming they are preserved during the mastering process). back2cd uses mkisofs to create a .iso file that can be moved to a system with a CD recorder. A .iso file can be recorded by the majority of available CD software (e.g., Adaptec, Nero on Windows, and cdrecord on UNIX).
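
The two steps can be sketched as a pair of command lines. The following Python sketch is only an illustration and is not part of back2cd: the function names, the /tmp/bobs_tree path, and the dev=0,0,0 and speed=4 values are assumptions, and the commands are printed rather than executed so the sketch runs on a system without a recorder.

```python
import shlex


def mastering_command(label, tree, image_dir="."):
    """Step 1: build a mkisofs command that masters `tree` into <label>.iso."""
    image = f"{image_dir}/{label}.iso"
    return [
        "mkisofs",
        "-J",          # Joliet extensions (Windows)
        "-R",          # Rock Ridge extensions (UNIX attributes)
        "-V", label,   # volume label
        "-o", image,   # output image file
        tree,
    ]


def recording_command(image, device="0,0,0", speed=4):
    """Step 2: build a cdrecord command; device and speed are placeholders."""
    return ["cdrecord", "-v", f"dev={device}", f"speed={speed}", image]


if __name__ == "__main__":
    # Print the commands instead of running them.
    print(shlex.join(mastering_command("bobs_cd", "/tmp/bobs_tree")))
    print(shlex.join(recording_command("./bobs_cd.iso")))
```

Because the two commands share nothing but the .iso file, the first can run on a legacy system and the second on whichever machine has the recorder.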

Issues with Compression

Unfortunately, the CD standard does not define a compressed format. Simply compressing the .iso file like a tar archive is not an option because the recording software would no longer recognize it. To increase the storage capacity via compression, files inside the image must be compressed. This is where the capabilities of mkisofs end and CD backup scripts built around it come into play.

The most direct way is to first use tar to archive and compress the files and then master a CD with that single file. This "double archiving" method is not ideal for several reasons. The CD's advantage of quick, random access is lost. The CD recording process should un-archive our backup onto CD, not simply plop one big file onto a CD (such a file would require further processing before we could start browsing it). Also, most archivers (e.g., tar) compress the entire archive instead of each file in the archive (with the exception of the afio replacement for cpio). An error at one point of a compressed archive can render the rest of the archive unrecoverable.

The less direct but better method is to individually compress each file. The hierarchy is still preserved for easy browsing, and files are directly accessible (except for decompression). If a scratch or other physical damage corrupts one or more files, no other files are affected (unlike most "double archiving" approaches). This is how compression is implemented in back2cd.
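
A minimal sketch of the per-file idea follows. It is written in Python with gzip from the standard library purely for illustration; back2cd itself is a C shell script that defaults to LZO, and the function names here are hypothetical. Each regular file becomes its own compressed stream inside a mirrored tree, so a single file can be restored without touching any other.

```python
import gzip
import os
import shutil


def compress_tree(src, dst):
    """Mirror `src` under `dst`, compressing every regular file separately."""
    written = []
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        out_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(out_dir, exist_ok=True)  # keeps the hierarchy, even empty dirs
        for name in files:
            in_path = os.path.join(root, name)
            if os.path.islink(in_path) or not os.path.isfile(in_path):
                continue  # symlinks and special files need separate handling
            out_path = os.path.join(out_dir, name + ".gz")
            with open(in_path, "rb") as fin, gzip.open(out_path, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            written.append(out_path)
    return written


def restore_file(gz_path, out_path):
    """Decompress one file; no other file in the tree is read."""
    with gzip.open(gz_path, "rb") as fin, open(out_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

If one compressed file in the mirrored tree is damaged, restore_file still works on every other file, which is the isolation property described above.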

The back2cd Approach

back2cd is a C shell (tcsh highly recommended) script that reads one or more directories, builds a temporary tree where the files are compressed on a per-file basis, and masters a .iso CD image. back2cd provides the following:

  • A directly accessible, hierarchical copy of the original disk tree on CD, except for the compression of the files.
  • Careful test and replication of special files, such as device nodes and symlinks.
  • "Real-time" LZO compression, which drastically improves performance over gzip and bzip2 (gzip/bzip2 usage optional). See the sidebar for a comparison of LZO, gzip, and bzip2.
  • Errors are isolated due to per-file compression.
  • Uses mkisofs (widely available for UNIX and included in Cygwin), with both Joliet (Windows) and RockRidge (UNIX/Mac) extensions on by default (Apple HFS optional).
  • Option to enable md5 or other checksums for both original and compressed filelist using md5sum (defaults to producing plain filelists).
  • Option to skip compression for selected files, chosen by file magic (using file) or by extension; both checks add delays.
  • C shell (csh) implementation to address the wealth of legacy UNIX systems without a CD recorder. tcsh is required for many features such as handling pathnames with spaces.
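
Two of the options above, checksummed filelists and skipping compression by extension, can be sketched as follows. This is an illustrative Python sketch, not the script's own code: the extension set and the function names are assumptions, and the filelist simply imitates the "checksum, two spaces, path" line format that md5sum prints.

```python
import hashlib
import os

# Hypothetical set of extensions treated as already compressed.
SKIP_EXTENSIONS = {".gz", ".bz2", ".zip", ".jpg", ".mp3"}


def should_compress(path):
    """Cheap extension test; a file-magic test would be slower but more reliable."""
    return os.path.splitext(path)[1].lower() not in SKIP_EXTENSIONS


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_filelist(tree, list_path):
    """Write one '<md5>  <relative path>' line per regular file under `tree`."""
    with open(list_path, "w") as out:
        for root, dirs, files in os.walk(tree):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    rel = os.path.relpath(path, tree)
                    out.write(f"{md5_of_file(path)}  {rel}\n")
```

A filelist in this format can later be checked with md5sum -c from the top of the restored tree.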

Using back2cd

When using back2cd, you'll need to customize the documented "VARIABLES" section. The most important variables are idir and itmp, each of which requires approximately 700 MB of free space. If regular users are to run this script, make sure these directories are world-writable. Check the other VARIABLES for compatibility with your OS.
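
Before the first run, it can help to confirm that the directories chosen for idir and itmp are usable. The following Python sketch is a hypothetical preflight check, not something back2cd provides; the function name is an assumption, and the 700 MB threshold is taken from the text above.

```python
import os
import shutil

REQUIRED_BYTES = 700 * 1024 * 1024  # roughly one CD image, per the text


def check_work_dir(path, required_bytes=REQUIRED_BYTES):
    """Return a list of problems with a work directory (empty list if usable)."""
    if not os.path.isdir(path):
        return [f"{path}: not a directory"]
    problems = []
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path}: not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < required_bytes:
        mb = 1024 * 1024
        problems.append(
            f"{path}: only {free // mb} MB free, need {required_bytes // mb} MB"
        )
    return problems
```

Running the check as the same user who will run back2cd covers the permission case as well as the free-space case.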

Once customized, back2cd is simple to run and requires only two parameters:

back2cd <label> <dir> [<dir>...]
  <label>  CD label/filename (withOUT .iso extension)
  <dir>    one or more relative/absolute directory path(s)

For example, user "bob" can back up his home directory. From Bob's home directory:

back2cd bobs_cd .

Conclusion

back2cd is a cross-platform script for both end-user and point backups to CD, especially on legacy platforms that lack CD recorders. By compressing files individually, it backs up disk trees of 1-2 GB into a single .iso CD image file that can be moved to any system with a CD recorder. The resulting CD allows quick, easy, and painless browsing and restores of individual files as well as complete trees. I have found it useful for letting users back up their own data, taking snapshots, off-lining data, and distributing data externally.

Linux sys admins wanting to complement back2cd with a complete, CD-based disaster recovery solution should investigate Mondo Rescue. Mondo Rescue uses bash scripts for CD-spanned, afio-based archiving (mondo-archive) and automated boot disk/CD creation (mindi). During recovery, it uses an ncurses-based partitioning and restore utility (mondo-restore). Although the afio "double archiving" method does not allow direct browsing like back2cd (a non-issue when doing full restores), errors are usually isolated because afio does per-file compression inside its archives.

Resources

mkisofs/cdrecord ISO9660/.iso Mastering/Recording Programs -- http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/cdrecord.html

gzip, bzip2, and lzop File Compression Programs -- http://www.gzip.org/, http://sources.redhat.com/bzip2/, http://www.oberhumer.com/opensource/lzop/

tcsh (Enhanced version of the Berkeley C shell) -- http://www.tcsh.org

Cygwin (Cygnus GNU/UNIX Environment on Windows) -- http://sources.redhat.com/cygwin/

Bryan J. Smith has a BSCpE from the University of Central Florida. For the past decade, he has worked in a dual engineering/IT role at firms involved with civil, aerospace, and semiconductor design. His consulting firm, SmithConcepts, Inc., offers engineering-focused IT solutions. Outside of periodicals, Bryan was a contributing author on Samba Unleashed. He currently lives near Orlando with his wife Lourdes, and can be contacted at: b.j.smith@ieee.org.