CD Backups with Easy File Access
Bryan Smith
CD drives are common components, which makes CD media ideal both for
backups and restores that users perform themselves and for point backups
by systems administrators. But what is the optimal CD backup approach? In this
article, I will introduce the back2cd script, which maximizes
CD capacity while preserving the CD's inherent random-access advantage.
CD Recorder Optional
Creating a CD of files does not require a CD recorder on the same
system because CD backup is a two-step process. The first and most
involved step is mastering (i.e., creating) the CD image. A CD image
is then used in the second step, recording (i.e., burning) to CD
media. The back2cd script (Listing 1) handles the first step
(mastering) irrespective of a CD recorder on the system. (All Sys
Admin magazine listings are available at: http://www.sysadminmag.com/code/.)
Mastering files into a single CD image file is analogous to archiving
files into a single tar, cpio, or zip file.
Archiving stuffs files and their attributes (meta-data) into a single
file (archive). As with tar or cpio, there is a defined
standard format for CD images. The ISO9660 standard defines the
filesystem stored in a CD-ROM data track (the track format itself comes
from the "Yellow Book" specification), and it is commonly represented
as a CD image file with a .iso extension. back2cd
is built around the mkisofs (make ISO filesystem) program,
included or available for most UNIX systems (and standard in Cygwin).
Recording a CD image is then analogous to un-archiving. The CD
recorder software un-archives the files from the single .iso
file onto CD media. Thus, the resulting CD is a replica of the files,
their hierarchy, and their meta-data (assuming they are preserved
during the mastering process). back2cd uses mkisofs
to create a .iso file that can be moved to a system with
a CD recorder. Most CD recording software can record a .iso file
(e.g., Adaptec's software or Nero on Windows, and cdrecord
on UNIX).
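As a rough illustration of the two-step process (this is not Listing 1, which is a C shell script), the mastering and recording commands could be assembled as in the Python sketch below. The helper names, the paths, and the `0,0,0` device address are hypothetical; the mkisofs options shown (`-J`, `-R`, `-V`, `-o`) and the cdrecord `dev=` argument are standard, but check your local man pages before relying on them.

```python
import subprocess
from pathlib import Path


def mkisofs_command(label, tree, out_dir):
    """Build the mkisofs argument list for step one (mastering)."""
    image = Path(out_dir) / f"{label}.iso"
    return [
        "mkisofs",
        "-J",               # Joliet extensions (Windows long names)
        "-R",               # Rock Ridge extensions (UNIX names, modes, symlinks)
        "-V", label,        # volume label
        "-o", str(image),   # image file to write
        str(tree),          # directory tree to master
    ]


def cdrecord_command(image, device="0,0,0"):
    """Build the cdrecord argument list for step two (recording).

    The device address is a placeholder; it differs on every system.
    """
    return ["cdrecord", "-v", f"dev={device}", str(image)]


def run(command):
    """Run one step and raise an exception if the tool reports failure."""
    subprocess.run(command, check=True)
```

On a system without a recorder, only `run(mkisofs_command("bobs_cd", "/home/bob", "/var/tmp"))` would be needed; the resulting image can then be copied to a system with a recorder, where `run(cdrecord_command("/var/tmp/bobs_cd.iso"))` burns it.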
Issues with Compression
Unfortunately, the CD standard does not define a compressed format.
Simply compressing the .iso file like a tar archive
is not an option because the recording software would no longer
recognize it. To increase the storage capacity via compression,
files inside the image must be compressed. This is where the capabilities
of mkisofs end and CD backup scripts built around it come
into play.
The most direct way is to first use tar to archive and
compress the files and then master a CD with that single file. This
"double archiving" method is not ideal for several reasons.
The CD's advantage of quick, random access is lost. The CD
recording process should un-archive our backup onto CD, not simply
plop one big file onto a CD (such a file would require further processing
before we could start browsing it). Also, most archiving setups (e.g.,
tar piped through gzip) compress the archive as a whole instead of each
file within it (the afio replacement for cpio is an exception). A single
error in such a compressed archive can render everything after it
unrecoverable.
The less direct but better method is to individually compress
each file. The hierarchy is still preserved for easy browsing, and
files are directly accessible (except for decompression). If a scratch
or other physical damage corrupts one or more files, no other files
are affected (unlike most "double archiving" approaches).
This is how compression is implemented in back2cd.
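The per-file idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the back2cd implementation: it uses gzip from the Python standard library as a stand-in for the script's compressors, the function name and the `.gz` suffix are hypothetical, and it expects the destination tree to be new and located outside the source tree.

```python
import gzip
import os
import shutil


def compress_tree(src_root, dst_root, suffix=".gz"):
    """Mirror src_root under dst_root, compressing each regular file by itself."""
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)

        # os.walk does not descend into symlinked directories, so recreate
        # them here as symlinks.
        for name in dirnames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                os.symlink(os.readlink(src), os.path.join(dst_dir, name))

        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                # Keep symlinks as symlinks instead of compressing their targets.
                os.symlink(os.readlink(src), os.path.join(dst_dir, name))
            elif os.path.isfile(src):
                # One compressed stream per file: damage stays local to that file.
                dst = os.path.join(dst_dir, name + suffix)
                with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
            # Device nodes, FIFOs, and sockets are skipped in this sketch.
```

The resulting tree keeps the original directory layout, so it can be handed to the mastering step as-is and browsed on the finished CD like the original.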
The back2cd Approach
back2cd is a C shell (tcsh highly recommended) script
that reads one or more directories, builds a temporary tree where
the files are compressed on a per-file basis, and masters a .iso
CD image. back2cd provides the following:
- A directly accessible, hierarchical copy of the original disk
tree on CD, except for the compression of the files.
- Careful testing and replication of special files, such as device
nodes and symlinks.
- "Real-time" LZO compression, which drastically improves
performance over gzip and bzip2 (gzip/bzip2
usage optional). See the sidebar for a comparison of LZO, gzip,
and bzip2.
- Error isolation through per-file compression.
- Use of mkisofs (widely available for UNIX and included
in Cygwin), with both Joliet (Windows) and RockRidge (UNIX/Mac)
extensions on by default (Apple HFS optional).
- Option to enable md5 or other checksums for both the original and
compressed file lists using md5sum (defaults to producing
plain file lists).
- Option to leave files uncompressed based on file magic (using file)
or extensions; both checks add delays.
- C shell (csh) implementation to address the wealth of
legacy UNIX systems without a CD recorder. tcsh is required
for many features such as handling pathnames with spaces.
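Two of the optional features in this list, checksummed file lists and skipping files that are already compressed, can be illustrated with a short Python sketch. It is not taken from the script: the suffix list and the function names are hypothetical, and only the extension test is shown (not the file magic test). The checksummed lines follow the "digest, two spaces, path" layout that md5sum prints.

```python
import hashlib
import os

# Hypothetical list of suffixes that usually indicate compressed content.
SKIP_SUFFIXES = (".gz", ".bz2", ".zip", ".jpg", ".png", ".mp3")


def should_compress(filename):
    """Return False when the extension suggests the file is already compressed."""
    return not filename.lower().endswith(SKIP_SUFFIXES)


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file without reading it into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def filelist(root, with_md5=False):
    """List regular files under root, as plain paths or md5sum-style lines."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                paths.append(os.path.relpath(full, root))
    paths.sort()
    if not with_md5:
        return paths
    return [f"{md5_of(os.path.join(root, p))}  {p}" for p in paths]
```

Calling `filelist()` once on the original tree and once on the compressed tree gives the two lists the feature describes; a plain list costs only a directory walk, while the checksummed variant has to read every file.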
Using back2cd
When using back2cd, you'll need to customize the documented
"VARIABLES" section. The most important variables are
idir and itmp, each of which requires approximately 700
MB of free space. If regular users are to run this script, make sure
these directories are world-writable. Check the other VARIABLES
for compatibility with your OS.
Once customized, back2cd is simple to run and requires
only two parameters:
    back2cd <label> <dir> [<dir>...]
      <label>  CD label/filename (withOUT .iso extension)
      <dir>    one or more relative/absolute directory path(s)
For example, user "bob" can back up his home directory by running
the following from within it:
back2cd bobs_cd .
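Restoring from such a CD is the reverse of per-file compression: copy the file you want and decompress it. The Python sketch below shows that step for a single file. It assumes, purely for illustration, that the file was compressed with gzip and carries a `.gz` suffix; the function name and the paths in the usage note are hypothetical, and a CD made with a different compressor would need the matching decompressor.

```python
import gzip
import os
import shutil


def restore_file(cd_path, dest_dir, suffix=".gz"):
    """Decompress one individually compressed backup file into dest_dir."""
    name = os.path.basename(cd_path)
    if name.endswith(suffix):
        name = name[: -len(suffix)]   # report.txt.gz -> report.txt
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, name)
    with gzip.open(cd_path, "rb") as fin, open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target
```

With the CD mounted at a hypothetical `/mnt/cdrom`, `restore_file("/mnt/cdrom/docs/report.txt.gz", "/home/bob/restored")` would recover one file without touching anything else on the disc.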
Conclusion
back2cd is a cross-platform script for both end-user and
point backups to CD, especially on legacy platforms that lack CD recorders.
By compressing files individually, it can fit disk trees of 1-2
GB into a single .iso CD image file that can be moved to
any system with a CD recorder. The resulting CD allows quick, easy,
and painless browsing and restores of random files as well as complete
trees. I have found it useful for letting users self-backup, taking
snapshots, off-lining data, and distributing data externally.
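The 1-2 GB figure implies a certain average compression ratio, which is easy to check. The Python sketch below assumes a CD capacity of 700 MB (matching the free space suggested for idir and itmp), counts 1 GB as 1024 MB, and ignores filesystem overhead in the image; the function name is hypothetical.

```python
def required_ratio(tree_mb, cd_mb=700):
    """Average compression ratio needed to fit tree_mb of files on one CD."""
    return tree_mb / cd_mb


for tree_gb in (1, 2):
    ratio = required_ratio(tree_gb * 1024)
    print(f"{tree_gb} GB tree needs about {ratio:.2f}:1 compression")
```

Under these assumptions the required ratios work out to roughly 1.5:1 and 2.9:1, so how much actually fits depends on how compressible the files are.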
Linux sys admins wanting to complement back2cd with a complete,
CD-based disaster recovery solution should investigate Mondo Rescue.
Mondo Rescue uses bash scripts for CD-spanned, afio-based
archiving (mondo-archive), and automated boot disk/CD creation (mindi).
During recovery, it uses an ncurses-based partitioning and
restore utility (mondo-restore). Although the afio "double
archiving" method does not allow direct browsing like back2cd
(a non-issue when doing full restores), errors are usually isolated
because afio does per-file compression inside its archives.
Resources
mkisofs/cdrecord ISO9660/.iso Mastering/Recording Programs --
http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/cdrecord.html
gzip, bzip2, and lzop File Compression Programs --
http://www.gzip.org/
http://sources.redhat.com/bzip2/
http://www.oberhumer.com/opensource/lzop/
tcsh (Enhanced version of the Berkeley C shell) --
http://www.tcsh.org
Cygwin (Cygnus GNU/UNIX Environment on Windows) --
http://sources.redhat.com/cygwin/
Bryan J. Smith has a BSCpE from the University of Central Florida.
For the past decade, he has worked in a dual engineering/IT role
at firms involved with civil, aerospace, and semiconductor design.
His consulting firm, SmithConcepts, Inc., offers engineering-focused
IT solutions. Outside of periodicals, Bryan was a contributing author
on Samba Unleashed. He currently lives near Orlando with
his wife Lourdes, and can be contacted at: b.j.smith@ieee.org.