Free Snapshots?
Peter Baer Galvin
Over the past few months, I've been covering new and useful
Solaris 8 features in the Solaris Companion. This month continues
the trend by looking at the new fssnap command, which provides
snapshot copies of the default UFS file system, much like commercial
file systems provide. But is it a winner like UFS logging and IP
multipathing? This month, I test fssnap, point out some useful
new reference services, and share feedback on a previous column.
Much Ado About Nothing
What a contrast there is between Sun and Microsoft. I recently
attended the Windows XP launch to see what the fuss was about. The
three-hour seminar by Microsoft employees conveyed these important
XP points:
- All previous operating systems by Microsoft had serious flaws,
including inefficient user interfaces and a lack of reliability.
- Look how cool our new screen savers are.
- Look at these games we include!
I thought about what the industry would do if Sun had a press
conference about a new release of Solaris and said similar things
about their operating system. In fact, Sun tends to do the opposite
of Microsoft publicity -- they have great engineers who busily
add lots of interesting features to Solaris, and then they leave
it up to the end users to find them, figure out how they work, and
figure out how to use them!
Snapshots, in Theory
On that note, my column this month explores another hidden Solaris
feature -- UFS snapshots. A snapshot can be thought of as a
cousin of RAID 1. Rather than performing a block-by-block copy of
a disk, and then performing all writes to both copies, a snapshot
takes a shortcut. The snapshot starts from an original disk (in
this case, actually a UFS partition) and instead of copying all
of the original blocks, it creates a copy of the metadata structures.
In essence, it has pointers to all the data blocks. Thus, a snap
is very fast to create.
The snapshot is placed within a file system or (theoretically)
on a raw device. The snapshot target is called the "backing
store". Changes to the snapped file system are then handled
specially. For every block (metadata or normal data) that is to
be written to the snapped file system, a copy of the original contents
is created and placed on the snapshot and then the write is allowed
to occur to the original file system. In this manner, the original
source file system is kept up to date and its snapshot copy has
the contents that the file system had when the snapshot occurred.
Why is this useful? In theory, there are many uses. Certainly
other products that include this snapshot feature (Network Appliance,
the Veritas File System) offer some great functionality.
- By having the snapshot copy mounted, a user could "cd
back in time" by accessing the snapshot copy rather than
the original file system. For example, if the system created a
snapshot of a file system every midnight, a user could look at
the system as it was yesterday, the day before he deleted
that important file. He could then copy the file from "yesterday"
to the current file system to restore it (see the sketch following this list).
- By allowing multiple snapshots of the same file system, it
is possible to have views of the file system as it was every day
in the past week, or every four hours in the past day, and so
on.
- By quiescing a database or other important application, taking
a snapshot, and continuing the application, a reliable, consistent
backup could be made of data that is constantly changing. The
snapshot copy is a view of the world at a time when the data was
on disk and not changing. The application can continue running,
users can continue their work, and the backup can be done without
interfering with their activities.
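To make the first of these concrete, here is a minimal sketch of recovering a file from a mounted snapshot. It assumes a snapshot of / already exists as /dev/fssnap/0 (created with the commands shown later in this column), and /important.conf is a hypothetical file to recover:
# mount -o ro /dev/fssnap/0 /mnt
# cp -p /mnt/important.conf /important.conf
# umount /mnt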
Because snapshots are fast and low overhead, they can be used
extensively without great concern for system performance or disk
use (although those aspects must also be considered).
Implementation
How does Sun's current implementation compare to the others
that are available? I tested UFS snapshots on a Solaris 8 7/01 release
Ultra 10, upgraded with the latest kernel jumbo patch and file system
patches. There is only one command for UFS snapshots, which certainly
keeps testing simple. fssnap performs UFS snapshots, provides
information about them, and manages and deletes them.
The basic command to create a snapshot is:
# fssnap -o backing-store=/snap /
/dev/fssnap/0
Here, backing-store is the file system on which to put the
snapshot, and the final argument, /, is the file system to snap.
The command returns a device name, which is an access point to the
snapshot file system. Of course, you can create multiple snapshots,
one per file system on the system:
# fssnap -o backing-store=/snap /opt
/dev/fssnap/1
The snapshot operation on a quiet file system took a few seconds.
The busier the file system, the longer the operation.
A snapshot can reside on any file system type, even NFS, which
allows you to snap to a remote server. Of course, the snap is only
useful when accessed from the original snapped server, where the
rest of the data blocks reside.
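As a sketch of the NFS case, the backing store could be a directory mounted from a remote server; the server name nfsserver and the paths here are hypothetical:
# mkdir -p /remote_snap
# mount nfsserver:/export/snapdir /remote_snap
# fssnap -o backing-store=/remote_snap /export/home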
Unfortunately, my testing revealed that an unmounted device cannot
currently be used as the backing store, contrary to the documentation.
There is a bug on sunsolve.sun.com against this problem,
so hopefully it will get solved with a patch or a future Solaris
release. There are several other errors in the documentation on
http://docs.sun.com, including incorrect arguments to several
commands. The examples in this column use the correct commands and
arguments.
Now we can check the status of a snapshot:
# fssnap -i /
Snapshot number : 0
Block Device : /dev/fssnap/0
Raw Device : /dev/rfssnap/0
Mount point : /
Device state : idle
Backing store path : /snap/snapshot0
Backing store size : 2624 KB
Maximum backing store size : Unlimited
Snapshot create time : Wed Oct 31 10:20:18 2001
Copy-on-write granularity : 32 KB
Note that there are several options on snapshot creation, including
limiting the maximum amount of disk space that the snap can take on
its backing store.
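For example, the maxsize option caps how much space the backing-store file may consume (the 500-MB figure here is purely illustrative):
# fssnap -o maxsize=500m,backing-store=/snap /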
From the system point of view, the snapshot looks a bit strange.
The disk use, at least at the initial snap, is minimal as would
be expected:
# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    4030518 1411914 2578299    36%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
swap                  653232      16  653216     1%    /var/run
swap                  653904     688  653216     1%    /tmp
/dev/dsk/c0t1d0s7    5372014  262299 5055995     5%    /opt
/dev/dsk/c0t0d0s7    4211158 2463312 1705735    60%    /export/home
/dev/dsk/c0t1d0s0    1349190    3313 1291910     1%    /snap
However, an ls shows seemingly very large files:
# ls -l /snap
total 6624
drwx------ 2 root root 8192 Oct 31 10:19 lost+found
-rw------- 1 root other 4178771968 Oct 31 10:30 snapshot0
-rw------- 1 root other 5500942336 Oct 31 10:24 snapshot1
These files are "holey". Logically, they are the same size
as the snapped file system. As changes are made to the original, the
actual size of the snapshot grows as it holds the original versions
of each block. However, almost all of the blocks are empty at the
start, and so are left as "holes" in the file. The disk
use is thus only the metadata and blocks that have changed.
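A quick way to see this is to compare ls, which reports the logical file size, with du, which reports only the blocks actually allocated (assuming the backing store shown above):
# ls -l /snap/snapshot0
# du -k /snap/snapshot0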
The performance impact of a snapshot is that any write to a snapped
file system first has the original block copied to the snapshot, so
writes cost roughly twice what they do on a non-snapped file system.
This is similar to the overhead of RAID-1. Typically, RAID-1 writes
are done synchronously to both mirror devices. That is, the writes
must make it to both disks before the write operation is considered
to be complete. This extra overhead makes writes more expensive. It
is not clear whether fssnap copy-on-write operations are done
synchronously or asynchronously, although the former is likely.
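A rough way to observe the copy-on-write cost is to time the same write workload with and without an active snapshot. This is only a sketch; it assumes no snapshot of / currently exists, and the file size and paths are arbitrary:
# timex mkfile 100m /var/tmp/big ; rm /var/tmp/big
# fssnap -o backing-store=/snap /
# timex mkfile 100m /var/tmp/big ; rm /var/tmp/big
# fssnap -d /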
What can be done once a snapshot is created? Certainly a backup
can be made of the snapshot, solving the previously ever-present
"how to back up a live file system consistently" problem.
In fact, fssnap has built-in options to make it trivial to
use in conjunction with ufsdump:
# ufsdump 0ufN /dev/rmt/0 `fssnap -F ufs -o raw,bs=/snap,unlink /dev/rdsk/c0t0d0s0`
This command will snapshot the root partition, ufsdump it to
tape, and then unlink the snapshot so the snapshot file is removed
when the command finishes (or at least the file should be removed).
In testing, the unlink option does indeed unlink the snapshot
file, but the fssnap -d command is required to terminate the
use of the snapshot and actually free up the disk space. Thus, this
would be the full command:
# ufsdump 0ufN /dev/rmt/0 `fssnap -F ufs -o raw,bs=/snap,unlink /dev/rdsk/c0t0d0s0`
# fssnap -d /
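The tape written this way is an ordinary ufsdump archive, so it can be listed or restored with ufsrestore as usual, for example:
# ufsrestore tf /dev/rmt/0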
fssnap gets interesting when the snapshot itself is mounted
for access, as in:
# mount -o ro /dev/fssnap/0 /mnt
Now we can create a file in / and see that it does not appear
in /mnt:
# touch /foo
# ls -l /foo
-rw-r--r-- 1 root other 0 Nov 5 12:25 /foo
# ls -l /mnt/foo
/mnt/foo: No such file or directory
Unfortunately, there does not appear to be any method to promote the
snapshot to replace the current active file system. For example, if
a systems administrator was about to attempt something complicated,
such as a system upgrade, she could perform a snapshot first. If she
did not like the result, she could restore the system to the snapshot
version. (Of course, the "live upgrade" feature that is
just now rolling out as part of Solaris provides similar functionality.)
The backing store can be deleted manually once you are finished with the snapshot.
fssnap -d "deletes" the snap, but that is probably
the wrong terminology. Rather, it stops the use of the snapshot,
more like "detaching" it from the source file system.
To actually remove the snapshot, the snapshot file must also be
deleted via rm.
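Putting those two steps together, the full cleanup for the root snapshot created earlier would be:
# fssnap -d /
# rm /snap/snapshot0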
Alternatively, the unlink option can be specified when the
snap is created. This prevents a file system directory entry from
being made for the backing-store file. In essence, it is then like an open,
deleted file. Once the file is closed, the inode and its data are
automatically removed. Unlinked files are not visible in the file system
via ls and similar commands, making them harder to manage than normal
"linked" files.
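A short sketch of the unlink workflow: create the snapshot with the unlink option, confirm it with fssnap -i (ls will not show the backing-store file), and release the space with fssnap -d when done:
# fssnap -o unlink,backing-store=/snap /
# fssnap -i /
# fssnap -d /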
Apparently only one active snapshot of a file system can exist.
This limits the utility of UFS snapshots as a kind of safety
net for users or systems administrators. For instance, a snapshot
could be made once a night, but only one day's worth of old
data would then be available.
Another limitation is noted in the Sun documentation. If the backing-store
file runs out of space, the snapshot might delete itself, which
could cause backups and access to the snapshot to fail. Errors are
logged to /var/adm/messages, which should be checked for
possible snapshot errors.
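A simple way to check is to search the system log for fssnap-related entries, for example:
# grep fssnap /var/adm/messages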
Summary
On the whole, fssnap is a welcome addition to the UFS functionality.
Sun is obviously paying attention to its file systems and is adding
features to make UFS more competitive with commercial offerings.
There are a couple of limitations in the current implementation, however,
that restrict its usefulness mainly to creating consistent ufsdump backups.
I hope the functionality will increase over time.
Useful References
Book publishers are finally starting to take advantage of Internet
functionality to allow their books to be more widely used. Two examples
are Books 24X7 (http://www.books24X7.com) and O'Reilly's Safari
(http://safari.oreilly.com/). Books 24X7 provides unlimited access
to hundreds of books on line, with a sophisticated search engine,
live links from within books to Internet resources, and fully scanned
diagrams. Of course, it has a price to match those high-end features.
The Safari project, which includes O'Reilly, Addison Wesley,
New Riders, Prentice Hall, and other presses, provides access to
many books, but in a more limited fashion. It has a point system
in which you pay for a certain number of points per month, and that
gives you access to that many points worth of books in the month.
Every month, you can change which books you access with those points.
Both services are worth looking into if you enjoy ready access
to reference and "how-to" materials.
Letters
Thanks for the informative note from Boyd Fletcher:
Good article (Reliable Network with Solaris, November 2001:
http://www.samag.com/documents/s=1441/sam0111i/0111i.htm).
You are correct -- AP and IP Multipathing are mutually exclusive.
AP has been replaced as of Solaris 8 07/01 with MPXIO (IO multipathing)
for hard disks and IP multipathing for network cards. Both are easier
to configure, faster, and more reliable than AP. On the Serengeti
machines, AP is not available, so you have to use MPXIO and IPMP.
You might want to mention in a follow-up article that IPMP can wreak
havoc on your VLANs and may not even work if you are running with
port lockdowns based on MAC addresses.
Also, SAN StorEdge 3.0 at http://www.sun.com/storage/san/
is required for MPXIO. AP will still work with existing hardware,
just not on the SunFire line.
Boyd
Peter Baer Galvin is the Chief Technologist for Corporate Technologies,
a premier systems integrator and VAR. Before that, Peter was the
systems manager for Brown University's Computer Science Department.
He has written articles for Byte and other magazines, and previously
wrote Pete's Wicked World, the security column, and Pete's
Super Systems, the systems management column for Unix Insider. Peter
is coauthor of the Operating System Concepts and Applied
Operating System Concepts textbooks. As a consultant and trainer,
Peter has taught tutorials and given talks on security and systems
administration worldwide.