The Logical Volume Manager (LVM) provides fixed disk
storage management
for the AIX operating system by mapping physical disk
space to logical
storage units. Its hierarchical structure is unique
among UNIX environments.
Its most beneficial features include:
The ability to place data on a specific region of a
disk drive for improved performance. The ability to
expand disk storage
online for easier maintenance.
Like most AIX administrative functions, the Logical
Volume Manager
can be administered through IBM's System Management
Interface Tool
(SMIT), a menu-driven interface that allows system administrators
to perform many tasks without having to remember long
command strings.
You can use SMIT to configure all of the parameters
about to be discussed,
as well as to create mirrors and expanded filesystems
online. However,
SMIT does not explain the theory or the complexity of
the Logical
Volume Manager. That is this article addresses.
Physical and Logical Volume Organization
The Logical Volume Manager is a device driver for disk
I/O which incorporates
a special library of subroutines. To understand it,
you must first
understand the conceptual differences between physical
and logical
devices. A physical disk device is real or tangible
disk storage.
A logical disk device defines how the user views the
physical storage
and simplifies the sharing and management of the physical
storage.
An analogy would be a single DOS hard drive partitioned
into two logical
disk drives. The user sees two logical disk drives,
c: and
d:, which are both part of the same physical unit. Data
is
presented to the user on two logical disk drives which
are both actually
logical partitions of one physical unit. Logical volumes
are raw data
partitions created on physical disk drives which can
be used to contain
filesystems or other structures.
Physical Devices
The Logical Volume Manager refers to disks drives as
physical volumes.
Each physical volume must be assigned by the system
administrator
to a volume group before it can be used. Volume groups
may contain
from 1 to 32 physical volumes (disk drives). Physical
volumes assigned
to a volume group can be of different types and sizes.
Volume groups
must be assigned unique names. During installation,
AIX automatically
creates the volume group rootvg, which contains the
operating
system. You can create up to 255 volume groups. (I will
discuss volume
groups in more detail later.)
The Logical Volume Manager partitions physical volumes
(disks) within
each volume group into 4-Mb chunks called physical partitions.
A 1.2-Gb
disk is partitioned into 287 physical partitions; therefore,
4 x
287 = 1148 megabytes. The remaining disk space is used
by overhead.
The 4-Mb size is a tunable parameter within each volume
group. For
the sake of simplicity, I will assume the 4-Mb default
when referring
to physical partitions throughout this article.
The Logical Volume Manager segments each physical volume
into five
equal regions or bands. The geometry of a disk drive
consists of several
stacked platters. The band of cylinders around the outer
edge of the
platters make up the outer region. The band of cylinders
closest to
the spindle make up the inner region. Between the inner
and outer
regions are the three remaining regions: the inner middle
region,
the center region, and the outer middle region. The
4-Mb physical
partitions are distributed evenly among the five regions.
The distribution
of a 1.2-Gb disk drive is presented in Figure 1.
The center region of every disk will yield the highest
performance,
as data placed in this region will benefit from shorter
seek times.
The further the data is placed from the center region,
the longer
the seek times will become. The system administrator
can specify the
region in which data is to be placed.
Logical Devices
Physical partitions are grouped into logical partitions.
A logical
partition can have from one to three physical partitions.
A logical
partition consisting of three physical partitions is
still logically
viewed as 4 megabytes of data even though 12 megabytes
of physical
storage are being used. This is because, the 4-Mb data
chunk has a
total of three copies, which is how the Logical Volume
Manager accomplishes
disk mirroring. Data can have a single or double mirror
depending
on the number of physical partitions or copies chosen
by the system
administrator.
Logical partitions are grouped into logical volumes.
Filesystems,
page space, and raw data partitions must all be contained
within logical
volumes. Recall that each logical partition can have
up to three physical
partitions, but the mirrored physical partitions are
all copies of
the same data. Therefore, to create a filesystem of
16 megabytes,
you would need to create a logical volume with four
logical partitions.
If you wanted a single mirror of the 16-Mb logical volume,
you would
still need only four logical partitions. Each logical
partition would
then be comprised of two physical partitions, one for
each copy of
the data. The logical view of the data remains as four
logical partitions
totaling 16 megabytes, while the physical size required
to contain
both copies would actually be 32 megabytes, or eight
physical partitions.
It is not possible to mirror a single logical partition.
All logical
partitions within a logical volume must be mirrored.
Data Management
Since a mirrored logical partition consists of a primary
and secondary
physical partition both having the same data, a pecking
order must
be established when data is to be transferred to and
from mirrored
logical partitions. That order is defined as the scheduling
policy.
The Logical Volume Manager allows the system administrator
to choose
either a sequential or a parallel scheduling policy
for writing and
reading data to and from a mirrored logical partition.
Data transferred
sequentially uses the primary physical partition first
and then the
secondary. Data transferred in parallel uses both the
primary and
secondary copies at the same time. Sequential writes
wait for the
primary to complete before writing to the secondary.
Parallel writes
execute simultaneously and complete when the last unit
to be written
is done; parallel reads access the drive which can complete
the transfer
first. Sequential scheduling offers higher data reliability,
while
parallel scheduling offers higher performance for mirrored
logical
partitions.
Consider a volume group created with two physical volumes,
each 1.2
gigabytes in size. Assume that I wish to create a logical
volume of
456 megabytes to be used as a raw partition for a database,
and that
I want that logical volume to span both drives in my
volume group.
This procedure, called disk striping, will yield a higher
performance
than placing the entire logical volume on a single physical
volume
(disk).
The Logical Volume Manager allows you to choose which
physical volumes
in your volume group to stripe the data across. You
must also set
the option known as inter-physical allocation to maximum,
to
force the Logical Volume Manager to attempt to create
the logical
volume across as many physical volumes as you have selected.
In this
case, if both physical volumes in the volume group have
been selected
to contain the logical volume, the Logical Volume Manager
will try
to stripe the logical volume evenly across both physical
volumes.
If you set the inter-physical allocation option to minimum,
the system
attempts to place the logical volume on only one physical
volume.
(Note that the use of the prefix "inter" implies
that more
than one disk is involved.)
Creating the logical volume in the center region of
both physical
volumes increases performance by reducing seek time.
The Logical Volume
Manager allows you to specify which of the five disk
regions is to
contain the logical volume. This is known as intra-physical
volume allocation. The prefix "intra" here
refers to the placement
of the logical volume within each physical volume.
The following are some calculations made to stripe the
456-Mb logical
volume across two physical volumes with one copy of
the data (no mirror):
456 megabyte logical volume
---------------------------------- = 228 megabyteson each disk.
2 physical volumes
228 megabytes on each disk
---------------------------------- = 57 physical partitions/disk
4 megabytes/logical partition
The distribution for this logical volume would appear
as in Figure 2,
provided the center region is available on each physical
volume.
The system will always make an effort to honor a request
for a particular
region. If that region is not available, it will automatically
make
the next best choice. An error message will be displayed
only if there
are not enough physical partitions to create the logical
volume on
the disks selected in that volume group.
When there are two copies in each logical partition
(mirroring), the
Logical Volume Manager allows you to place the mirrors
on separate
physical volumes (disks). The placement of mirrored
physical partitions
on different physical volumes is known as strict allocation
-- it provides the best protection against media failure.
Data Protection and Recovery Features
If a physical partition is physically damaged, that
partition becomes
unusable. If your disk controller does not provide bad
block relocation
as a hardware feature, the Logical Volume Manager allows
you to emulate
this feature through its software, providing bad block
relocation
on writes.
The Logical Volume Manager also lets you specify read
verification
after each write. Because this involves an extra read
operation, you
must choose between better performance or increased
data integrity.
Mirror write consistency is another option offered by
the Logical
Volume Manager. This option guarantees that data remains
consistent
on mirrored copies to ensure integrity in the event
of a hardware
failure.
Volume Group Management
At this point I want to return to the discussion on
volume groups.
After a volume group is created, it must be activated
either through
SMIT or by using the varyonvg command. Before a volume
group
is activated, the Logical Volume Manager will try to
read management
data stored in that volume group. This data -- consisting
of the
volume group descriptor area (VGDA) and the volume group
status area
(VGSA) -- is automatically created by the Logical Volume
Manager
when the volume group is created.
The VGDA area contains the logical-to-physical mapping
of the logical
partitions. The VGSA area contains information on stale
physical partitions
and inactive physical volumes. The Logical Volume Manager
maintains
more than one copy of the VGDA and VGSA information
to ensure its
integrity. A volume group with one physical volume will
contain this
information twice on the same disk. A volume group with
two physical
volumes will contain at least three copies of the VGDA
and VGSA information,
with copies stored on both physical volumes.
When more than half of the VGDA and VGSA copies for
a volume group
are good, the volume group is said to have quorum. If
the volume group
should loose quorum, the Logical Volume Manager will
varyoffvg
that volume group (make it inactive) because it can
no longer guarantee
the integrity of that volume group. A volume group with
inactive physical
volumes can varyonvg as long as it maintains quorum.
Since the logical-to-physical mapping of the volume
group is stored
within the volume group (specifically in the VGDA),
volume groups
can be exported from one RS-6000 and imported to another.
Careful design of volume groups, e.g., grouping disks
that contain
related data, can improve performance and allow for
easier maintenance.
Volume Groups and Backups
It is extremely important to consider volume groups
before developing
backup strategies for mounted filesystems and raw partitions.
IBM
provides a backup utility, mksysb, that allows you to
create
a bootable image of your system on a single tape. However,
mksysb
only backs up mounted filesystems in rootvg, where the
operating
system is installed. If your application and your operating
system
are both installed in rootvg, you can backup your operating
system and your application all on the same bootable
tape. However,
if rootvg is too large to fit on a single tape, your
mksysb
will fail. If your application creates large work files
that are not
critical for rebuilding your system, create another
volume group to
store those work files. This should enable you to install
your application
in rootvg and use mksysb to backup both your operating
system and your application. Databases that use raw
partitions to
store data should keep the raw data in a separate volume
group, for
easier maintenance and better performance.
Volume Management
To display the volume groups on your system, use the
lsvg
command. On my system, three volume groups are defined:
$ lsvg
rootvg
sybasevg
tempvg
When passed the name of a volume group as an argument,
the lsvg command can produce a great deal of output
about
that particular volume group. If used with the "-i"
option,
lsvg acts as a filter, enabling the output of one lsvg
command to be piped into a second lsvg command, to provide
detailed information on all the volume groups defined
on your system.
Figure 3 shows output from this command for my system.
(The information available through the lsvg command
can also
be displayed using SMIT. However, IBM has included a
rich set of ls
commands with AIX -- in total, more than 30 such commands
--
which can provide useful information on all aspects
of your system.
As a system administrator, I have found it beneficial
to become familiar
with them.)
Another useful ls command is the lspv command, which
lists all the physical volumes in your system. The example
in Figure 4
does this for my system. This system runs a large
database application,
the raw data partitions for which have all been created
in sybasevg.
rootvg contains the operating system and my application
programs.
I created tempvg as a work space where my applications
can
store large files not crucial for rebuilding. For backup,
I use the
IBM utility mksysb mentioned earlier. In the lsvg
output, the first field is the name of the physical
volume (disk),
the second field is the physical volume identification
number, and
the last field is the volume group this physical volume
is part of
(if any).
The df command also returns useful information, as in
the
example in Figure 5, but the display is somewhat misleading.
The column
labeled "Filesystem" is actually the logical
volume that contains
the filesystem. Recall that logical volumes must have
unique names.
If no name is given when a logical volume is created,
the system will
automatically assign a unique name. The logical volumes
/dev/lv00
and /dev/lv01 in Figure 5 were default names assigned
by the
system. Most system administrators refer to the mount
point as the
name of a filesystem. The columns labeled "iused"
refer to
inode usage.
To display all the logical volumes in your system, use
the lsvg
command with the "-l" and "-i" options.
Using the
"-i" option means you'll need to take input
from another lsvg
command, hence the command syntax in Figure 6, which
will provide
a list of every logical volume in your system. The command's
output
is organized by volume group. The first column, "LV
NAME,"
displays the name of the logical volume. The "TYPE"
column
displays the usage of the logical volume. Except for
type "jfs",
all of the types are standard in all UNIX operating
systems: paging
is used for page space, boot is what the system reads
when booting
up, sysdump is the dump area.
The "jfs" volume type derives from the fact
that AIX uses
a journaled filesystem (jfs). This means that the Logical
Volume Manager
logs transactions in the same way a database logs transactions.
If
the system should crash before a write operation is
committed, the
transaction will be rolled back (because of this level
of sophistication,
utilities such as fsck are rarely if ever needed). The
logical
volume called jfslog is the actual transaction log for
all
filesystems in that particular volume group. Notice
that sybasevg,
which holds all the raw database partitions, has no
jfslog.
This is because Sybase does not rely on the operating
system but rather
maintains its own transaction log.
You can tell from the output in Figure 6 that none of
the logical
volumes have mirrors on this system, because the logical
and physical
partition counts are equal. The boot logical volume
is closed because
it is only used when the machine is booting up. The
logical volumes
have all their partitions in sync, which means that
the VGSA has no
stale partitions.
AIX's lslv command lets you display information about
a particular
logical volume. For the example in Figure 7, I chose
a logical volume
used as a raw database partition. I used the name "pd11_17181920"
because it represents Production Data on scsi controller
11 using
physical volumes 17, 18, 19, and 20. In the command
output, the upper
bound field refers to the maximum number of physical
volumes used
when the inter-physical allocation is set to maximum.
Note also that
it is possible to set an entire logical volume to read
only. The field
RELOCATABLE is set to "no," which prevents
this logical volume from
being moved if the reorgvg command (re-organize volume
group)
is executed.
When lslv is executed with the "-l" option,
the physical
distribution of the logical volume can be displayed
for each physical
volume that it spans. The first column names each physical
volume,
the second column displays the number of physical partitions,
the
third column displays the percentage of physical partitions
placed
in the requested region at the time the logical volume
was created.
In the example in Figure 8, I had specified that the
center region
was to be used when the logical volume was created.
Since there are
only 57 partitions available in the center (50 percent),
the remaining
57 partitions were automatically placed in the inner
middle region,
as shown in the last column, labeled "DISTRIBUTION".
Conclusion
The Logical Volume Manager is an extremely powerful
and flexible device
driver which you can administer via a menu interface
(SMIT) without
having to bring down your system. To use it effectively,
you'll need
to pay special attention to the inter and intra physical
regions: these concepts are unique in the UNIX environment
and control
the physical placement of data.
Once you understand the capabilities of the Logical
Volume Manager,
you can use it to partition physical disk space for
easier maintenance
and better performance. If for example, your application
involves
the random access of many small files, you may benefit
by changing
the default physical partition size from four megabytes
to two megabytes
or perhaps even one megabyte. In addition you can take
advantage of
the many standard features, including disk mirroring
and disk striping,
that are only available on other systems with redundant
arrays of
independent disks (RAID).
Bibliography
IBM General Concepts and Procedures. IBM
Publication GC23-2202-02.
IBM General Programming Concepts. IBM Publication
SC23-2205-03.
IBM System Management Guide. IBM Publication
GC23-2486-00.
About the Author
Bill Genosa is a systems administrator for American
Express, where he
has responsibility for RS6000 workstations and servers.
He can be reached at 186 Bryant Avenue, Floral Park,
NY 11001,
or via email as wgenosa@attmail.com.