
AIX's Logical Volume Manager

Bill Genosa

The Logical Volume Manager (LVM) provides fixed disk storage management for the AIX operating system by mapping physical disk space to logical storage units. Its hierarchical structure is unique among UNIX environments. Its most beneficial features include:

  • The ability to mirror data for increased integrity.

  • The ability to stripe data across disk drives for increased performance.

  • The ability to place data on a specific region of a disk drive for improved performance.

  • The ability to expand disk storage online for easier maintenance.

    Like most AIX administrative functions, the Logical Volume Manager can be administered through IBM's System Management Interface Tool (SMIT), a menu-driven interface that allows system administrators to perform many tasks without having to remember long command strings. You can use SMIT to configure all of the parameters about to be discussed, as well as to create mirrors and expand filesystems online. However, SMIT does not explain the theory or the complexity of the Logical Volume Manager. That is what this article addresses.

    Physical and Logical Volume Organization

    The Logical Volume Manager is a device driver for disk I/O which incorporates a special library of subroutines. To understand it, you must first understand the conceptual differences between physical and logical devices. A physical disk device is real, tangible disk storage. A logical disk device defines how the user views the physical storage and simplifies the sharing and management of that storage. An analogy would be a single DOS hard drive partitioned into two logical disk drives: the user sees two logical drives, c: and d:, which are actually both partitions of the same physical unit. Logical volumes are raw data partitions created on physical disk drives which can be used to contain filesystems or other structures.

    Physical Devices

    The Logical Volume Manager refers to disk drives as physical volumes. Each physical volume must be assigned by the system administrator to a volume group before it can be used. Volume groups may contain from 1 to 32 physical volumes (disk drives). Physical volumes assigned to a volume group can be of different types and sizes. Volume groups must be assigned unique names. During installation, AIX automatically creates the volume group rootvg, which contains the operating system. You can create up to 255 volume groups. (I will discuss volume groups in more detail later.)

    The Logical Volume Manager partitions physical volumes (disks) within each volume group into 4-Mb chunks called physical partitions. A 1.2-Gb disk is partitioned into 287 physical partitions, or 4 x 287 = 1148 megabytes of usable space; the remaining disk space is consumed by overhead. The 4-Mb size is a tunable parameter within each volume group. For the sake of simplicity, I will assume the 4-Mb default when referring to physical partitions throughout this article.
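
    The partition size is fixed when the volume group is created. As a rough sketch (the volume group and disk names below are hypothetical, and the -s flag, which takes the partition size in megabytes, should be verified against the mkvg documentation for your release):

    $ mkvg -y datavg -s 4 hdisk2 hdisk3   # create datavg with 4-Mb physical partitions
    $ extendvg datavg hdisk4              # add another physical volume to the group later
    $ lsvg datavg                         # the PP SIZE field confirms the partition size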

    The Logical Volume Manager segments each physical volume into five equal regions or bands. A disk drive consists of several stacked platters. The band of cylinders around the outer edge of the platters makes up the outer region. The band of cylinders closest to the spindle makes up the inner region. Between the inner and outer regions are the three remaining regions: the inner middle region, the center region, and the outer middle region. The 4-Mb physical partitions are distributed evenly among the five regions. The distribution for a 1.2-Gb disk drive is presented in Figure 1.
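
    You can see this banding on a live system with the -p option of the lspv command, which on my system reports each range of physical partitions together with the region it occupies (hdisk2 here is just a placeholder name):

    $ lspv -p hdisk2    # list physical partition ranges by region, along with the
                        # logical volume, if any, that occupies each range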

    The center region of every disk will yield the highest performance, as data placed in this region will benefit from shorter seek times. The further the data is placed from the center region, the longer the seek times will become. The system administrator can specify the region in which data is to be placed.

    Logical Devices

    Physical partitions are grouped into logical partitions. A logical partition can have from one to three physical partitions. A logical partition consisting of three physical partitions is still logically viewed as 4 megabytes of data even though 12 megabytes of physical storage are being used. This is because the 4-Mb data chunk has a total of three copies, which is how the Logical Volume Manager accomplishes disk mirroring. Data can have a single or double mirror depending on the number of physical partitions, or copies, chosen by the system administrator.

    Logical partitions are grouped into logical volumes. Filesystems, page space, and raw data partitions must all be contained within logical volumes. Recall that each logical partition can have up to three physical partitions, but the mirrored physical partitions are all copies of the same data. Therefore, to create a filesystem of 16 megabytes, you would need to create a logical volume with four logical partitions. If you wanted a single mirror of the 16-Mb logical volume, you would still need only four logical partitions. Each logical partition would then be comprised of two physical partitions, one for each copy of the data. The logical view of the data remains as four logical partitions totaling 16 megabytes, while the physical size required to contain both copies would actually be 32 megabytes, or eight physical partitions. It is not possible to mirror a single logical partition. All logical partitions within a logical volume must be mirrored.
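
    As a command-line sketch of the 16-Mb mirrored example (the logical volume, volume group, and mount point names here are made up, and the mklv option letters, -c for the number of copies and -t for the type, should be checked against your documentation):

    $ mklv -y datalv -t jfs -c 2 datavg 4      # 4 logical partitions, 2 copies each:
                                               # 16 Mb logical, 32 Mb of physical storage
    $ crfs -v jfs -d datalv -m /data -A yes    # build a journaled filesystem on it
    $ mount /data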

    Data Management

    Since a mirrored logical partition consists of a primary and secondary physical partition both having the same data, a pecking order must be established when data is to be transferred to and from mirrored logical partitions. That order is defined as the scheduling policy. The Logical Volume Manager allows the system administrator to choose either a sequential or a parallel scheduling policy for writing and reading data to and from a mirrored logical partition. Data transferred sequentially uses the primary physical partition first and then the secondary. Data transferred in parallel uses both the primary and secondary copies at the same time. Sequential writes wait for the primary to complete before writing to the secondary. Parallel writes execute simultaneously and complete when the last unit to be written is done; parallel reads access the drive which can complete the transfer first. Sequential scheduling offers higher data reliability, while parallel scheduling offers higher performance for mirrored logical partitions.
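
    The scheduling policy is a per-logical-volume attribute. A hedged example, using hypothetical names (on my system the -d flag selects the policy, p for parallel and s for sequential):

    $ mirlv=mirlv; mklv -y mirlv -c 2 -d p datavg 4   # mirrored volume, parallel scheduling
    $ lslv mirlv                                      # the SCHED POLICY field reports the choice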

    Consider a volume group created with two physical volumes, each 1.2 gigabytes in size. Assume that I wish to create a logical volume of 456 megabytes to be used as a raw partition for a database, and that I want that logical volume to span both drives in my volume group. This procedure, called disk striping, will yield higher performance than placing the entire logical volume on a single physical volume (disk).

    The Logical Volume Manager allows you to choose which physical volumes in your volume group to stripe the data across. You must also set the option known as inter-physical allocation to maximum, to force the Logical Volume Manager to attempt to create the logical volume across as many physical volumes as you have selected. In this case, if both physical volumes in the volume group have been selected to contain the logical volume, the Logical Volume Manager will try to stripe the logical volume evenly across both physical volumes. If you set the inter-physical allocation option to minimum, the system attempts to place the logical volume on only one physical volume. (Note that the use of the prefix "inter" implies that more than one disk is involved.)

    Creating the logical volume in the center region of both physical volumes increases performance by reducing seek time. The Logical Volume Manager allows you to specify which of the five disk regions is to contain the logical volume. This is known as intra-physical volume allocation. The prefix "intra" here refers to the placement of the logical volume within each physical volume.

    The following are some calculations made to stripe the 456-Mb logical volume across two physical volumes with one copy of the data (no mirror):

    456 megabyte logical volume
    ---------------------------------- = 228 megabytes on each disk
    2 physical volumes

    228 megabytes on each disk
    ---------------------------------- = 57 physical partitions on each disk
    4 megabytes/physical partition

    The distribution for this logical volume would appear as in Figure 2, provided the center region is available on each physical volume. The system will always make an effort to honor a request for a particular region. If that region is not available, it will automatically make the next best choice. An error message will be displayed only if there are not enough physical partitions to create the logical volume on the disks selected in that volume group.
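
    Putting these choices together, a command along the following lines would request the layout shown in Figure 2, asking for 114 logical partitions (456 Mb at 4 Mb per partition) across two hypothetical disks. The names are made up, and the option letters (-e x for maximum inter-physical allocation, -a c for the center region) are as documented on my system; confirm them on your release:

    $ mklv -y rawlv -e x -a c datavg 114 hdisk2 hdisk3
    $ lslv -l rawlv               # verify that 57 partitions landed on each disk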

    When there are two copies in each logical partition (mirroring), the Logical Volume Manager allows you to place the mirrors on separate physical volumes (disks). The placement of mirrored physical partitions on different physical volumes is known as strict allocation -- it provides the best protection against media failure.
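
    Strictness is requested when the logical volume is created, or when a copy is added to an existing one. A sketch with made-up names (on my system -s y asks for strict allocation and mklvcopy raises the number of copies):

    $ mklv -y strictlv -c 2 -s y datavg 57 hdisk2 hdisk3   # each copy kept on its own disk
    $ mklvcopy somelv 2 hdisk3                             # add a mirror to an existing volume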

    Data Protection and Recovery Features

    If a physical partition is physically damaged, that partition becomes unusable. If your disk controller does not provide bad block relocation as a hardware feature, the Logical Volume Manager allows you to emulate this feature through its software, providing bad block relocation on writes.

    The Logical Volume Manager also lets you specify read verification after each write. Because this involves an extra read operation, you must choose between better performance and increased data integrity.

    Mirror write consistency is another option offered by the Logical Volume Manager. This option guarantees that data remains consistent on mirrored copies to ensure integrity in the event of a hardware failure.
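
    All three of these protection options can be set when a logical volume is created, or changed later. The option letters below (-b for bad block relocation, -v for write verification, -w for mirror write consistency) match my AIX 3.2 documentation for mklv and chlv, but treat the example as a sketch and confirm them on your release; the names are hypothetical:

    $ mklv -y safelv -c 2 -b y -v y -w y datavg 10   # relocation, verification, consistency
    $ chlv -v n safelv                               # later, trade the verify read for speed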

    Volume Group Management

    At this point I want to return to the discussion on volume groups. After a volume group is created, it must be activated either through SMIT or by using the varyonvg command. Before a volume group is activated, the Logical Volume Manager will try to read management data stored in that volume group. This data -- consisting of the volume group descriptor area (VGDA) and the volume group status area (VGSA) -- is automatically created by the Logical Volume Manager when the volume group is created.

    The VGDA contains the logical-to-physical mapping of the logical partitions. The VGSA contains information on stale physical partitions and inactive physical volumes. The Logical Volume Manager maintains more than one copy of the VGDA and VGSA information to ensure its integrity. A volume group with one physical volume will contain this information twice on the same disk. A volume group with two physical volumes will contain at least three copies of the VGDA and VGSA information, with copies stored on both physical volumes.

    When more than half of the VGDA and VGSA copies for a volume group are good, the volume group is said to have quorum. If the volume group should lose quorum, the Logical Volume Manager will vary off that volume group (make it inactive) because it can no longer guarantee the integrity of that volume group. A volume group with inactive physical volumes can still be varied on as long as it maintains quorum.
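
    The activation commands themselves are simple (datavg is again a hypothetical name); lsvg with the -o option lists only the volume groups that are currently varied on:

    $ varyonvg datavg     # activate the volume group; fails if quorum cannot be met
    $ lsvg -o             # list only the active (varied on) volume groups
    $ varyoffvg datavg    # deactivate the volume group before maintenance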

    Since the logical-to-physical mapping of the volume group is stored within the volume group (specifically in the VGDA), volume groups can be exported from one RS/6000 and imported to another.
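
    A sketch of such a move, with hypothetical names (importvg reads the VGDA from any one disk in the group, and -y supplies the volume group name to use on the receiving system):

    $ varyoffvg datavg             # on the source machine
    $ exportvg datavg              # remove the volume group definition from the source
    $ importvg -y datavg hdisk4    # on the target machine, after attaching the disks
    $ varyonvg datavg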

    Careful design of volume groups, e.g., grouping disks that contain related data, can improve performance and allow for easier maintenance.

    Volume Groups and Backups

    It is extremely important to consider volume groups before developing backup strategies for mounted filesystems and raw partitions. IBM provides a backup utility, mksysb, that allows you to create a bootable image of your system on a single tape. However, mksysb only backs up mounted filesystems in rootvg, where the operating system is installed. If your application and your operating system are both installed in rootvg, you can back up your operating system and your application on the same bootable tape. However, if rootvg is too large to fit on a single tape, your mksysb will fail. If your application creates large work files that are not critical for rebuilding your system, create another volume group to store those work files. This should enable you to install your application in rootvg and use mksysb to back up both your operating system and your application. Databases that use raw partitions to store data should keep the raw data in a separate volume group, for easier maintenance and better performance.
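
    On my AIX 3.2 system the backup itself is a two-step affair; later releases package the filesystem size information differently, so treat this as a sketch and check your documentation:

    $ mkszfile            # record rootvg filesystem sizes for the restore
    $ mksysb /dev/rmt0    # write a bootable image of rootvg to the tape drive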

    Volume Management

    To display the volume groups on your system, use the lsvg command. On my system, three volume groups are defined:

    $ lsvg
    
    rootvg
    sybasevg
    tempvg

    When passed the name of a volume group as an argument, the lsvg command can produce a great deal of output about that particular volume group. If used with the "-i" option, lsvg acts as a filter, enabling the output of one lsvg command to be piped into a second lsvg command, to provide detailed information on all the volume groups defined on your system. Figure 3 shows output from this command for my system.
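
    The pipeline just described looks like this:

    $ lsvg | lsvg -i      # detailed report for every volume group on the system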

    (The information available through the lsvg command can also be displayed using SMIT. However, IBM has included a rich set of ls commands with AIX -- in total, more than 30 such commands -- which can provide useful information on all aspects of your system. As a system administrator, I have found it beneficial to become familiar with them.)

    Another useful ls command is the lspv command, which lists all the physical volumes in your system. The example in Figure 4 does this for my system. This system runs a large database application, the raw data partitions for which have all been created in sybasevg. rootvg contains the operating system and my application programs. I created tempvg as a work space where my applications can store large files not crucial for rebuilding. For backup, I use the IBM utility mksysb mentioned earlier. In the lspv output, the first field is the name of the physical volume (disk), the second field is the physical volume identification number, and the last field is the volume group this physical volume is part of (if any).

    The df command also returns useful information, as in the example in Figure 5, but the display is somewhat misleading. The column labeled "Filesystem" is actually the logical volume that contains the filesystem. Recall that logical volumes must have unique names. If no name is given when a logical volume is created, the system will automatically assign a unique name. The logical volumes /dev/lv00 and /dev/lv01 in Figure 5 were default names assigned by the system. Most system administrators refer to the mount point as the name of a filesystem. The columns labeled "iused" refer to inode usage.

    To display all the logical volumes in your system, use the lsvg command with the "-l" and "-i" options. Using the "-i" option means you'll need to take input from another lsvg command, hence the command syntax in Figure 6, which will provide a list of every logical volume in your system. The command's output is organized by volume group. The first column, "LV NAME," displays the name of the logical volume. The "TYPE" column displays the usage of the logical volume. Except for type "jfs", the types will be familiar from other UNIX systems: paging is used for page space, boot is what the system reads when booting up, and sysdump is the dump area.
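
    In other words, the command behind Figure 6 is along the lines of:

    $ lsvg | lsvg -i -l    # list every logical volume, grouped by volume group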

    The "jfs" volume type derives from the fact that AIX uses a journaled filesystem (jfs). This means that the Logical Volume Manager logs transactions in the same way a database logs transactions. If the system should crash before a write operation is committed, the transaction will be rolled back (because of this level of sophistication, utilities such as fsck are rarely if ever needed). The logical volume called jfslog is the actual transaction log for all filesystems in that particular volume group. Notice that sybasevg, which holds all the raw database partitions, has no jfslog. This is because Sybase does not rely on the operating system but rather maintains its own transaction log.

    You can tell from the output in Figure 6 that none of the logical volumes have mirrors on this system, because the logical and physical partition counts are equal. The boot logical volume is closed because it is only used when the machine is booting up. The logical volumes have all their partitions in sync, which means that the VGSA has no stale partitions.

    AIX's lslv command lets you display information about a particular logical volume. For the example in Figure 7, I chose a logical volume used as a raw database partition. I used the name "pd11_17181920" because it represents Production Data on SCSI controller 11 using physical volumes 17, 18, 19, and 20. In the command output, the upper bound field refers to the maximum number of physical volumes used when the inter-physical allocation is set to maximum. Note also that it is possible to set an entire logical volume to read only. The field RELOCATABLE is set to "no," which prevents this logical volume from being moved if the reorgvg command (reorganize volume group) is executed.
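
    The report in Figure 7 would have been produced with a command like:

    $ lslv pd11_17181920    # display the attributes of one logical volume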

    When lslv is executed with the "-l" option, the physical distribution of the logical volume can be displayed for each physical volume that it spans. The first column names each physical volume, the second column displays the number of physical partitions, and the third column displays the percentage of physical partitions placed in the requested region at the time the logical volume was created. In the example in Figure 8, I had specified that the center region was to be used when the logical volume was created. Since only 57 partitions were available in the center (50 percent), the remaining 57 partitions were automatically placed in the inner middle region, as shown in the last column, labeled "DISTRIBUTION".
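
    Again, the command behind Figure 8 is along the lines of:

    $ lslv -l pd11_17181920    # show how the logical volume is spread across its disks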

    Conclusion

    The Logical Volume Manager is an extremely powerful and flexible device driver which you can administer via a menu interface (SMIT) without having to bring down your system. To use it effectively, you'll need to pay special attention to the inter- and intra-physical volume allocation policies: these concepts are unique in the UNIX environment and control the physical placement of data.

    Once you understand the capabilities of the Logical Volume Manager, you can use it to partition physical disk space for easier maintenance and better performance. If, for example, your application involves random access to many small files, you may benefit from changing the default physical partition size from four megabytes to two megabytes, or perhaps even one megabyte. In addition, you can take advantage of many standard features, including disk mirroring and disk striping, that are otherwise available only on systems with redundant arrays of independent disks (RAID).

    Bibliography

    IBM General Concepts and Procedures. IBM Publication GC23-2202-02.

    IBM General Programming Concepts. IBM Publication SC23-2205-03.

    IBM System Management Guide. IBM Publication GC23-2486-00.

    About the Author

    Bill Genosa is a systems administrator for American Express, where he has responsibility for RS/6000 workstations and servers. He can be reached at 186 Bryant Avenue, Floral Park, NY 11001, or via email as wgenosa@attmail.com.


     


