Real-Time Remote Data Mirroring

Primitivo Cervantes

When planning for a disaster, there are many issues to consider and resolve. Some of these issues are resolved with the use of a remote location as an emergency production site in case the primary site is unavailable due to a disaster. In this situation, a remote data center is set up with systems similar to the primary production systems. This remote disaster recovery site is usually connected to the primary site via a network.

Even with a remote disaster recovery site, there are still many issues to consider. The remote site is usually not in use during normal production hours and is a cost center until it is used as a recovery center. You must determine, for example, whether you want systems identical to those at the primary site or whether you can make do with reduced performance during an emergency. You must decide whether you can afford to lose some data in an emergency or whether you need all of the most current data when bringing the systems online at the remote site. You must also consider how current the data must be and how much you are willing to pay for it. In this article, I will discuss various data-replication techniques, their associated advantages and disadvantages, their costs, and the cases where implementing them may be justified. I will also describe an installation of one of these techniques, using IBM's HAGEO product to replicate data in real time to a remote site.

Before I get into the data-mirroring techniques, I will explain the types of data that clients usually mirror. There are three basic types of data required to run a customer application: operating system data, database data, and non-database application data. Operating system data consists of the programs and files needed to run the operating system, such as IBM's AIX. Database data is the actual data in the client's application, typically managed by a database manager such as Oracle, Sybase, or Informix. Non-database data consists of the client application's executables, configuration files, and other files needed to run the application (such as Oracle Financials, SAP, or PeopleSoft).

A system backup is typically used to back up and replicate the operating system data, so that will not be covered in this article. I will discuss only techniques to mirror database data and non-database application data.

Remote Data Mirroring Techniques

System Tools

Historically, data has been mirrored to a remote location using operating system tools such as ftp, rcp, uucp, etc. These tools have distinct advantages in that they are available with the operating system, easy to use, and very reliable.

Typically, a system backup is used to replicate a system at a remote location. An initial database backup is created and restored at the remote location. From this point on, the only things that need to be sent to the remote location are the database archive logs and any changed application executables or configuration files. By sending only the database logs to the remote location and then using the database to "play the logs" forward, the database at the remote location can be kept consistent but will always be behind the primary site. In other words, we send changes to the database to the remote location, but the most current changes are not sent.
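As an illustration, here is a minimal sketch of the log-shipping step, written as a Korn shell script that could be run periodically from cron. The directory name, the remote host name "drsite1", and the ".arc" log-naming convention are assumptions for the example, and applying the logs is left to the database's own roll-forward facilities:

#!/bin/ksh
# Hypothetical log-shipping sketch: copy any archive logs not yet sent
# to the recovery site, then record them so they are not sent twice.
ARCHDIR=/db/archivelogs          # assumed local archive-log directory
REMOTE=drsite1                   # assumed remote recovery host
SENTLIST=$ARCHDIR/.sent          # record of logs already shipped

touch $SENTLIST
for LOG in $ARCHDIR/*.arc
do
    [ -f "$LOG" ] || continue                   # no logs yet
    grep -q "^$LOG\$" $SENTLIST && continue     # already shipped
    rcp "$LOG" ${REMOTE}:$ARCHDIR/ && echo "$LOG" >> $SENTLIST
done
# The remote database then "plays" these logs forward on its own schedule,
# so it stays consistent but always somewhat behind the primary.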

The main disadvantage of these tools is that the remote site is not always as current as the primary site (it is behind in database transactions). This is usually reason enough to prevent many clients, such as banks and hospitals, from using these tools. These clients demand a real-time mirroring solution.

Built-In Database Tools

Database vendors have been adding remote mirroring capabilities to their products and enhancing them, which provides another way of mirroring data to a remote location. Using the database vendor products, the database data can be mirrored and kept consistent across geographically separated sites, and the data at the remote location is just as current as at the primary site.

The main advantages of database vendor tools are that they are available with the database at a low cost and are guaranteed to work by the vendor. If there is a problem, the vendor has a support structure that can help the client determine the cause and also provide a fix, if necessary.

The main disadvantages of database vendor tools are that they replicate only the database data and not the application data. They are also sometimes difficult to use and may require that the application's source code be modified and recompiled.

Hardware-Based Tools

While database vendor tools are specific to the database vendor, hardware-based solutions aim at providing data mirroring regardless of the application or database vendor. One way to use hardware to mirror data is with a storage system from a storage vendor, such as EMC or IBM. EMC has long had remote mirroring capabilities, and IBM recently announced (Dec. 2000) mirroring capabilities in some of their storage products. These storage products allow you to mirror data from one storage system to another storage system at a remote site regardless of the data being mirrored. This mirroring is done in real-time and works very well. Because these hardware storage systems can use large amounts of cache, system performance can be very good.

In cases where there are long distances (greater than 60 miles) between sites, EMC uses three storage systems in series to mirror data. A primary storage system is mirrored in real time to a secondary system at a nearby remote location (usually less than 10 miles away). This secondary system then mirrors the data to the third storage system as fast as the link allows. This means the third storage system is not as current as the first, but no data is lost, because the third system will eventually be fully synchronized from the second.

IBM's mirroring solution is relatively new, so I have not had a chance to implement it at long distances. IBM says the mirroring capabilities will work at longer distances than EMC (about 100 miles).

Another consideration with the hardware solution is that performance degrades significantly as the distance between sites increases. I have seen write times increase by as much as 4000% with as little as 10 miles between sites. This is probably a worst case, but it is nonetheless significant and should be considered.

The main disadvantage of these hardware solutions is the cost. A single EMC or IBM storage system can easily cost $750K, and the long-distance mirroring solutions will certainly run into the millions. Even so, for clients that need this additional safety net, I think the hardware solution will eventually provide the best combination of performance and data integrity. A hardware solution will probably be the most expensive but provide the most functionality.

IBM's GEOrm or HAGEO

If the client application happens to be running on an IBM AIX system, there are a couple of software products from IBM, called HAGEO and GEOrm, that could be used for real-time data mirroring.

IBM's HAGEO and GEOrm actually mirror logical volumes from one site to a remote site. They provide device drivers that intercept disk writes to the AIX logical volumes and mirror this data in real-time to the remote location.

The difference between HAGEO and GEOrm is that HAGEO also integrates with IBM's high-availability product, HACMP. HACMP detects system failures and allows a secondary system to take over an application should the primary system fail. With the HACMP/HAGEO combination, if your primary system fails, the secondary system will automatically detect the failure and take over the application, usually within half an hour or so.

IBM typically charges anywhere from $100K to $400K in software and services to implement a two-system HACMP/HAGEO cluster. Implementing GEOrm does not involve HACMP, so it usually costs less (anywhere from $40K to $150K). These are all ballpark figures, and you should contact an IBM representative for an actual quote.

The main advantage of the HAGEO and GEOrm solutions is that they are software based and work regardless of the database vendor or application. The main disadvantages are that they usually require a more highly skilled systems administrator to maintain and that they have some system performance implications. Since the data mirroring is done in real time and without a large cache, it can have a high impact on system performance. Also, this solution is available only on IBM AIX systems.

An HAGEO Implementation

I have implemented and managed several HACMP/HAGEO projects, so I will relate my experience with this product at a utility company in California. In this example, the company had two systems, the primary one located in Los Angeles County and the secondary one in Orange County. They were using IBM SP2 systems (the same type that defeated Kasparov at chess). The systems chosen used multiple processors and were several times faster than the single-processor systems they were replacing, so everyone was initially optimistic that performance was not going to be an issue.

The disk drives were IBM SSA drives (only fourteen 4-GB disks in all). The data was locally mirrored, so the usable capacity was seven 4-GB disks, a relatively small application. Note that we configured HAGEO in its most secure mode, mwc (mirror-write consistency). This required us to create logical volumes (called state maps) for HAGEO's use, so we had to allocate two of these drives exclusively to HAGEO.

The systems were connected to the network with two 10-Mb Ethernet adapters for the data mirroring (between the sites) and one 10-Mb Ethernet adapter for the client traffic (where users were going to be logging in).

For ease of debugging problems, we chose the following schedule:

1. Set up the systems (local and remote sites) and get the operating system working.
2. Install the application and Sybase database at one site.
3. Verify that the application works correctly.
4. Install HACMP/HAGEO.
5. Initiate the HACMP/HAGEO mirroring capabilities.
6. Verify that the application still works correctly and that the data is mirrored to the remote location.

Initial Application Installation (Before HACMP/HAGEO)

During the initial application installation (Steps 1-3 above), the application AIX logical volumes and filesystems were created and the application software installed on the primary system. This was done without HACMP/HAGEO, so the application installation processes were typical of any IBM AIX system with that application. The application was a custom labor management system using a Sybase database.

Up to this point, the overall cluster looked like Figure 1. In this diagram, the two systems are installed and have all network connections, but the application is installed and running on only one system. The application and data will eventually be mirrored to the remote system, so it is not necessary to install it on both systems.

Note that the AIX logical volumes (LVs) have been defined to contain the application and data. Sybase knows about them and reads and writes to the LVs directly. These LVs are defined as:

/dev/appslv00 -- Where the Sybase DBMS and application files are located
/dev/sybaselv01 -- Sybase database data
/dev/sybaselv02 -- Sybase database data
/dev/sybaselv03 -- Sybase database data
/dev/sybaselv04 -- Sybase database data
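These logical volumes were created before HAGEO entered the picture. As a rough sketch of how a similar layout might be built on AIX (the volume group name "appsvg", the disk names, the partition counts, and the LV types are assumptions, not the exact commands used at this site):

# Hypothetical sketch only -- names, disks, and sizes are assumptions.
# Create the volume group on the locally mirrored SSA disks:
mkvg -y appsvg hdisk2 hdisk3

# One LV for the Sybase DBMS and application files, and four raw LVs
# for the database devices that Sybase writes to directly:
mklv -y appslv00   -t jfs appsvg 64
mklv -y sybaselv01 -t raw appsvg 128
mklv -y sybaselv02 -t raw appsvg 128
mklv -y sybaselv03 -t raw appsvg 128
mklv -y sybaselv04 -t raw appsvg 128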

There are two Ethernet adapters for the user or client network. One of these adapters is the HACMP "service" adapter to which the clients connect. The other adapter is the HACMP "standby" adapter used by HACMP as a backup adapter if the primary adapter fails.

There are two Ethernet adapters for the HAGEO mirroring function. HAGEO will load balance between the two adapters, so there is no need for a standby adapter in this network.

In the following discussion, I will give the actual commands we used to configure HACMP and, lastly, show how we configured and started the HAGEO mirroring.

Installing HACMP and HAGEO

Installing HACMP and HAGEO (Step 4 above) is a simple process of loading the CDs on the system and running SMIT to install the base HACMP and HAGEO packages. To do this on the command line, run the following commands:

To install HACMP (all on one line):

/usr/lib/instl/sm_inst installp_cmd -a -Q -d '/dev/cd0' -f 'cluster.base
ALL  @@cluster.base _all_filesets'  '-c' '-N' '-g' '-X'   '-G'
To install HAGEO (all on one line):

/usr/lib/instl/sm_inst installp_cmd -a -Q -d '.' -f 'hageo.man.en_US
ALL  @@hageo.man.en_US _all_filesets,hageo.manage  ALL  @@hageo.manage
_all_filesets,hageo.message ALL  @@hageo.message _all_filesets,hageo.mirror ALL
@@hageo.mirror _all_filesets'  '-c' '-N' '-g' '-X'   '-G'
Configuring HACMP

Configuring HACMP and HAGEO (Step 5 above) requires that you enter all of the cluster information into HACMP and HAGEO. Most of this information is shown in Figures 1 and 2. In this particular case, the steps needed to configure HACMP and HAGEO are:

  • Create the HACMP cluster.
  • Add the node names and IP addresses.
  • Create the HACMP resource group.
  • Add the HAGEO geo-mirror device information.
  • Activate the mirroring capabilities.
To configure the HACMP cluster:

/usr/sbin/cluster/utilities/claddclstr -i'1' -n'hageo1'
To configure the HACMP nodes and IP addresses:

/usr/sbin/cluster/utilities/clnodename -a 'labor1'
/usr/sbin/cluster/utilities/clnodename -a 'labor2'
When configuring the IP addresses, note that there are two addresses associated with the primary adapter on each system (en0). One of these addresses is the "service" address or the address to which the clients connect. The second address is a "boot" address used only when the system boots up. This is because the "service" address will move from the primary system (labor1) to the standby system (labor2) during a primary system failure. To bring up "labor1" when this happens, we must have an address for the system, so it does not conflict with the "service" address that just moved over to "labor2". This is the "boot" address.

Figures 1 and 2 do not show the interface names that resolve to the IP addresses. These are as follows:

10.25.172.32    labor1svc
10.25.172.31    labor1stby
10.25.182.32    labor1boot
10.25.172.42    labor2svc
10.25.172.41    labor2stby
10.25.182.42    labor2boot
To configure these IP addresses in HACMP:

/usr/sbin/cluster/utilities/claddnode -a'labor1svc' :'ether' :'ether1' \
  :'public' :'service' :'10.25.172.32' :'0004ac5711aa' -n'labor1'

/usr/sbin/cluster/utilities/claddnode -a'labor1stby' :'ether' :'ether1' \
  :'public' :'standby' :'10.25.172.31' : -n'labor1'

/usr/sbin/cluster/utilities/claddnode -a'labor1boot' :'ether' :'ether1' \
  :'public' :'boot' :'10.25.182.32' : -n'labor1'

/usr/sbin/cluster/utilities/claddnode -a'labor2svc' :'ether' :'ether1' \
  :'public' :'service' :'10.25.172.42' :'0004ac5711bb' -n'labor2'

/usr/sbin/cluster/utilities/claddnode -a'labor2stby' :'ether' :'ether1' \
  :'public' :'standby' :'10.25.172.41' : -n'labor2'

/usr/sbin/cluster/utilities/claddnode -a'labor2boot' :'ether' :'ether1' \
  :'public' :'boot' :'10.25.182.42' : -n'labor2'
Creating the HACMP Resource Groups

The application resource information comprises all the system resources that HACMP uses to manage an application. In other words, this is everything needed to start or stop the application. In our case, we will need the AIX logical volumes (and AIX volume group) associated with the application, the IP address the users log into, and a script to start and stop the application.

We created three resource groups: one for moving the application from site "losangeles" to site "orange", one for "labor1", and one for "labor2".

Here are the commands we used to configure HACMP. To create the HACMP "resource groups":

/usr/sbin/cluster/utilities/claddgrp -g 'resourcegroup1' -r 'cascading' \
  -n 'losangeles orange'

/usr/sbin/cluster/utilities/claddgrp -g 'resourcegroup2' -r 'cascading' \
  -n 'labor1'

/usr/sbin/cluster/utilities/claddgrp -g 'resourcegroup3' -r 'cascading' \
  -n 'labor2'

To create the HACMP "application server" that contains the start and stop script information:

/usr/sbin/cluster/utilities/claddserv -s'appserver1' \
  -b'/apps/hacmp/start_script' -e'/apps/hacmp/stop_script'
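The start and stop scripts themselves are site specific and were not shown in the original material. As a hedged sketch only, a start script along these lines might start Sybase and the application after HACMP has varied on the volume group and acquired the service address; the Sybase paths, server name, and user names are assumptions:

#!/bin/ksh
# /apps/hacmp/start_script -- hypothetical sketch of an HACMP start script.
# All paths, server names, and users below are assumptions for illustration.

SYBASE=/apps/sybase              # assumed Sybase installation directory
SERVER=LABORDB                   # assumed Sybase dataserver name

# Start the Sybase dataserver as the sybase user:
su - sybase -c "$SYBASE/install/startserver -f $SYBASE/install/RUN_$SERVER"

# Start the labor-management application processes:
su - laborapp -c "/apps/labor/bin/start_labor"

exit 0

The matching stop script would shut the application down and then shut down the dataserver before HACMP releases the IP address and volume group.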
To put all of the system resources in the HACMP resource group:

/usr/sbin/cluster/utilities/claddres -g'resourcegroup1' SERVICE_LABEL= FILESYSTEM= 
FSCHECK_TOOL='fsck' RECOVERY_METHOD='sequential' EXPORT_FILESYSTEM= MOUNT_FILESYSTEM= 
VOLUME_GROUP='appsvg' CONCURRENT_VOLUME_GROUP= DISK= AIX_CONNECTIONS_SERVICES= 
AIX_FAST_CONNECT_SERVICES= APPLICATIONS='appserver1' SNA_CONNECTIONS= MISC_DATA= 
INACTIVE_TAKEOVER='false' DISK_FENCING='false' SSA_DISK_FENCING='false' FS_BEFORE_IPADDR='false'

/usr/sbin/cluster/utilities/claddres -g'resourcegroup2' SERVICE_LABEL='labor1svc' FILESYSTEM= 
FSCHECK_TOOL='fsck' RECOVERY_METHOD='sequential' EXPORT_FILESYSTEM= MOUNT_FILESYSTEM= 
VOLUME_GROUP= CONCURRENT_VOLUME_GROUP= DISK= AIX_CONNECTIONS_SERVICES= 
AIX_FAST_CONNECT_SERVICES= APPLICATIONS= SNA_CONNECTIONS= MISC_DATA= INACTIVE_TAKEOVER='false' 
DISK_FENCING='false' SSA_DISK_FENCING='false' FS_BEFORE_IPADDR='false'

/usr/sbin/cluster/utilities/claddres -g'resourcegroup3' SERVICE_LABEL= FILESYSTEM= 
FSCHECK_TOOL='fsck' RECOVERY_METHOD='sequential' EXPORT_FILESYSTEM= MOUNT_FILESYSTEM= 
VOLUME_GROUP='appsvg' CONCURRENT_VOLUME_GROUP= DISK= AIX_CONNECTIONS_SERVICES= 
AIX_FAST_CONNECT_SERVICES= APPLICATIONS= SNA_CONNECTIONS= MISC_DATA= INACTIVE_TAKEOVER='false' 
DISK_FENCING='false' SSA_DISK_FENCING='false' FS_BEFORE_IPADDR='false'
After configuring HACMP, it's time to configure HAGEO.

Configuring HAGEO

The HAGEO configuration involves some basic steps:

  • Import the HACMP configuration.
  • Start GEOmessage.
  • Configure the actual mirroring device drivers.
  • Start the mirroring process.
To import the HACMP configuration:

/usr/sbin/krpc/krpc_migrate_hacmp
To start GEOmessage:

/usr/sbin/krpc/cfgkrpc -ci
Configuring HAGEO Mirroring Device Drivers

I mentioned previously that Sybase knows about the AIX logical volumes and reads and writes directly to them. To configure the HAGEO geo-mirror devices and have the application work without reconfiguration, we renamed the AIX logical volumes and created the HAGEO devices using the original AIX logical volume names.

As shown in Figure 2, the AIX logical volumes were renamed as follows (a sketch of the rename commands appears after this list):

  • /dev/appslv00 renamed to /dev/appslv00_lv
  • /dev/sybaselv01 renamed to /dev/sybaselv01_lv
  • /dev/sybaselv02 renamed to /dev/sybaselv02_lv
  • /dev/sybaselv03 renamed to /dev/sybaselv03_lv
  • /dev/sybaselv04 renamed to /dev/sybaselv04_lv
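A sketch of how such a rename could be done with the AIX chlv command follows; treat it as an outline rather than the exact procedure used, since Sybase must be stopped (and any filesystem on appslv00 unmounted) before the logical volumes are renamed:

# Hypothetical sketch: free the original LV names for the HAGEO GMDs
# by renaming each AIX logical volume (application stopped first).
chlv -n appslv00_lv   appslv00
chlv -n sybaselv01_lv sybaselv01
chlv -n sybaselv02_lv sybaselv02
chlv -n sybaselv03_lv sybaselv03
chlv -n sybaselv04_lv sybaselv04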
We then created the GEOmirror devices (GMDs) with the original AIX LV names (the ones Sybase is configured to use):

/dev/appslv00
/dev/sybaselv01
/dev/sybaselv02
/dev/sybaselv03
/dev/sybaselv04
Here's the command to create the GMD "/dev/appslv00" on "labor1":

mkdev -c geo_mirror -s gmd -t lgmd -l'appslv00' '-S' -a minor_num='1' 
-a state_map_dev='/dev/appslv00_sm' -a local_device='/dev/rappslv00_lv' -a
device_mode='mwc' -a device_role='none' -a remote_device='labor2@/dev/rappslv00_lv'
where "appslv" is the GMD name, "/dev/appslv00_sm" is an AIX logical volume used as a log (called a "state map"), "/dev/rappslv00_lv" is the local AIX logical volume being mirrored and labor@/dev/rappslv00_lv is the remote AIX logical volume mirror target.

The other GMDs were created pointing to the appropriate logical volumes:

  • /dev/sybaselv01 (mirrored the local and remote /dev/sybaselv01_lv)
  • /dev/sybaselv02 (mirrored the local and remote /dev/sybaselv02_lv)
  • /dev/sybaselv03 (mirrored the local and remote /dev/sybaselv03_lv)
  • /dev/sybaselv04 (mirrored the local and remote /dev/sybaselv04_lv)
Almost Ready to Start HAGEO Mirroring

After configuring HACMP and the HAGEO devices, we are almost ready to start mirroring across the WAN. Before doing this, however, we need to tell HAGEO which copy is the good one, using the /usr/sbin/gmd/gmddirty and /usr/sbin/gmd/gmdclean commands. By marking one side as "dirty" and the other side as "clean", we are telling HAGEO that all of the data needs to be copied from one site to the other; HAGEO moves data from the "dirty" site to the "clean" site. To mark a site as "dirty", or the site to copy from (in this case "labor1"):

/usr/sbin/gmd/gmddirty -l appslv00 (and do this for all of the other GMDs)
To mark a site as "clean", or the site to copy to (in this case "labor2"):

/usr/sbin/gmd/gmdclean -l appslv00 (and do this for all of the other GMDs)
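Rather than typing these commands once per GMD, a small loop along the following lines can mark every GMD on a node; this loop is just a convenience sketch using the GMD names defined earlier:

# Hypothetical convenience loop: mark every GMD on a node.
# Run gmddirty on labor1 (the copy-from site) and gmdclean on labor2
# (the copy-to site).
for GMD in appslv00 sybaselv01 sybaselv02 sybaselv03 sybaselv04
do
    /usr/sbin/gmd/gmddirty -l $GMD      # on labor1
    # /usr/sbin/gmd/gmdclean -l $GMD    # on labor2 instead
done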
Starting the HAGEO Mirroring

Now that we have configured HACMP/HAGEO, all we have to do is start HACMP. HACMP will then start the HAGEO GMDs and commence the mirroring.

Here's the command to start HACMP:

/usr/sbin/cluster/etc/rc.cluster -boot '-N' '-b'  '-i'
Final HAGEO Configuration

The outcome of all this is shown in Figure 2. If you look closely at the Sybase DBMS, it is writing to what it thinks are the AIX logical volumes. Sybase is actually sending the data to the HAGEO GMDs. The HAGEO GMDs are sending the data to the AIX logical volumes and to the remote site. This is remote mirroring using HAGEO.

Performance Issues

As far as the configuration and mirroring functionality were concerned, everything went smoothly; however, we ran into severe performance problems. Again, the systems we installed were several times faster (using multiple processors) than the previous system, so we did not expect performance problems from the systems themselves. We did not, however, know the application's write characteristics, so we expected some performance issues, but not severe ones.

We were surprised that many aspects of the application were single-threaded and did not utilize the multiple processors of the system. A good deal of work had to be done by the customer and application vendor to streamline the application, and in some cases change it, so it could use multiple processors. This work was unexpected and caused several months of delay in the implementation of the solution.

Another performance issue was the use of the state maps by the HAGEO product. The state maps are AIX logical volumes used to log changes to the application logical volumes. They allow HAGEO to keep track of changes to the logical volumes and synchronize only the changes. We had initially thought that scattering the state maps across drives would give us good performance. After some testing, we discovered that better performance was achieved by placing the state maps together on a single drive (with no application logical volumes on it). This change alone gave us a 20% improvement in the performance tests we were using (which were specific to this application).
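As an illustration of that change, the state-map logical volumes can be forced onto one dedicated disk by naming that disk when the LVs are created; the disk name and sizes below are assumptions:

# Hypothetical sketch: place all HAGEO state-map LVs on one dedicated
# disk (here hdisk9) that carries no application logical volumes.
for SM in appslv00_sm sybaselv01_sm sybaselv02_sm sybaselv03_sm sybaselv04_sm
do
    mklv -y $SM -t raw appsvg 4 hdisk9
done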

After the performance testing and changes, we were able to mirror data across the sites with performance levels satisfactory to the customer. The performance was actually only slightly better than that of their old systems, but with more data integrity. Also, because HAGEO is integrated with IBM's HACMP product, when the primary system failed, the secondary system could automatically detect the error and be up and running with the customer application within 20 minutes.

This cluster has been in production for a couple of years now, and has worked well. Again, it has required good systems administration skills to maintain, but in the several cases where the primary system has failed (for hardware reasons mostly), the secondary system successfully took over the workload within the expected amount of time. This particular configuration cost the customer approximately $500K (not including application changes).

Summary

I have presented several methods of mirroring data between sites, focusing on the real-time mirroring techniques. This is by no means a detailed or all-inclusive summary, but it describes the most common techniques for achieving this function.

If the database vendors could make the data replication capabilities of their products easier to implement, those products would certainly see a lot more use. Hardware storage vendor solutions currently provide very good performance at short distances and probably have the best potential of all of the solutions in terms of functionality. They are also the most expensive.

I also discussed IBM's HAGEO software solution as implemented in a real-life client situation. It provided the functionality the customer was looking for, along with performance that was acceptable to its users.

Primitivo Cervantes is an IT Specialist who has worked as a consultant for the last nine years. He has been in the computer/systems industry for fifteen years and has specialized in high-availability and disaster-recovery systems for the last seven years.