Questions and Answers

Jim McKinstry and Amy Rich

Here are a problem and a solution sent to me by Bryce Nutter and Steve Harad of u1.net in Marlton, New Jersey. Please keep these coming. Thanks -- Jim.

We are running Solaris 2.5.1 with the latest patches on a Sun UE4500 with a Sun 5200 storage array and six 9-GB disks, along with Veritas 3.0.3 with DMP. DMP is Dynamic Multi-Pathing, which provides host bus adapter failover as well as load balancing across the host bus adapters. We ran a test in which we did a tar of a large filesystem that was on the 5200 array, and we pulled the fiber connection from the storage array while the tar was running. The time to "find and take" the alternate path was consistently 40-50 seconds; we tested this half a dozen times. Our question was: Why is DMP taking so long to fail over to the alternate path?

We tried setting the DMP restore interval (vxdmpadm start restore interval=10) to 10, 30, 60, and 500 seconds. These values had no effect on the failover time, which was always 40-50 seconds. This makes sense because the value defines the restore time, not the failover time: when we plugged the fiber connection back in, it took interval seconds to re-enable the path.

We worked with Sun and Veritas, and also asked Jim McKinstry. The general consensus is that a failover does take on the order of a minute. The reason is that the lost connection is not detected until the I/O request times out, which can be a "long" time. The error detection is passive: the DMP software does not actively poll devices for errors. It waits for I/Os to time out and then searches for the failover device. Check the kernel driver for your FC card; it should have a timeout parameter. Don't set it too low, or you will fail over every time there is contention on the card or bus, which would really degrade your performance.

The details, based on the message logs, are as follows: When the cable is pulled, the ssd driver fails the I/O. This is normal and the I/O is retried:

    WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w21000020376fd7a1,0 (ssd13):
     SCSI transport failed: reason 'tran_err': retrying command

Almost 50 seconds later, the device drivers that maintain connectivity with the array try to re-establish the link and finally give up. The devices on that loop are "offlined":

    WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0 (sf1):
    Offlining AL-PA 0xe0. Reason: Marked (OFL_RSN)

At virtually the same time, the disk device driver identifies the queued I/Os as failed:

    WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w21000020376fd7a1,0 (ssd13):
    Oct 17 11:23:10 depd52 unix: requeue of command fails (fffffffe)

Also, at the same time, the disk device driver rejects the transport request. At this point, the DMP driver does the failover. The failover is almost instantaneous:

    WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w21000020376fd7a1,0 (ssd13):
    transport rejected (-2)
    NOTICE: vxvm:vxdmp: disabled path 118/0x68 belonging to the dmpnode 130/0x38
    NOTICE: vxdmp: Path failure on 118/108
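
If you want to repeat this measurement on your own system, the delay can be read straight from the syslog timestamps. The following is only a rough sketch: the function name failover_delay is made up for illustration, it assumes the standard "Mon DD HH:MM:SS host ..." syslog prefix on every line, it matches the two message strings shown above, and it ignores a test that straddles midnight. On Solaris, substitute nawk for awk.

```shell
#!/bin/sh
# failover_delay: hypothetical helper, not part of Solaris or Veritas.
# Reads syslog lines on stdin and prints the number of seconds between
# the first "SCSI transport failed" warning and the first DMP
# "disabled path" notice that follows it.
failover_delay() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(hms,    p) {
            split(hms, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        # Field 3 of a syslog line is the HH:MM:SS timestamp.
        /SCSI transport failed/ && start == "" { start = secs($3) }
        /vxdmp: disabled path/  && start != "" { print secs($3) - start; exit }
    '
}
```

A typical invocation would be failover_delay < /var/adm/messages after a cable-pull test. If the log holds more than one test, trim it to the run you care about first, because only the first pair of messages is used.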

Because our applications cannot wait more than 40 seconds for an I/O to complete, it looks like we will not be able to use DMP for failover. We will probably go with a split-loop configuration and mirror across the loops. Unfortunately, we will be giving up the performance benefits of the DMP load balancing.

Q I'm looking for a tool to generate random pronounceable passwords so that it's easier for users to remember them. Is there a tool out there that will do this?

A The problem with pronounceable passwords is that they're usually all alphabetic characters (no numbers or symbols). This makes your passwords easier to crack, but it's certainly better than having your users set their passwords to dictionary words. A Java password generator (along with source) can be found at http://www.best.com/~thvv/gpw.html. This password generator is based on Morrie Gasser's password generator.
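
To give a feel for what such a tool does, here is a minimal shell sketch. It is not the gpw algorithm: it simply alternates consonants and vowels. The function name gen_pronounceable is made up for this example, and the sketch assumes a system that provides /dev/urandom, which older Solaris releases do not have. Mapping a random byte onto a short alphabet with a modulus is also slightly biased, so treat this as an illustration rather than a replacement for a real generator.

```shell
#!/bin/sh
# gen_pronounceable: hypothetical helper for illustration only.
# Prints one password of the requested length (default 8) made of
# alternating consonants and vowels, e.g. consonant-vowel-consonant...
gen_pronounceable() {
    len=${1:-8}
    # od prints $len random bytes as unsigned decimal numbers;
    # awk maps each byte onto the consonant or vowel alphabet in turn.
    od -A n -t u1 -N "$len" /dev/urandom |
    awk 'BEGIN { c = "bcdfghjklmnprstvwz"; v = "aeiou" }
         {
             for (i = 1; i <= NF; i++) {
                 n++
                 set = (n % 2) ? c : v
                 printf "%s", substr(set, ($i % length(set)) + 1, 1)
             }
         }
         END { print "" }'
}
```

Calling gen_pronounceable 10 prints a single ten-character password. Appending a digit or symbol to the result is one way to address the all-alphabetic weakness mentioned above.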

Q How do I change the speed of my Ethernet card on an Ultra 1?

A Let's assume, for example, that you want to force full duplex (FDX) at 100 Mb/s on hme0. Changes made with ndd last only until the next reboot; to make them persist across reboots, you can add lines to /etc/system.

First, choose hme0:

ndd -set /dev/hme instance 0

Set it to 100 Mb/s, full duplex:

ndd -set /dev/hme adv_100fdx_cap 1

Next, make it so that the card does not auto-negotiate:

ndd -set /dev/hme adv_autoneg_cap 0
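
The ndd steps above can be collected into a small script. This is a sketch under a few assumptions: the function names set_param and force_100fdx are made up for the example, and the script goes one step further than the commands above by also clearing the other advertised capabilities of the hme driver (adv_100T4_cap, adv_100hdx_cap, adv_10fdx_cap, and adv_10hdx_cap) so that 100 Mb/s full duplex is the only mode left enabled. By default it only prints the commands; set DRY_RUN=0 and run it as root to apply them.

```shell
#!/bin/sh
# Hypothetical wrapper around ndd for the hme driver.
# DRY_RUN=1 (the default) prints each command instead of running it.
set_param() {   # usage: set_param NAME VALUE
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "ndd -set /dev/hme $1 $2"
    else
        ndd -set /dev/hme "$1" "$2"
    fi
}

# force_100fdx N: force hme instance N (default 0) to 100 Mb/s full duplex.
force_100fdx() {
    set_param instance "${1:-0}"     # select which hme interface to change
    set_param adv_100T4_cap 0        # clear every mode we do not want
    set_param adv_100hdx_cap 0
    set_param adv_10fdx_cap 0
    set_param adv_10hdx_cap 0
    set_param adv_100fdx_cap 1       # keep only 100 Mb/s full duplex
    set_param adv_autoneg_cap 0      # finally, turn off auto-negotiation
}
```

Run force_100fdx 0 to review the commands for hme0 first. After applying them, ndd -get /dev/hme link_speed and ndd -get /dev/hme link_mode should both report 1 if the link really came up at 100 Mb/s full duplex. Remember that the switch port must be forced to the same settings, or you will end up with a duplex mismatch.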

To make the change persistent, add the following lines to /etc/system. Note that these settings apply to all hme interfaces:

set hme:hme_adv_100fdx_cap=1
set hme:hme_adv_100T4_cap=0
set hme:hme_adv_autoneg_cap=0

If each hme interface must be set separately, then you must modify /kernel/drv/hme.conf. The settings in this file override those in /etc/system.

First, run prtconf -v | less and find the section for the SUNW,hme instance that you wish to modify. You will need the register values from that section for the hme.conf file. The output below is from one machine; yours will differ if you're using different hardware:

SUNW,hme, instance #0
    Driver software properties:
        name <pm_norm_pwr> length <4>
            value <0x00000001>.
        name <pm_timestamp> length <4>
            value <0x30743b26>.
    Register Specifications:
        Bus Type=0xe, Address=0x8c00000, Size=108
        Bus Type=0xe, Address=0x8c02000, Size=2000
        Bus Type=0xe, Address=0x8c04000, Size=2000
        Bus Type=0xe, Address=0x8c06000, Size=2000
        Bus Type=0xe, Address=0x8c07000, Size=20

Next, modify /kernel/drv/hme.conf:

name="hme" class="sbus"
reg=0xe,0x8c00000,0x00000108,0xe,0x8c02000,0x00002000,0xe,0x8c04000, \
0x00002000,0xe,0x8c06000,0x00002000,0xe,0x8c07000,0x00000020
adv_autoneg_cap=0 adv_100T4_cap=0 adv_100fdx_cap=1;
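
The reg= value above is just the Register Specifications list from prtconf rewritten as comma-separated triples, with each Size padded to eight hex digits. Retyping it by hand is error prone, so here is a small sketch that does the conversion. The function name prtconf_to_reg is made up for the example, and it assumes that you feed it only the lines for the one instance you care about, in the "Bus Type=..., Address=..., Size=..." layout shown earlier. On Solaris, substitute nawk for awk.

```shell
#!/bin/sh
# prtconf_to_reg: hypothetical helper for illustration only.
# Reads "Bus Type=0xe, Address=0x8c00000, Size=108" lines on stdin
# and prints a single reg= property on one line for hme.conf.
prtconf_to_reg() {
    awk -F'[=,]' '
        /Bus Type=/ {
            # After splitting on "=" and ",": $2 = bus type,
            # $4 = address, $6 = size (hex digits without 0x).
            size = $6
            gsub(/[^0-9a-fA-F]/, "", size)          # drop stray whitespace
            while (length(size) < 8) size = "0" size
            entry = $2 "," $4 ",0x" size
            reg = (reg == "" ? entry : reg "," entry)
        }
        END { if (reg != "") print "reg=" reg }
    '
}
```

Paste the five Bus Type lines for instance #0 into the function's standard input and it prints the same reg= list used in the hme.conf example above, which you can then wrap across lines as needed.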

Q I'm looking for calendaring software that will run on our UNIX systems. The other half of our IT department is pushing for Exchange because we have a lot of Outlook users, but we want to keep reliability and stability.

A Take a look at the calendar products put out by Steltor (http://www.cst.ca/). A client-side DLL makes your Outlook client think the calendar server, combined with an IMAP server, is actually a Microsoft Exchange server. You can also add Web clients, WAP editions, a wireless dataserver, and more. There are also clients for platforms other than Windows. It's handy software if you're trying to avoid Microsoft Exchange.

Q I'm trying to install a Solaris 8 machine, but I keep running into errors that tell me it can't find a suitable boot disk. It keeps telling me that the "magic number" is bad. The machine has two 9-GB disks in it, so I'm not sure what the problem is. I've tried both from CD and with Jumpstart, and the install always bombs out. What's the problem, and how do I fix it?

A It sounds like your disks have not been labeled. Jumpstart will leave you at a # prompt from which you can invoke format, or you can boot single user from the CD-ROM and use format. When you choose your disks in format, it should ask you to label them. After you label the disks, you can retry your installation via either method; they should both work.

Q I'm running Debian Linux, and my Ethernet card is not detected on boot, yet loading the module after boot with insmod eepro100 works fine. How can I get it to load at boot so all of this is automatic?

A Modify /etc/modules.conf and add the following:

alias eth0 eepro100

Jim McKinstry is a Senior Sales Engineer for MTI Technology Corporation (www.mti.com). MTI is a leading international provider of data storage management products and services. He can be reached at: jrmckins@yahoo.com.

Amy Rich, president of the Boston-based Oceanwave Consulting, Inc. (http://www.oceanwave.com), has been a UNIX systems administrator for more than five years. She received a BSCS at Worcester Polytechnic Institute, and can be reached at: arr@oceanwave.com.