Questions and Answers
Jim McKinstry and Amy Rich
Here's a problem and its solution, sent to me by Bryce
Nutter and Steve Harad of u1.net in Marlton, New Jersey. Please
keep these coming. Thanks -- Jim.
We are running Solaris 2.5.1 with the latest patches on a Sun
UE4500 with a Sun 5200 storage array and six 9-GB disks, along with
Veritas 3.0.3 with DMP. DMP is Dynamic Multi-Pathing, which provides
host bus adapter failover, as well as load balancing across the
host bus adapters. We ran a test in which we ran a tar of a
large filesystem on the 5200 array and pulled the fiber
connection from the storage array while the tar was running.
The time to "find and take" the alternate path was consistently
40-50 seconds; we tested this half a dozen times. Our question
was: Why is DMP taking so long to fail over to the alternate path?
We tried setting the DMP restore interval (vxdmpadm start restore
interval=10) to 10, 30, 60, and 500 seconds. These values had
no effect on the failover time, which was always 40-50 seconds. This
makes sense, because the value defines only the restore time: when we
plugged the fiber connection back in, it took the configured interval
(in seconds) to re-enable the path.
We worked with Sun and Veritas, and also asked Jim McKinstry. The general
consensus is that it takes on the order of a minute to fail over.
The reason is that the lost connection is not detected until the
I/O request times out, which can be a "long" time. The
error detection is passive; that is, the DMP software does not actively
poll devices for errors. It waits for I/Os to time out and then searches
for the failover device. Check the kernel driver for your FC card.
It should have a timeout parameter. Don't set it too low, or
you will fail over every time there is contention on the card/bus,
which would seriously degrade performance.
The details, based on the message logs, are as follows: When the
cable is pulled, the ssd driver fails the I/O. This is normal
and the I/O is retried:
WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w21000020376fd7a1,0 (ssd13):
SCSI transport failed: reason 'tran_err': retrying command
Almost 50 seconds later, the device drivers that maintain connectivity
with the array try to re-establish the link and finally give up. The
devices on that loop are "offlined":
WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0 (sf1):
Offlining AL-PA 0xe0. Reason: Marked (OFL_RSN)
At virtually the same time, the disk device driver identifies the
queued I/Os as failed:
WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w21000020376fd7a1,0 (ssd13):
Oct 17 11:23:10 depd52 unix: requeue of command fails (fffffffe)
Also, at the same time, the disk device driver rejects the transport
request. At this point, the DMP driver does the failover. The failover
is almost instantaneous:
WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w21000020376fd7a1,0 (ssd13):
transport rejected (-2)
NOTICE: vxvm:vxdmp: disabled path 118/0x68 belonging to the dmpnode 130/0x38
NOTICE: vxdmp: Path failure on 118/108
Because our applications cannot wait more than 40 seconds for an I/O to
complete, it looks like we will not be able to use DMP for failover.
We will probably go with a split-loop configuration and mirror across
the loops. Unfortunately, we will be giving up the performance benefits
of the DMP load balancing.
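If you want to measure this delay on your own system, the gap between the first retry message and the DMP path-disable message can be pulled out of the message log. The following is a minimal sketch, not a tested tool. It assumes every log line begins with a syslog-style "Mon DD HH:MM:SS" stamp (as in the requeue line above), that both events fall on the same day, and that your awk supports user-defined functions (on Solaris, use nawk). The function name failover_gap is hypothetical.

```shell
#!/bin/sh
# Print the number of seconds between the first "retrying command" line
# and the first vxdmp "disabled path" line that follows it.
# Assumption: field 3 of every line is an HH:MM:SS timestamp.
failover_gap() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /retrying command/ && !seen { start = secs($3); seen = 1 }
        /disabled path/ && seen     { print secs($3) - start; found = 1; exit }
        END { if (!found) exit 1 }   # no matching pair of messages
    ' "$1"
}
```

On Solaris the messages normally land in /var/adm/messages, so a typical call would be failover_gap /var/adm/messages.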
Q I'm looking for a tool to
generate random pronounceable passwords so that it's easier
for users to remember them. Is there a tool out there that will
do this?
A The problem with pronounceable
passwords is that they're usually all alphabetic characters
(no numbers or symbols). This makes your passwords easier to crack,
but it's certainly better than having your users set their
passwords to dictionary words. A Java password generator (along
with source) can be found at http://www.best.com/~thvv/gpw.html.
This password generator is based on Morrie Gasser's password
generator.
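To illustrate the idea, and to show one way of working in a couple of digits, here is a minimal shell sketch. It is not the method gpw uses; it simply alternates random consonants and vowels and appends two digits. It assumes a POSIX shell with arithmetic expansion and a /dev/urandom device, and the function names (rand_index, pick, gen_password) are hypothetical. The modulo step introduces a slight bias, so treat it as an illustration rather than a vetted generator.

```shell
#!/bin/sh
# Print a random integer in the range [0, $1).
rand_index() {
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( $n % $1 ))
}

# Print one randomly chosen character from the string $1.
pick() {
    i=$(rand_index ${#1})
    printf '%s' "$1" | cut -c$(( $i + 1 ))
}

# Build a password from $1 consonant-vowel syllables (default 4)
# followed by two random digits.
gen_password() {
    pw=""
    k=0
    while [ "$k" -lt "${1:-4}" ]; do
        pw="$pw$(pick bcdfghjklmnprstvwz)$(pick aeiou)"
        k=$(( $k + 1 ))
    done
    echo "$pw$(rand_index 10)$(rand_index 10)"
}
```

Calling gen_password 4 prints a ten-character string: four syllables and two digits.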
Q How do I change the speed of my
Ethernet card on an Ultra 1?
A Let's assume, for example,
that you want to force full duplex (FDX) at 100 Mbps on hme0. Using
ndd changes the settings only until the next reboot;
you can put lines into /etc/system to make the change persist
through reboots.
First, choose hme0:
ndd -set /dev/hme instance 0
Set it to 100 Mbps, full duplex:
ndd -set /dev/hme adv_100fdx_cap 1
Next, make it so that the card does not auto-negotiate:
ndd -set /dev/hme adv_autoneg_cap 0
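After making the change, it is worth confirming what the link actually came up as. The sketch below assumes the hme driver's read-only link_speed and link_mode parameters report 1 for 100 Mbps and full duplex respectively, and 0 for 10 Mbps and half duplex. The helper name describe_link is hypothetical; it only translates the two values into readable text.

```shell
#!/bin/sh
# Translate the hme link_speed ($1) and link_mode ($2) values into text.
# Assumption: 1 means 100 Mbps / full duplex, 0 means 10 Mbps / half duplex.
describe_link() {
    case "$1" in
        1) speed="100 Mbps" ;;
        0) speed="10 Mbps" ;;
        *) speed="unknown speed" ;;
    esac
    case "$2" in
        1) mode="full duplex" ;;
        0) mode="half duplex" ;;
        *) mode="unknown duplex" ;;
    esac
    echo "$speed, $mode"
}

# On the Solaris host, after selecting the instance with
# "ndd -set /dev/hme instance 0", you would call it like this:
#   describe_link "$(ndd /dev/hme link_speed)" "$(ndd /dev/hme link_mode)"
```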
To make the change persistent, add the following lines to /etc/system.
These settings apply to all hme interfaces:
set hme:hme_adv_100fdx_cap=1
set hme:hme_adv_100T4_cap=0
set hme:hme_adv_autoneg_cap=0
If each hme interface must be set separately, then you
must modify /kernel/drv/hme.conf. The settings in this file
override those in /etc/system.
First, run prtconf -v | less and find the section that
relates to the SUNW,hme instance that you wish to modify.
You will need to enter its register values in the hme.conf
file. The output will differ if you're using different hardware:
SUNW,hme, instance #0
Driver software properties:
name <pm_norm_pwr> length <4>
value <0x00000001>.
name <pm_timestamp> length <4>
value <0x30743b26>.
Register Specifications:
Bus Type=0xe, Address=0x8c00000, Size=108
Bus Type=0xe, Address=0x8c02000, Size=2000
Bus Type=0xe, Address=0x8c04000, Size=2000
Bus Type=0xe, Address=0x8c06000, Size=2000
Bus Type=0xe, Address=0x8c07000, Size=20
Next, modify /kernel/drv/hme.conf:
name="hme" class="sbus"
reg=0xe,0x8c00000,0x00000108,0xe,0x8c02000,0x00002000,0xe,0x8c04000, \
0x00002000,0xe,0x8c06000,0x00002000,0xe,0x8c07000,0x00000020
adv_autoneg_cap=0 adv_100T4_cap=0 adv_100fdx_cap=1;
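Copying the register values by hand is error-prone, so the conversion from the prtconf lines to the reg= value can be scripted. This is a minimal sketch under the assumption that each register line looks like the "Bus Type=..., Address=..., Size=..." lines shown above, with the size printed in hex without a 0x prefix. The function name prtconf_to_reg is hypothetical, and on Solaris you would need nawk rather than the old awk.

```shell
#!/bin/sh
# Read "Bus Type=..., Address=..., Size=..." lines on stdin and print the
# comma-separated value for the reg= property. Each size is zero-padded
# to eight hex digits and given a 0x prefix.
prtconf_to_reg() {
    awk -F'[=,]' '
        /Bus Type=/ {
            size = $6
            gsub(/[^0-9a-fA-F]/, "", size)          # drop stray whitespace
            while (length(size) < 8) size = "0" size
            out = out (out == "" ? "" : ",") $2 "," $4 ",0x" size
        }
        END { print out }
    '
}
```

Feed it the Register Specifications lines for the instance you want, and paste the single line it prints after reg= in hme.conf.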
Q I'm looking for calendaring
software that will run on our UNIX systems. The other half of our
IT department is pushing for Exchange because we have a lot of Outlook
users, but we want to keep reliability and stability.
A Take a look at the calendar products
put out by Steltor (http://www.cst.ca/). A client-side DLL
makes your Outlook client think the calendar server, combined with
an IMAP server, is actually a Microsoft Exchange server. You can
also add Web clients, WAP editions, a wireless dataserver, and more.
There are also clients for platforms other than Windows. It's
handy software if you're trying to avoid Microsoft Exchange.
Q I'm trying to install a Solaris
8 machine, but I keep running into errors that tell me it can't
find a suitable boot disk. It keeps telling me that the "magic
number" is bad. The machine has two 9-GB disks in it, so I'm
not sure what the problem is. I've tried both from CD and with
Jumpstart, and the install always bombs out. What's the problem,
and how do I fix it?
A It sounds like your disks have
not been labeled. Jumpstart will leave you at a # prompt from which
you can invoke format, or you can boot single user from the
CD-ROM and use format. When you choose your disks in format,
it should ask you to label them. After you label the
disks, you can retry your installation via either method; they should
both work.
Q I'm running Debian Linux,
and my Ethernet card is not detected on boot, yet loading the module
after boot with insmod eepro100 works fine. How can I get
it to load at boot so all of this is automatic?
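A Add the alias below to your module configuration. On Debian,
/etc/modules.conf is normally generated by update-modules from the
files under /etc/modutils, so put the line in /etc/modutils/aliases
and run update-modules rather than editing /etc/modules.conf directly: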
alias eth0 eepro100
Jim McKinstry is a Senior Sales Engineer for MTI Technology
Corporation (www.mti.com). MTI is a leading international
provider of data storage management products and services. He can
be reached at: jrmckins@yahoo.com.
Amy Rich, president of the Boston-based Oceanwave Consulting,
Inc. (http://www.oceanwave.com), has been a UNIX systems
administrator for more than five years. She received a BSCS at Worcester
Polytechnic Institute, and can be reached at: arr@oceanwave.com.