Questions and Answers
Amy Rich
Q Our Ultra 60 running Solaris 8
keeps crashing on us, and I want to do some crash dump analysis.
Unfortunately, I don't really know how to go about getting
the crash dump, or how to read the dump once I have it. Could you
offer some advice?
A I highly recommend the book Panic!
Unix System Crash Dump Analysis, by Chris Drake and Kimberley
Brown (Prentice Hall, 1995, ISBN 0131493868). I'm not sure
if both of the authors are still with Sun, but they were when they
wrote the book. Although this book is a bit dated, and I really
wish they would put out a second edition for the Ultra SPARC family,
this book is still the bible for analyzing UNIX system crashes.
It covers:
- The difference between panics and hangs
- Header files, symbols, and symbol tables
- A tutorial on how to use adb
- The stack and stack traces
- An introduction to assembly
- An overview of UNIX internals
- The SPARC processor and instruction set
Here is some basic information to get you started. Make sure that
you have savecore enabled, or you're not going to get
a crash dump. In Solaris 8, savecore is enabled by default,
but if it has been disabled for some reason, you can re-enable it
by running:
/usr/sbin/dumpadm -y
This modifies /etc/dumpadm.conf so that savecore automatically
runs on every reboot. Also make sure that /etc/rc2.d/S75savecore
exists and is intact. If you look at /etc/dumpadm.conf, you'll
see that it also stores the directory location of the crash dumps,
usually /var/crash/`/bin/uname -n`/. If you have changed machine
names, you'll need to change dumpadm.conf to reflect the
new machine name. Also be sure that a valid disk slice (like
/dev/dsk/c0t0d0s1) is listed after DUMPADM_DEVICE=, and that the
slice is not part of a DiskSuite or Veritas mirror. This slice should not be
smaller than the amount of physical memory you have, or you'll
run out of space when writing out the savecore file.
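As a quick sanity check of the settings described above, here is a minimal shell sketch that reads values out of a dumpadm.conf-style file and compares the crash directory name with the current host name. The function names conf_value and savdir_matches_host are made up for this example, and the sketch assumes the file consists of plain KEY=value lines. It only reads the file and changes nothing.

```shell
#!/bin/sh
# conf_value FILE KEY
# Print the value assigned to KEY in a file of KEY=value lines.
conf_value() {
    sed -n "s/^$2=//p" "$1"
}

# savdir_matches_host FILE
# Succeed if the last path component of DUMPADM_SAVDIR equals the
# node name reported by uname -n.
savdir_matches_host() {
    savdir=`conf_value "$1" DUMPADM_SAVDIR`
    base=`basename "$savdir"`
    host=`uname -n`
    [ "$base" = "$host" ]
}

# Example use (commented out so that sourcing this file has no side effects):
#   conf_value /etc/dumpadm.conf DUMPADM_DEVICE
#   savdir_matches_host /etc/dumpadm.conf || echo "crash directory does not match host name"
```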
When the machine crashes again (/usr/bin/who -b will tell
you when it came back up), you should have a core file to dissect
located in /var/crash/`/bin/uname -n`/vmcore.N, where N is
a number. The default namelist for your core dump will be
/var/crash/`/bin/uname -n`/unix.N, where N matches the number of the vmcore file.
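If several crashes have been saved, it can help to pick out the newest vmcore.N and unix.N pair automatically. The following is a minimal sketch; latest_dump is a hypothetical helper name, not a Solaris command, and it assumes the highest N is the most recent dump.

```shell
#!/bin/sh
# latest_dump DIR
# Print "DIR/vmcore.N DIR/unix.N" for the highest N found in DIR.
latest_dump() {
    dir=$1
    newest=-1
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue
        # Extract the numeric suffix; expr prints nothing if there is none.
        # (expr exits non-zero when the result is 0 or empty, hence "|| :".)
        n=`expr "$f" : '.*\.\([0-9][0-9]*\)$'` || :
        [ -n "$n" ] || continue
        if [ "$n" -gt "$newest" ]; then
            newest=$n
        fi
    done
    if [ "$newest" -lt 0 ]; then
        echo "no vmcore files found in $dir" >&2
        return 1
    fi
    echo "$dir/vmcore.$newest $dir/unix.$newest"
}

# Example use (commented out):
#   latest_dump /var/crash/`/bin/uname -n`
```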
Let's say that you've just crashed the machine and your
files are /var/crash/myhost.my.com/vmcore.1 and /var/crash/myhost.my.com/unix.1.
You can get some basic information by using /usr/sbin/crash.
Starting with Solaris 8, you should transition from using /usr/sbin/crash
to using /usr/bin/mdb, the modular debugger (contained in
packages SUNWmdb and SUNWmdbx). Since most people are familiar with
/usr/sbin/crash, though, I'll cover that instead.
Load the core file and the associated namelist:
/usr/sbin/crash -d /var/crash/myhost.my.com/vmcore.1 -n /var/crash/myhost.my.com/unix.1
Now you can do things like print out all entries in the process table:
proc -e
Print out information about open special filenames:
snode -e -l
Print out the tunable parameters:
var
For more information on the capabilities of /usr/sbin/crash,
see the man page.
You can also use /usr/bin/ipcs to look at the inter-process
communication status of things when the core was taken:
/usr/bin/ipcs -C /var/crash/myhost.my.com/vmcore.1 -N /var/crash/myhost.my.com/unix.1
Q We were looking for a cheap backup
solution for our small office and we've decided to use Amanda.
Our dump host is a pokey old SPARC 20 running Solaris 8 with a Spectra
Logic Treefrog attached to it. I've installed the software on
the server and on the clients, but I'm having issues dealing
with the tape drive. I am looking for a changer script that would
play well with the Treefrog. Do you know of anything that might work?
A First, make sure the Treefrog
configuration switch on the back of the unit is set to 9 so that
it interfaces with Solaris. Second, make sure you enable the sgen
driver and have it create a device for the changer in /dev/scsi/changer.
To create the entry, add the following line to /kernel/drv/sgen.conf:
device-type-config-list="changer";
You'll also need to add lines to /kernel/drv/sgen.conf
that tell it which SCSI targets to probe for the changer. You can add just
one line for the location of your changer, or you can have it probe
every target at a given LUN if you think you might move the changer on the SCSI
chain. Here's what the syntax would look like for probing all
targets at LUN 0:
name="sgen" class="scsi" target=0 lun=0;
name="sgen" class="scsi" target=1 lun=0;
name="sgen" class="scsi" target=2 lun=0;
name="sgen" class="scsi" target=3 lun=0;
name="sgen" class="scsi" target=4 lun=0;
name="sgen" class="scsi" target=5 lun=0;
name="sgen" class="scsi" target=6 lun=0;
name="sgen" class="scsi" target=7 lun=0;
name="sgen" class="scsi" target=8 lun=0;
name="sgen" class="scsi" target=9 lun=0;
name="sgen" class="scsi" target=10 lun=0;
name="sgen" class="scsi" target=11 lun=0;
name="sgen" class="scsi" target=12 lun=0;
name="sgen" class="scsi" target=13 lun=0;
name="sgen" class="scsi" target=14 lun=0;
name="sgen" class="scsi" target=15 lun=0;
Once you've added the lines to /kernel/drv/sgen.conf,
halt the machine, attach the tape drive and power it on, then reboot
the machine. If you don't see a device entry in /dev/scsi/changer/,
try running /usr/sbin/devfsadm to scan for new devices.
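The sixteen per-target lines listed above do not have to be typed by hand. Here is a small sketch that prints them; gen_sgen_targets is a made-up helper name, and the range 0 through 15 at LUN 0 simply mirrors the listing above.

```shell
#!/bin/sh
# gen_sgen_targets
# Print one sgen.conf entry for each SCSI target 0-15 at LUN 0.
gen_sgen_targets() {
    t=0
    while [ "$t" -le 15 ]; do
        echo "name=\"sgen\" class=\"scsi\" target=$t lun=0;"
        t=`expr "$t" + 1`
    done
}

# Example use (commented out): review the output first, then append it
# to a backed-up copy of the driver configuration file.
#   gen_sgen_targets
```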
Once the library is properly configured, install mtx (http://mtx.sourceforge.net/)
to talk to it. Once you're able to control the library with
mtx, you can hook it into Amanda.
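Before wiring mtx into Amanda, it is worth confirming by hand that its status output can be read. The sketch below pulls the currently loaded slot out of "mtx status" output fed to it on standard input. It assumes the "Data Transfer Element 0:Full (Storage Element N Loaded)" line format that the changer script in this column also matches on; loaded_slot is a hypothetical helper name, and the device path in the example is only a placeholder.

```shell
#!/bin/sh
# loaded_slot
# Read "mtx status" output on stdin and print the storage slot whose tape
# is currently in drive 0, or -1 if the drive is empty.
loaded_slot() {
    slot=`sed -n 's/^Data Transfer Element 0:Full (Storage Element \([0-9][0-9]*\) Loaded).*/\1/p'`
    printf '%s\n' "${slot:--1}"
}

# Example use (commented out; substitute your own changer device):
#   /usr/local/sbin/mtx -f /dev/scsi/changer/c0t4d0 status | loaded_slot
```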
You could probably use the chg-mtx or chg-scsi scripts
that come with Amanda, but I found the following changer script
while searching the amanda-users mail archive (http://www.amanda.org/).
chg-spectra was written by Stephen Carville specifically
for controlling the Treefrog with mtx. The GNU GPL that the software
references is available at http://www.gnu.org/copyleft/gpl.html.
I've tested this changer script, and it seems to work well.
You may need to change a couple of paths, depending on where you
installed Amanda, but this should work pretty much right out of
the box:
#!/usr/bin/perl -w
#
# Tape changer glue script for a Spectra 2000 tape changer
# Does not include the 'clean' option
#
# Version 1.0
# Copyright (C) 2001 Stephen Carville
# Unix and Network Administrator
# Ace Flood USA
# stephen@totalflood.com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to:
# Free Software Foundation, Inc.
# 59 Temple Place, Suite 330,
# Boston, MA 02111-1307 USA
use strict;
# use DB_File;
# paths to amanda directories
my $PREFIX = "/usr/local";
my $SBIN = "$PREFIX/sbin";
my $BIN = "$PREFIX/bin";
my $LIBEXEC = "$PREFIX/libexec";
# debugging files -- set DEBUGDIR to "" to disable debugging output
my $DEBUGDIR = "/usr/local/etc/amanda/tmp";
my $DBGFILE = "changer.debug";
# set a conservative path. All commands are full paths so an empty path
# would be OK
my $PATH = "/bin:/sbin";
# commands -- change these to suit your configuration
my $MT = "/usr/bin/mt";
my $MTX = "/usr/local/sbin/mtx";
my $AMGETCONF = "$SBIN/amgetconf";
my $MAILER = "/usr/bin/mailx";
# ----------- end of user configuration --------------
# exit codes
my ($SUCCESS,$ERROR,$BADERROR) = (0,1,2);
my ($cmd,$parm,$line,$rtn,$status);
my ($tapedev,$changerdev,$changerfile,$mailto);
my ($cleanfile,$accessfile,$slotfile,$labelfile,$dbgfile);
# careful! - global!
my ($firstslot,$lastslot,$cleanslot,$havereader);
if (defined $ARGV[0]) {$cmd = $ARGV[0];} else {$cmd="";};
if (defined $ARGV[1]) {$parm = $ARGV[1];} else {$parm = "";};
# get rid of any leading '-' in the command
$cmd =~ s/^-//;
# print "$cmd,$parm\n";
# set the execution path
$ENV{PATH} = $PATH;
# get some values from the configuration file -- careful these are
# global
$tapedev = `$AMGETCONF tapedev`; $tapedev =~ s/\n//;
$changerdev = `$AMGETCONF changerdev`; $changerdev =~ s/\n//;
$changerfile = `$AMGETCONF changerfile`; $changerfile =~ s/\n//;
$mailto = `$AMGETCONF mailto`; $mailto =~ s/\n//;
# mtx doesn't always need the changer device but it doesn't hurt
$MTX .= " -f $changerdev";
# OTOH, mt usually does need the tape device
$MT .= " -f $tapedev";
# set the debug file
if ( -d $DEBUGDIR) {
$dbgfile = "$DEBUGDIR/$DBGFILE";
} else {
$dbgfile = "/dev/null";
}
# this makes it easier to tell what happened
debugmsg ("==========");
# print "$tapedev,$changerdev,$changerfile\n";
# I only use barcodes and conf but I include the rest JIC.
$cleanfile = "$changerfile-clean";
$accessfile = "$changerfile-access";
$slotfile = "$changerfile-slot";
$labelfile = "$changerfile-barcodes";
$changerfile .= ".conf";
# get values from the changerfile
open FILE, $changerfile or die "$BADERROR: Unable to open $changerfile";
foreach $line (<FILE>) {
$line =~ s/\n//;
$line =~ s/\s+//g;
if ($line =~ m/^\#/ || $line eq "") {
next;
}
$line =~ m/(\w+)=(\d+)/;
# no tricky way to do this?
if ($1 eq "firstslot") {$firstslot = $2; next; }
if ($1 eq "lastslot") {$lastslot = $2; next; }
if ($1 eq "cleanslot") {$cleanslot = $2; next; }
if ($1 eq "havereader") {$havereader = $2; next; }
}
# print "$firstslot,$lastslot,$cleanslot,$havereader\n";
# branch to the appropriate subroutine
# how are we called? slot, info, reset, eject, label, search, clean
if ($cmd eq "slot") {
($rtn,$status) = slot ($parm);
print "$status\n";
exit $rtn;
}
if ($cmd eq "info") {
($rtn,$status) = info ($parm);
print "$status\n";
exit $rtn;
}
if ($cmd eq "reset") {
($rtn,$status) = rset ($parm);
print "$status\n";
exit $rtn;
}
if ($cmd eq "eject") {
($rtn,$status) = eject ($parm);
print "$status\n";
exit $rtn;
}
if ($cmd eq "label") {
($rtn,$status) = label ($parm);
print "$status\n";
exit $rtn;
}
if ($cmd eq "search") {
($rtn,$status) = search ($parm);
print "$status\n";
exit $rtn;
}
if ($cmd eq "clean") {
($rtn,$status) = clean ($parm);
print "$status\n";
exit $rtn;
}
# if we get this far cmd is no good
print "Unknown option: $cmd\n";
exit $BADERROR;
# command subroutines
# move a tape from a slot to the tape drive
sub slot {
my ($parm) = @_;
my ($usedslot,$barcode,$loadslot);
($usedslot,$barcode) = status ();
$loadslot = -1;
debugmsg ("SLOT->load tape from slot $parm");
# what were we asked to do?
# current -- return current loaded slot or load first slot
if ($parm eq "current") {
if ($usedslot < $firstslot) {
$loadslot = $firstslot;
} else {
return $SUCCESS,"$usedslot $tapedev";
}
return load ($loadslot);
}
# next or advance -- load the next tape or first slot
if ($parm eq "next" or $parm eq "advance") {
unload ($usedslot);
$loadslot = $usedslot+1;
if ($loadslot > $lastslot) {
$loadslot = $firstslot;
}
return load ($loadslot);
}
# prev -- load the previous or last slot
if ($parm eq "prev") {
unload ($usedslot);
$loadslot = $usedslot-1;
if ($loadslot < $firstslot) {
$loadslot = $lastslot;
}
return load ($loadslot);
}
# first -- load the first slot
if ($parm eq "first") {
unload ($usedslot);
return load ($firstslot);
}
# last -- load the last slot
if ($parm eq "last") {
unload ($usedslot);
return load ($lastslot);
}
# clean -- I do not support cleaning. I prefer to set it up
# myself
if ($parm eq "clean") {
return clean($parm);
}
# check for a legitimate number
unless ($parm =~ m/^\d+$/) {
debugmsg ("SLOT->bad slot ID $parm");
return $ERROR,"0 Slot $parm is out of range ($firstslot -- $lastslot)";
}
if ($parm >= $firstslot && $parm <= $lastslot) {
if ($parm == $usedslot) {
return $SUCCESS,"$usedslot $tapedev";
}
unload ($usedslot);
return load ($parm);
} else {
return $BADERROR, "0 Slot $parm is out of range ($firstslot -- $lastslot)\n";
}
return $BADERROR,"0 Unknown error!\n";
}
# return some information about capabilities
sub info {
my ($parm) = @_;
my ($usedslot,$barcode,$str);
($usedslot,$barcode) = status();
debugmsg("INFO -> current slot $usedslot, last slot $lastslot");
if ($usedslot < 0) {
$usedslot = 0;
}
$str = "$usedslot $lastslot 1";
# someday check the tape library for havereader
if ($havereader) {
$str .= " 1";
}
debugmsg ("INFO -> $str");
return $SUCCESS,"$str";
}
# reset the tape library to a known state
sub rset {
my ($parm) = @_;
my ($usedslot,$barcode);
($usedslot,$barcode) = status();
debugmsg ("RESET -> loading tape from slot $firstslot to $tapedev");
# first eject any tape already loaded
unload ($usedslot);
return load ($firstslot);
}
# eject the current tape
sub eject {
my ($usedslot,$barcode);
($usedslot,$barcode) = status();
debugmsg ("EJECT->unloading slot $usedslot");
return unload ($usedslot);
}
# add a label to the barcodes file
sub label {
my ($label) = @_;
my ($usedslot,$barcode);
my (%barcodes);
($usedslot,$barcode) = status();
%barcodes = load_labelfile();
debugmsg("Adding Barcode $barcode and amlabel $label for Slot $usedslot into $labelfile");
# if the label exists and is synced
if (exists $barcodes{$label} && $barcode eq $barcodes{$label} ) {
debugmsg("Barcode $barcode already synced for $label");
} else {
# any other condition, add/overwrite it and save the new file
$barcodes{$label} = $barcode;
save_labelfile(%barcodes);
}
return $SUCCESS, "0 $usedslot $tapedev";
}
# search for a tape in barcodes and load it into the drive
sub search {
my ($label) = @_;
my (%labels,$slot,$err);
debugmsg("SEARCH->searching for tape $label");
%labels = load_labelfile();
# if the label is in the file
if (exists $labels{$label}) {
$slot = findbybarcode($labels{$label});
} else {
debugmsg("SEARCH->tape $label not found in barcode database");
errormsg("Error with Barcode reader","Tape $label not found in barcode database");
return $BADERROR,"tape $label not found in barcode database";
}
# if the tape was not found in the changer
if ($slot < 0) {
debugmsg("SEARCH->tape $label not found in changer");
errormsg("Error with Barcode reader","Tape $label not found in changer");
return $BADERROR,"tape $label not found in changer";
}
# tape is already loaded
if ($slot == 0) {
return $SUCCESS,"$tapedev";
}
# otherwise load the tape
debugmsg("SEARCH->tape $label found at slot $slot");
sleep (10);
load($slot);
return $SUCCESS,"$tapedev";
}
# cleaning not supported yet
sub clean {
return $ERROR,"0 Cleaning not supported by driver\n";
}
# unload a tape from the drive and replace it in its slot
sub unload {
my ($slot) = @_;
my (@result);
# we know that -1 is empty
if ($slot < 0) {
debugmsg ("UNLOAD->$tapedev already empty");
return $ERROR, "0 Drive was not loaded";
}
@result = `$MTX unload $slot 2>&1`;
if ($?) {
debugmsg ("UNLOAD->error unloading slot $slot");
return $BADERROR,"0 @result";
}
debugmsg ("UNLOAD->unloaded tape to slot $slot");
return $SUCCESS,"$slot $tapedev";
}
# load a tape into the drive
sub load {
my ($slot) = @_;
my (@result,$cntr,$off);
debugmsg ("LOAD->loading tape from slot $slot");
@result = `$MTX load $slot 2>&1`;
if ($?) {
debugmsg ("LOAD->result == @result");
debugmsg ("LOAD->error in loading tape from slot $slot");
return $BADERROR,"$slot @result";
}
# wait for drive to go online but not forever
$cntr = 0;
$off = 1;
while ($off) {
@result = `$MT status 2>&1`;
# grep in scalar context returns the number of lines mentioning "offline"
$off = grep (/offline/,@result);
# drive is online, stop waiting
last unless $off;
$cntr++;
# is this the last try?
if ($cntr > 10) {
debugmsg ("LOAD->still offline at try # $cntr");
return $BADERROR,"$slot @result";
}
sleep(10);
}
# now rewind the tape
@result = `$MT rewind 2>&1`;
sleep (1);
# phew!
debugmsg ("LOAD->$slot $tapedev");
return $SUCCESS,"$slot $tapedev";
}
#
# load the label database. This is a space-separated text file,
# but I return a hash to make manipulation easier. This should use
# DB_File but not all systems have it installed so I follow the
# KISS principle. returns %hash{$label}=$barcode
sub load_labelfile {
my (%labels,$line,$lbl,$bc);
unless (-f $labelfile) {
return %labels;
}
open LABEL, "$labelfile";
foreach $line (<LABEL>) {
$line =~ s/\n//;
($lbl,$bc) = split /\s+/, $line;
if (defined $lbl and defined $bc) {
$labels{$lbl} = $bc;
}
}
close LABEL;
return %labels;
}
# save the labelfile -- this completely recreates the file!
sub save_labelfile {
my (%labels,$lbl) = @_;
# clobber the existing file
open LABEL, ">$labelfile";
foreach $lbl (keys %labels) {
print LABEL "$lbl $labels{$lbl}\n";
}
close LABEL;
}
# return the slot for a particular barcode
sub findbybarcode {
my ($bc) = @_;
my (@result,$line,$slot);
@result = `$MTX status 2>&1`;
foreach $line (@result) {
if ($line =~ /[\w\s]+ Element (0):Full \([\w\s\d]+\):VolumeTag = (.+)/) {
if ($bc eq $2) {
debugmsg("FINDBYBARCODE->Label $bc is at slot $1");
return $1;
}
}
if ($line =~ /\w+\s+Element (\d+):Full :VolumeTag=(.+)/) {
if ($bc eq $2) {
debugmsg("FINDBYBARCODE->Label $bc is at slot $1");
return $1;
}
}
}
debugmsg("FINDBYBARCODE->Label $bc not found!");
return -1;
}
# return the slot and barcode of the loaded tape
sub status {
my (@result,$line,$stat,$el,$bc);
# get the changer status
@result = `$MTX status 2>&1`;
# find the line with the tape drive in it
foreach $line (@result) {
if ($line =~ m/Data Transfer Element 0:(.+)/) {
# preserve the slot number and barcode
$stat = $1;
last;
}
}
#split the slot number from the barcode
($el,$bc) = split /:/,$stat;
# get the loaded element
if ($el =~ m/Full \(Storage Element (\d+) Loaded/) {
$el = $1;
} else {
$el = -1;
}
if (defined $bc) {
# get the loaded barcode
if ($bc =~ m/VolumeTag = (.+)/) {
$bc = $1;
} else {
$bc = -1;
}
} else {
$bc = -1;
}
# print "$el,$bc\n";
return $el,$bc;
}
# add a debugging message to the debug file
sub debugmsg {
my ($message) = @_;
open DBG, ">>$dbgfile";
print DBG $message;
print DBG "\n";
close DBG;
}
# send an error message to mailto
sub errormsg {
my ($subject,$message) = @_;
open PIPE, "| $MAILER -r \"$mailto\" -s \"$subject\" $mailto";
print PIPE $message;
close PIPE;
}
Q I have a Sun IPX that was running
Solaris 7 and getting pretty crufty, so I reinstalled it from the
CD. I put on the full+OEM distribution. This machine was booting
fine before I reinstalled it, but now I get the following error
when I try to boot from the disk:
Boot device: /sbus/esp@0,800000/sd@3,0 File and args:
boot: cannot find misc/krtld
boot: error loading interpreter (misc/krtld)
Elf32 read error.
boot failed
Enter filename [/platform/SUNW,Sun_4_50/kernel/unix]:
POST doesn't show any hardware issues, and I've tried swapping
out various bits of the machine to make sure I wasn't missing
something. I also tried reinstalling again, paying closer attention
to check for any errors. Since the machine will boot fine off the
CDROM, I also tried swapping in a different disk and installing onto
that. Nothing seems to actually be wrong with the machine, and yet
I can't get it to boot off the disk.
A Your IPX is one of the old 32-bit
sun4c systems. Based on the information you've given me, it
sounds like you're using a disk that's larger than 2G,
and your root partition extends past the 2G boundary. Because of
limitations in the OpenBoot PROM, you can't boot any of the
32-bit SPARCs from a root partition that has tracks lying beyond
the 2G boundary. On systems with really old PROM revisions (2.5
or earlier), you need to make the root partition smaller than 1G.
The PROMs in the newer 64-bit Ultra class machines are capable
of booting from root partitions that go beyond the 2G boundary, but versions
of Solaris prior to 2.6 contain a bug that effectively prevents
it. Patch 103640-08 (or a later revision) corrects this for Solaris
2.5.1.
Other typical error messages you'll see when the root partition goes beyond
the 1G or 2G boundary:
bootblk: can't find the boot program
boot: cannot find misc/krtld
Short read. 0x2000 chars read
Read error.
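To check a planned root slice against the boundary before reinstalling, the arithmetic can be sketched in shell. This is only an illustration under stated assumptions: 512-byte sectors, 1G taken as 2^30 bytes, and a starting sector and sector count such as those shown in the First Sector and Sector Count columns printed by prtvtoc. The name slice_within_limit is hypothetical.

```shell
#!/bin/sh
# slice_within_limit FIRST_SECTOR SECTOR_COUNT [LIMIT_GB]
# Succeed if the slice ends at or before LIMIT_GB gigabytes from the start
# of the disk (default 2). Assumes 512-byte sectors and 1G = 2^30 bytes.
slice_within_limit() {
    limit_gb=${3:-2}
    # 2^30 / 512 = 2097152 sectors per gigabyte
    limit=`expr "$limit_gb" \* 2097152` || :
    # expr exits non-zero when the result is 0, hence "|| :"
    end=`expr "$1" + "$2"` || :
    [ "$end" -le "$limit" ]
}

# Example use (commented out): a slice starting at sector 0 with
# 4194304 sectors ends exactly at the 2G boundary.
#   slice_within_limit 0 4194304 && echo "root slice fits below 2G"
#   slice_within_limit 0 2097152 1 && echo "root slice fits below 1G"
```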
Amy Rich, president of the Boston-based Oceanwave Consulting, Inc.
(http://www.oceanwave.com), has been a UNIX systems administrator
for more than five years. She received a BSCS at Worcester Polytechnic
Institute, and can be reached at: qna@oceanwave.com.