Can Sun Management Center Centralize Your Administration Tasks?
Peter Baer Galvin
Content Level: Intermediate
Content Audience: Solaris Administrators and Managers
Sun Management Center (SMC) has been gestating for several years. Parts of it came
from other products, such as Symon. Early versions performed poorly and were short
on features. Sun has a major push under way to make Sun Management Center 3.0 a core
component of running Sun/Solaris environments. Currently, SMC is not used by
many Sun sites. Should it be?
Overview
SMC 3.0 is designed to improve service levels and decrease administration
cost. It is based on a three-tier model, an architecture intended to let it
scale (Sun claims to thousands of nodes). It provides monitoring and management
services for Sun and Solaris systems. Some core system management tasks,
including dynamic reconfiguration (DR) and alternate pathing (AP) configuration
and many SunFire features, are now handled through SMC.
This means that, unless yours is a small Sun site, SMC will be in your future,
sooner or later.
The basic tool is free and available for download. Sun has layered additional
(fee-based) packages on the platform, including "Advanced Systems Monitoring",
"Premier Management Applications", and "System Reliability Manager". According
to Sun, SMC can improve customers' ability to do predictive fault analysis,
perform remote management, and increase system availability.
SMC 3.0 is centered on a Java GUI that provides an interface to all components.
The console runs on Sun and Windows platforms, providing remote administration.
It can run as a native application on a management station, and the
use of Web technologies also makes management information available from Web
browsers. SMC now also has a command-line interface. Multiple consoles can be
used simultaneously, with the "server layer" managing communications between
the agents and the consoles. The "intelligent" agents load or unload modules
as required. Sun also claims that polling between consoles and agents is minimized
to prevent SMC from inducing the performance problems it is designed to find.
Sun recommends an Ultra 60 as a decent console for up to 1000 agents.
To further ease use and enable scalability, object grouping allows a single task
to be defined for multiple machines or services at once. Filtering features
can help sort through all the events and objects. Tasks can also be scheduled
and automated. The platform also provides event triggering and alarm
generation and management. An extensible knowledge base includes information
about common events and their solutions.
There is a development environment (SDK and GUI) to allow a site to create
new modules or modify existing ones. SMC can work as a standalone service, but
it can also include integration modules to allow it to be a citizen in a Unicenter
TNG, Enlighten, Halcyon, OpenView, or Tivoli environment.
For systems administrators and managers who are concerned about security (do
any who are not security-conscious still have jobs?), SMC boasts end-to-end
security and access control to limit the tasks available to a given user. (It
uses the SNMPv2 USEC security model.)
SMC is certainly the center for Sun's system management efforts going forward.
In fact, Sun just announced the new System Reliability Manager component,
which includes patch management; a file watch service that alerts on missing
files and tracks file changes; a script launcher tool that helps manage
script execution; and an OS crash dump analyzer that monitors the number of crash
dumps and allows authorized users to perform crash-dump analysis. This module
is an add-on package with add-on pricing. The Sun model is to give away the
base unit (for unlimited nodes), and to charge for advanced features that can
be added to the base. There are also add-in modules for SunFire and Netra systems.
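To make the file watch idea concrete, here is a minimal sketch of the general technique: record a checksum baseline, then later report files that have gone missing or whose checksums have changed. It is not how the System Reliability Manager implements its service; the function names fw_baseline and fw_check and the baseline file format are hypothetical, and the sketch relies only on the standard cksum utility.

```shell
# Minimal file-watch sketch (hypothetical; not the SRM implementation).
# Baseline format is cksum's own output, one line per file:
#   <checksum> <size> <path>

# Print a baseline line for each file named on the command line.
fw_baseline() {
  for f in "$@"; do
    cksum "$f"
  done
}

# Read a baseline on stdin and compare it with the files as they are now.
# Prints "MISSING <path>" or "CHANGED <path>" for each problem found.
fw_check() {
  while read -r sum size path; do
    if [ ! -e "$path" ]; then
      echo "MISSING $path"
    else
      # Positional parameters are local to the function in POSIX sh.
      set -- $(cksum "$path")
      if [ "$1" != "$sum" ] || [ "$2" != "$size" ]; then
        echo "CHANGED $path"
      fi
    fi
  done
}
```

For example, run fw_baseline /etc/system /etc/passwd > /var/tmp/fw.baseline once, then run fw_check < /var/tmp/fw.baseline periodically (from cron, say) and mail any output to the administrator.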
The SMC Web site (http://www.sun.com/solaris/sunmanagementcenter/) has plenty of supporting material for the product, including
a guided tour (a Flash presentation) that gives a useful overview of the
features and functions of SMC 3.0.
The Test
But how does the product fare? The published information is certainly impressive.
But those of us who have used Sun products (or indeed any software products)
over a period of many years know to read product brochures with a jaded eye,
and not to assume that just because a product has a feature, the feature
is actually useful.
So I loaded SMC 3.0 on my Ultra 10 workstation and put it through its paces.
I downloaded the tarball, the two required patches, the System Reliability Manager,
and the Windows console from the Sun Web site. Do not attempt this over a 56-Kbps
modem! Of course, these packages are also available on CD-ROM from Sun.
The Important Details
Using SMC turned up many important details, including:
- SMC includes its own Web server, which must be used for Web access, even
on machines that already have a Web server. It also installs its own copy
of Java (in a separate directory).
- SMC 2.1 and 3.0 are very different, and do not get along very well. Agents
from 2.1 can be used with 3.0, but there is little other interoperability.
- A "terminal package" is included with SMC to allow the simultaneous installation
of the agents on many machines. (I did not test this functionality.)
- SMC 3.0 makes extensive use of SNMP for communications and alarming. In
fact, eight ports are used in communications between the server and its agents.
It also uses RMI (port 2099 and random ports above 2000). This should raise the
eyebrows of those who are security-conscious.
Guided Tour
The download was straightforward. Patches were required, and clicking on the
patch link led to a list of the necessary patches. However, clicking on either of
the two patches needed for Solaris 8 led to a patch search screen rather than
directly to the patch. Typing in the patch numbers led to the appropriate patches,
which I downloaded and installed.
Installation was cumbersome. The installation program was buried in sunmc_3.0_build41_122000/disk1/sbin/es-inst.
After lots of disk rattling, the install program ran "setup", which modified
/etc/system and then required a reboot, telling me to "then run
the setup again". Unfortunately, it did not say where the setup program was
to be found. Checking the setup log file (/var/opt/SUNWsymon/install/setup*)
revealed that /opt/SUNWsymon/bin/es-setup was the program
in question. A reboot followed by execution of that program resulted in a system
with all of the tools installed.
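Because the installer does not say where the setup program lives, a small helper that searches the setup logs can save some digging. This is a sketch under assumptions: it relies only on the log location reported above and on the logs mentioning es-setup by its full path somewhere in their text; the function name find_es_setup is hypothetical, and the actual log format was not examined.

```shell
# Hypothetical helper: print each distinct absolute path ending in /es-setup
# that appears in the SMC setup logs. Defaults to the log directory observed
# with SMC 3.0; pass another directory as the first argument to override it.
find_es_setup() {
  log_dir=${1:-/var/opt/SUNWsymon/install}
  # Split the log text into words (on blanks and quote characters), keep the
  # words that are absolute paths ending in /es-setup, and drop trailing
  # punctuation such as a sentence-ending period.
  cat "$log_dir"/setup* 2>/dev/null |
    tr -s " \t\"'" '\n\n\n\n' |
    sed -n 's|^\(/.*/es-setup\)[.,;:]*$|\1|p' |
    sort -u
}
```

Running find_es_setup after the first reboot should print the program to run next, or nothing if the logs never mention it.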
After a few adjustments (creating an SMC user account, setting the password,
reading through some manuals), it was time to launch the beast. By this time,
I was thinking of it as a beast: SMC is big and complex, uses lots of disk space,
runs plenty of daemons (including a database), and takes quite a bit
of time to install.
es-start -c started the console on the local machine. The console
is quite manageable, with the left window showing the hierarchy and the right window
showing current information and activities. The information available here is
interesting and useful, including a list of systems being monitored, with drill-down
on various aspects of each system. One oddity was that the disk information
screen did not show capacity or amount used.
Within the SMC information for any given machine are a lot of details about
the hardware configuration, including memory bank use, system speeds, CPU details,
and networking interface information. There is also a physical view of almost
all aspects of the system, which still has a high "wow" factor. The software
information is a bit limited, giving OS release information and not much more.
After a satisfactory investigation of the features of the basic SMC, it was
time to try the System Reliability Manager. After downloading the package, the
command es-inst, when pointed to the add-in directory, performed
the installation. Unfortunately, the setup program failed to stop the database
(it seems to use Oracle as the information repository database). A manual run
of es-start to attempt to start the SMC service failed with similar
database errors. Even a reboot did not solve the problem. Apparently, the add-ins
need a little more work.
SMC 3.0 Conclusions
SMC 3.0 is Sun's best attempt yet at centralized system monitoring and management.
It is certainly not useful for monitoring an individual workstation, and its
use is questionable even for several small servers. But for sites with several
large servers, or SunFires, it is certainly worth investing the time and effort
(and possibly money) to install, configure, learn, and live with the SMC package.
SMC has a big appetite for disk space and system resources, but in exchange it
provides scalable system monitoring and event management. One
further recommendation is that a dedicated workstation be used as the console
for SMC 3.0.
Those Smart Readers
Tony Frank wrote in with some interesting follow-up questions to June's column.
The note and my response are included here.
From: Tony Frank
To: pbg@petergalvin.org
Subject: Solaris disk mirroring (Sysadmin mag June 2001)
Hi,
I've just been reading your article about disk mirroring with both
DiskSuite and Veritas.
This is quite similar to what we have been doing for some time now.
One point that I noticed in your article that we have not done is the
following:
Add a device alias to the Open Boot Prompt NVRAM. The system might
already have the appropriate disk aliases set up
(i.e., "disk" and "disk1"). However, it is beneficial to add the
following as devaliases because they are more intuitive and this
leaves the original devalias commands intact, should you ever need
to go back to them:
{0} ok nvedit
0: devalias rootmirror0 /pci@1f,0/pci@1,1/scsi@2/disk@0,0
1: devalias rootmirror1 /pci@1f,0/pci@1,1/scsi@2/disk@1,0
{0} ok nvstore
{0} ok setenv use-nvramrc? true
{0} ok setenv boot-device rootmirror0 rootmirror1
{0} ok setenv diag-device rootmirror0 rootmirror1
Reboot each devalias to make sure everything comes up.
This certainly seems like an important step, and I assume this is to
ensure the system starts even in the case where the "0" disk has
failed.
One question I have on this procedure is the final step "reboot each
devalias" -- how do I do this?
The simple test would be to reboot and while the system is shutdown,
physically remove the first disk, but I get the impression that there
may be something else here?
Any tips you can provide here would be great!
Regards,
Tony
Tony, thanks for the question. Yes, the system will try to boot from each
device in "boot-device" until it finds one with a boot block. That won't always
help; if, for instance, the first disk is corrupt but not missing, it may still
be chosen and the boot may fail. But it does help if the first disk disappears:
the second entry is then used.
To test each alias, simply halt the machine and type "boot rootmirror0", and after
testing that one, halt and type "boot rootmirror1" to make sure that each alias is correct.
Your suggestion would be a good further test: remove the first boot device,
and try power cycling or just "boot", and make sure the system discovers the
second boot device and uses it.
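The fallback behavior described above can be modeled in a few lines of shell. This is a toy model, not OpenBoot code: the BOOTABLE variable and the functions has_boot_block and pick_boot_device are hypothetical stand-ins for the firmware's check that a device is present and carries a boot block. Note that the model also reproduces the caveat: a disk that is corrupt but still has a boot block would be picked first.

```shell
# Toy model of walking the boot-device list (not OpenBoot code).
# BOOTABLE lists the aliases assumed to be present with a boot block;
# here rootmirror0 is treated as missing.
BOOTABLE="rootmirror1"

# Succeed if the alias is in BOOTABLE.
has_boot_block() {
  for d in $BOOTABLE; do
    [ "$d" = "$1" ] && return 0
  done
  return 1
}

# Try each device in order, as with: setenv boot-device rootmirror0 rootmirror1
# Print the first one that has a boot block; fail if none does.
pick_boot_device() {
  for dev in "$@"; do
    if has_boot_block "$dev"; then
      printf '%s\n' "$dev"
      return 0
    fi
  done
  return 1
}

pick_boot_device rootmirror0 rootmirror1   # prints rootmirror1
```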
In another note, Frank Im points out that he uses dd to copy
an entire disk, partition information and all, from one disk to another, as
in dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c0t1d0s2 count=512. This
could significantly shorten my script. However, "in the old days", dd
was considered unreliable for making a full copy. I believe it had problems with
ignoring bad-sector markings and copying the bad sectors. I'd be interested
in hearing from readers who use dd to copy entire disks. Any problems,
or does it work correctly these days?
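One detail is worth checking before relying on that command: dd's default block size is 512 bytes, so count=512 copies only the first 512 blocks (256 KB) of the source. That includes the disk label at the start of the disk, but not the data; a full copy would omit count. The sketch below demonstrates the arithmetic on scratch files rather than raw devices (the temporary files are placeholders for the c0t0d0s2 and c0t1d0s2 slices).

```shell
# How much does "dd if=... of=... count=512" copy when no bs= is given?
# Scratch files stand in for /dev/rdsk/c0t0d0s2 and /dev/rdsk/c0t1d0s2.
src=$(mktemp)
dst=$(mktemp)

# Make a 1 MB "disk": 1024 blocks of 1024 bytes each.
dd if=/dev/zero of="$src" bs=1024 count=1024 2>/dev/null

# Same form as the reader's command: default 512-byte blocks, count=512.
dd if="$src" of="$dst" count=512 2>/dev/null

# $(( )) strips the leading blanks some wc implementations print.
copied=$(( $(wc -c < "$dst") ))
echo "copied $copied bytes"   # prints: copied 262144 bytes (512 x 512)

rm -f "$src" "$dst"
```

On the bad-sector question, dd's standard conv=noerror,sync options make it continue past read errors and pad unreadable blocks with zeros. That keeps offsets aligned, but it silently substitutes zeros for the lost data.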
Finally, Kyle Niedzwiecki found a flaw in the "manual disk mirroring" script
published in the July column. The line
sed 's/${SRC}/${DEST}/g' /etc/vfstab > ${MOUNTDIR}/etc/vfstab;
should use double quotes rather than single quotes, because the shell does not
expand ${SRC} and ${DEST} inside single quotes:
sed "s/${SRC}/${DEST}/g" /etc/vfstab > ${MOUNTDIR}/etc/vfstab;
The problem has been corrected in the version of the July column that is
currently online. Sorry for any inconvenience.
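The quoting difference is easy to see with a single vfstab entry. In this sketch, the SRC and DEST values, the sample ENTRY, and the two function names are hypothetical; they only need to look like the disk names the mirroring script substitutes.

```shell
# Hypothetical source and destination disk names and a sample vfstab entry.
SRC=c0t0d0
DEST=c0t1d0
ENTRY='/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -'

# Single quotes: the shell hands sed the literal text s/${SRC}/${DEST}/g,
# so sed searches for the string "${SRC}" and finds nothing to replace.
single_quoted() { printf '%s\n' "$ENTRY" | sed 's/${SRC}/${DEST}/g'; }

# Double quotes: the shell expands the variables first, so sed receives
# s/c0t0d0/c0t1d0/g and rewrites both device paths.
double_quoted() { printf '%s\n' "$ENTRY" | sed "s/${SRC}/${DEST}/g"; }

single_quoted   # prints the entry unchanged
double_quoted   # prints /dev/dsk/c0t1d0s0 /dev/rdsk/c0t1d0s0 / ufs 1 no -
```

Because "/" is the s/// delimiter, the double-quoted form also assumes that neither SRC nor DEST contains a slash.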
Next month in the Solaris Corner, there should be a new Resources section listing
(frequently updated) locations that provide the best information available in
a variety of Solaris-related areas.
Peter Baer Galvin (http://www.petergalvin.org)
is the Chief Technologist for Corporate
Technologies, a premier systems integrator and VAR. Before that, Peter was
the systems manager for Brown University's Computer Science Department. He has
written articles for Byte and other magazines, and previously wrote Pete's Wicked
World, the security column, and Pete's Super Systems, the systems management
column for Unix Insider (http://www.unixinsider.com).
Peter is coauthor of the Operating System Concepts and Applied
Operating System Concepts textbooks. As a consultant and trainer, Peter
has taught tutorials and given talks on security and systems administration
worldwide.