Cover V10, I09



Can Sun Management Center Centralize Your Administration Tasks?

Peter Baer Galvin

Content Level: Intermediate

Content Audience: Solaris Administrators and Managers

Sun Management Center (SMC) has been gestating for several years. Parts of it came from other products, such as Symon, and early versions were poor in both performance and features. Sun now has a major push underway to make Sun Management Center 3.0 a core component of running Sun/Solaris environments. Currently, SMC is not used at many Sun sites. Should it be?

Overview

SMC 3.0 is designed to improve service levels and decrease administration costs. It is based on a three-tier architecture designed to scale (to thousands of nodes, Sun claims). It provides monitoring and management services for Sun and Solaris systems. Some core system management tasks are now performed through SMC, including dynamic reconfiguration (DR) and alternate pathing (AP) configuration, and many SunFire features. This means that, unless yours is a small Sun site, SMC will be in your future sooner or later.

The basic tool is free and available for download. Sun has layered additional (fee-based) packages on the platform, including "Advanced Systems Monitoring", "Premier Management Applications", and "System Reliability Manager". According to Sun, SMC can improve customers' ability to do predictive fault analysis, perform remote management, and improve system availability.

SMC 3.0 is centered on a Java GUI that provides an interface to all components. The console can run on Sun and Windows platforms, providing remote administration. The console can run as a native application on a management station, and the use of Web technologies also makes management information available from Web browsers. SMC now also has a command-line interface. Multiple consoles can be used simultaneously, with the "server layer" managing communications between the agents and the consoles. The "intelligent" agents load or unload modules as required. Sun also claims that polling between consoles and agents is minimized to prevent SMC from inducing the performance problems it is designed to find. Sun recommends an Ultra 60 as a decent console for up to 1000 agents.

To further ease use and enable scalability, object grouping allows tasks to be defined simultaneously on multiple machines or services. Filtering features help sort through all the events and objects. Tasks can also be scheduled and automated. The platform also handles event triggering as well as alarm generation and management. An extensible knowledge base includes information about common events and their solutions.

There is a development environment (SDK and GUI) that allows a site to create new modules or modify existing ones. SMC can work as a standalone service, but it can also include integration modules that let it participate in a Unicenter TNG, Enlighten, Halcyon, OpenView, or Tivoli environment.

For systems administrators and managers who are concerned about security (do any who are not security-conscious still have jobs?), SMC boasts end-to-end security and access control to limit the tasks available to a given user. (It uses SNMPv2 USEC for its security model.)

SMC is certainly the center of Sun's system management efforts going forward. In fact, Sun just announced the new "System Reliability Manager" component, which includes patch management; a file watch service that alerts on missing files and tracks file changes; a script launcher tool that helps manage script execution; and an OS crash dump analyzer that monitors the number of crash dumps and allows authorized users to perform crash-dump analysis. This module is an add-on package with add-on pricing. Sun's model is to give away the base package (for unlimited nodes) and to charge for advanced features that can be added to it. There are also add-in modules for SunFire and Netra systems.

The SMC Web site (http://www.sun.com/solaris/sunmanagementcenter/) offers plenty of supporting material for the product, including a guided tour (a Flash presentation) that gives a useful overview of the features and functions of SMC 3.0.

The Test

But how does the product fare? The published information is certainly impressive. But those of us who have used Sun products (or indeed any software products) over many years know to read product brochures with a jaded eye, and not to assume that a feature is useful just because the product has it.

So I loaded SMC 3.0 on my Ultra 10 workstation and put it through its paces. I downloaded the tarball, the two required patches, the reliability manager, and the Windows console from the Sun Web site. Do not attempt this over a 56-Kbps modem! Of course, these packages are also available on CD-ROM from Sun.

The Important Details

Using SMC revealed many important details. These include:

  • SMC includes its own Web server, which must be used for Web access, even on machines that already have a Web server. It also installs its own copy of Java (in a separate directory).
  • SMC 2.1 and 3.0 are very different, and do not get along very well. Agents from 2.1 can be used with 3.0, but there is little other interoperability.
  • A "terminal package" is included with SMC to allow simultaneous installation of the agents on many machines. (I did not test this functionality.)
  • SMC 3.0 makes extensive use of SNMP for communications and alarming. In fact, eight ports are used in communications between the server and its agents. It also uses RMI (port 2099 and random ports above 2000). This should raise the eyebrows of those who are security-conscious.
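Before opening firewall holes for SMC, it is worth confirming which of these ports are actually listening on a given host. What follows is a minimal sketch, not part of SMC, assuming Solaris-style netstat -an output in which the local address is the first field (for example *.2099) and the connection state is the last. The function name listening_on is hypothetical; port 2099 comes from the list above, and the eight SNMP ports should be taken from Sun's documentation rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper (not an SMC tool). Assumes Solaris-style `netstat -an`
# output: local address in field 1 (e.g. "*.2099" or "10.0.0.5.2099"),
# connection state in the last field.

# listening_on PORT... : read netstat output on stdin and print every
# LISTEN line whose local port matches one of the given ports.
listening_on() {
    awk -v ports="$*" '
        BEGIN { n = split(ports, want, " ") }
        $NF == "LISTEN" {
            # The port is whatever follows the last "." in the local address.
            local_addr = $1
            sub(/.*\./, "", local_addr)
            for (i = 1; i <= n; i++)
                if (local_addr == want[i]) { print; break }
        }'
}
```

On an SMC server, netstat -an | listening_on 2099 would print any listener on the RMI port; further port numbers can be passed as extra arguments.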

Guided Tour

The download was straightforward. Patches were required, and clicking on the patch link led to a set of necessary patches. However, clicking on the two individual patches necessary for Solaris 8 led to a patch search screen, rather than directly to those patches. Typing in the patch numbers led to the appropriate patches, which were downloaded and installed.
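Once the patches are installed, a quick check that they are really present can save a failed installation later. This sketch assumes the Solaris 8 showrev -p output format, in which each line begins "Patch: NNNNNN-RR". The function name patch_at_least is hypothetical, and the patch IDs used with it are placeholders, not the actual SMC prerequisites.

```shell
#!/bin/sh
# Hypothetical helper. Reads `showrev -p` output on stdin, assumed to contain
# lines like "Patch: 123456-09 Obsoletes: ... Packages: SUNWcsu".

# patch_at_least ID : succeed if patch ID (BASE-REV, e.g. 123456-07) is
# installed at revision REV or later.
patch_at_least() {
    base=${1%-*}    # part before the dash, e.g. 123456
    rev=${1#*-}     # part after the dash, e.g. 07
    awk -v base="$base" -v rev="$rev" '
        $1 == "Patch:" {
            split($2, p, "-")
            if (p[1] == base && p[2] + 0 >= rev + 0) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

For example, showrev -p | patch_at_least 123456-07 || echo "patch 123456-07 or later is missing", with the real patch numbers from Sun's download page substituted for the placeholder.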

Installation was cumbersome. The installation program was buried in sunmc_3.0_build41_122000/disk1/sbin/es-inst. After lots of disk rattling, the install program ran "setup", which modified /etc/system and then required a reboot, telling me to "then run the setup again". Unfortunately, it did not say where the setup program could be found. Checking the setup log file (/var/opt/SUNWsymon/install/setup*) revealed that /opt/SUNWsymon/bin/es-setup was the program in question. A reboot followed by execution of that program resulted in a system with all of the tools installed.
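The sequence above can be scripted roughly as follows. This is a sketch only: the es-inst and log paths are the ones reported above, and because the layout of the setup log is an assumption, the hypothetical helper find_setup_program simply prints the first word in the log that ends in /es-setup.

```shell
#!/bin/sh
# Hypothetical helper. Assumes only that the setup log mentions the setup
# program by its full path somewhere in the text.

# find_setup_program LOGFILE... : print the first word in the given logs
# that ends in "/es-setup".
find_setup_program() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /\/es-setup$/) { print $i; exit }
    }' "$@" 2>/dev/null
}

# Install flow described in the article (run as root):
#   ./sunmc_3.0_build41_122000/disk1/sbin/es-inst   # modifies /etc/system
#   reboot
#   setup=$(find_setup_program /var/opt/SUNWsymon/install/setup*)
#   "$setup"                                        # /opt/SUNWsymon/bin/es-setup here
```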

After a few adjustments (creating an SMC user account, setting the password, reading through some manuals), it was time to launch the beast. By this time, I was thinking of it as a beast: SMC is big (it uses lots of disk space and runs plenty of daemons, including a database), complex, and time-consuming to install.

Running es-start -c started the console on the local machine. The console is quite manageable, with the left window showing the hierarchy and the right window showing current information and activities. The information available here is interesting and useful, including a list of the systems being monitored, with drill-down into various aspects of each system. One oddity was that the disk information screen showed neither capacity nor space used.

For any given machine, SMC provides many details about the hardware configuration, including memory bank use, system speeds, CPU details, and network interface information. There is also a physical view of almost every aspect of the system, which still has a high "wow" factor. The software information is a bit limited, giving OS release information and not much more.

After a satisfactory investigation of the basic SMC features, it was time to try the System Reliability Manager. Once the package was downloaded, running es-inst against the add-in directory performed the installation. Unfortunately, the setup program failed to stop the database (SMC seems to use Oracle as its information repository). Running es-start manually to start the SMC service failed with similar database errors, and even a reboot did not solve the problem. Apparently, the add-ins need a little more work.

SMC 3.0 Conclusions

SMC 3.0 is Sun's best attempt yet at centralized system monitoring and management. It is certainly not useful for monitoring an individual workstation, and its value is questionable even for several small servers. But for sites with several large servers, or SunFires, it is certainly worth investing the time and effort (and possibly money) to install, configure, learn, and live with the SMC package. SMC has a proclivity for disk space and system resources, but in exchange it provides system monitoring, event management, and a scalable management facility. One further recommendation: use a dedicated workstation as the SMC 3.0 console.

Those Smart Readers

Tony Frank wrote in with some interesting follow-up questions to June's column. The note and my response are included here.

From: Tony Frank
To: pbg@petergalvin.org
Subject: Solaris disk mirroring (Sysadmin mag June 2001)
Hi,

I've just been reading your article about disk mirroring with both DiskSuite and Veritas.

This is quite similar to what we have been doing for some time now.

One point that I noticed in your article that we have not done is the following:

Add a device alias to the OpenBoot PROM NVRAM. The system might already have the appropriate disk aliases set up (e.g., "disk" and "disk1"). However, it is beneficial to add the following devaliases, because they are more intuitive and they leave the original devalias commands intact, should you ever need to go back to them:

{0} ok nvedit
   0: devalias rootmirror0 /pci@1f,0/pci@1,1/scsi@2/disk@0,0
   1: devalias rootmirror1 /pci@1f,0/pci@1,1/scsi@2/disk@1,0
   2: ^C
{0} ok nvstore
{0} ok setenv use-nvramrc? true
{0} ok setenv boot-device rootmirror0 rootmirror1
{0} ok setenv diag-device rootmirror0 rootmirror1
Reboot each devalias to make sure everything comes up.

This certainly seems like an important step, and I assume this is to ensure the system starts even in the case where the "0" disk has failed.

One question I have on this procedure is the final step "reboot each devalias" -- how do I do this?

The simple test would be to reboot and, while the system is shut down, physically remove the first disk, but I get the impression that there may be something else here?

Any tips you can provide here would be great!

Regards,
Tony

Tony, thanks for the question. Yes, the system will try to boot from each device in "boot-device" until it finds one with a boot block. That won't help in every case, for instance if the first disk is corrupt but still present. But it does help if the first disk disappears: the second entry is then used.

Simply halt the machine and do "boot rootmirror0", and after testing that one, halt and "boot rootmirror1" to make sure that each alias is correct.

Your suggestion would be a good further test: remove the first boot device, then power-cycle or just "boot", and make sure the system discovers the second boot device and uses it.
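Before rebooting, the settings can also be checked from a running Solaris system with eeprom, which prints a variable as name=value (for example, boot-device=rootmirror0 rootmirror1). The function below is a hypothetical helper and only a sketch: it confirms the boot-device order but does not replace actually booting each alias as described above.

```shell
#!/bin/sh
# Hypothetical helper. Reads `eeprom boot-device` output on stdin, assumed to
# look like "boot-device=rootmirror0 rootmirror1".

# boot_order_ok ALIAS... : succeed if boot-device begins with the given
# aliases, in that order (extra trailing entries such as "net" are allowed).
boot_order_ok() {
    want="$*"
    have=$(sed -n 's/^boot-device=//p' | head -n 1)
    case "$have" in
        "$want"|"$want "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, eeprom boot-device | boot_order_ok rootmirror0 rootmirror1 && echo ok. Likewise, eeprom 'use-nvramrc?' should report true, or the devaliases stored in nvramrc are never defined.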

In another note, Frank Im points out that he uses dd to copy an entire disk, partition information and all, from one disk to another, as in dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c0t1d0s2 count=512. This could significantly shorten my script. However, "in the old days", dd was considered unreliable for making a full copy. I believe it had problems with ignoring bad-sector markings and copying the bad sectors. I'd be interested in hearing from readers who use dd to copy entire disks. Any problems, or does it work correctly these days?
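One detail in that command is worth working through. dd reads 512-byte blocks unless bs= (or ibs=) says otherwise, so count=512 copies 512 x 512 = 262,144 bytes (256 KB) from the start of slice 2. That covers the disk label in the first sector but not the file systems, so a full copy needs count= left out. The sketch below does the arithmetic. The commented commands assume two identical disks with the device names from Frank's example, and prtvtoc | fmthard is shown as the common Solaris way to copy only the partition table.

```shell
#!/bin/sh
# dd_bytes COUNT [BS] : bytes copied by "dd count=COUNT [bs=BS]".
# Without bs= (or ibs=), dd uses 512-byte blocks.
dd_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# The command as printed copies only the first 262144 bytes of slice 2:
#   dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c0t1d0s2 count=512
#
# Copying the whole disk (slice 2 spans it) means omitting count=;
# a larger block size speeds things up:
#   dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c0t1d0s2 bs=1024k
#
# Copying just the partition table (VTOC):
#   prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
```

Here dd_bytes 512 prints 262144, matching the arithmetic above.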

Finally, Kyle Niedzwiecki found a flaw in the "manual disk mirroring" script published in the July column. The line

sed 's/${SRC}/${DEST}/g' /etc/vfstab > ${MOUNTDIR}/etc/vfstab;
should have double quotes, rather than single quotes, as in:
sed "s/${SRC}/${DEST}/g" /etc/vfstab > ${MOUNTDIR}/etc/vfstab;

The problem has been corrected in the online version of the July column. Sorry for any inconvenience.
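The difference between the two quotings is easy to see in isolation. In this sketch, SRC and DEST are set to hypothetical device names, and a sample root-file-system line from /etc/vfstab is passed through both versions of the command.

```shell
#!/bin/sh
# Hypothetical device names standing in for the script's SRC and DEST.
SRC=c0t0d0
DEST=c0t1d0
line='/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -'

# Single quotes: the shell passes the text ${SRC} to sed unexpanded, so sed
# searches for the literal string "${SRC}" and the line comes back unchanged.
single=$(printf '%s\n' "$line" | sed 's/${SRC}/${DEST}/g')

# Double quotes: the shell substitutes the variables first, so sed sees
# s/c0t0d0/c0t1d0/g and rewrites every occurrence on the line.
double=$(printf '%s\n' "$line" | sed "s/${SRC}/${DEST}/g")

echo "single: $single"
echo "double: $double"
```

The double-quoted form assumes that SRC and DEST contain no "/" or other characters that sed treats specially; a full path such as /dev/dsk/c0t0d0 would need a different delimiter.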

Next month in the Solaris Corner, there should be a new Resources section listing (frequently updated) locations that provide the best information available in a variety of Solaris-related areas.

Peter Baer Galvin (http://www.petergalvin.org) is the Chief Technologist for Corporate Technologies, a premier systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines, and previously wrote Pete's Wicked World, the security column, and Pete's Super Systems, the systems management column for Unix Insider (http://www.unixinsider.com). Peter is coauthor of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Peter has taught tutorials and given talks on security and systems administration worldwide.