Cover V06, I03

Big Brother: A Web-based UNIX System and Network Monitor

Sean MacGuire

One of the major responsibilities of any UNIX systems administrator is to know what is happening on the network at any given time. Once the number of machines on the network exceeds about seven, this process becomes dramatically more difficult. Determining the state of a system, before being told by an irate user that it's down, is an art form that only the paranoid and masochistic could embrace.

Having faced this situation too many times, I have written and rewritten innumerable little scripts to check on the health of a system. Reporting network information was always a problem. I would either send myself bits of mail (and hope I read them), or I would compromise security by automating direct connections between machines, generally using some form of remote shell. Nonetheless, I still spent a significant amount of time telnetting around the network, logging in and generally making sure that all was well, day and night.

So, I began looking into products to perform UNIX systems monitoring and notification. The systems I investigated had several serious drawbacks: they were quite expensive, some even requiring their own replicated servers to run. They seemed very complicated, requiring a considerable amount of time to install and configure. Most consumed vast amounts of system resources, because they were so powerful and were operated from a central console. I just wanted to know how my machines were doing, without having to go into the office.

Fortunately, the past few years have seen an explosion in the Internet and the Web. One great function of HTML is its ability to create an attractive, portable, and remotely accessible graphical user interface (GUI) quickly and easily. So, I decided to build Big Brother, a Web-based UNIX systems and network monitor.

Objectives

I wanted to monitor the following areas:

  • Network connectivity

  • Disk usage

  • CPU usage and number of users

  • Important processes (whether they are up and running)

  • Web servers

  • Important messages in the system logs

    And, I wanted the program to be lightweight and efficient.

    I also wanted:

  • A color coded status display of all machines on the network

  • Access to the real data from the remote system

  • Notification by pager if something went wrong

  • The ability to access this information from anywhere

  • The ability to integrate information from other packages, like Legato Networker

    In short, I wanted a Big Brother to watch the entire network for me.

    Design

    The central problem was not so much what the scripts were to do, but how to transmit this information to a central location in a secure fashion. For that purpose, I wrote two simple programs: a client (bb) that sends single lines of data, and a server (bbd) that receives them on TCP port 1984.

    Monitoring connectivity within the network proved straightforward. I used a Bourne shell script to ping every host I wanted monitored, and wrote bbnet to test any port on any server. If bbnet is given an HTTP address, it not only checks for the presence of a Web server, but also displays the output from the server. This network monitoring script is called bb-network.sh and is executed every 5 minutes.
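The ping pass described above could be sketched roughly as follows. This is not the code from bb-network.sh; the function names, the PING_CMD variable, and the "-c 1" flag (accepted by Linux and BSD ping, but not by every UNIX) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a ping-based connectivity check; not the real bb-network.sh.
# PING_CMD is an assumed variable. "-c 1" (send one packet) differs between systems.
PING_CMD=${PING_CMD:-"ping -c 1"}

# conn_color HOST: print "green" if the host answers a ping, "red" otherwise.
conn_color() {
    if $PING_CMD "$1" >/dev/null 2>&1; then
        echo green
    else
        echo red
    fi
}

# check_hosts HOST...: print one "host color" line for each host given.
check_hosts() {
    for host in "$@"; do
        echo "$host $(conn_color "$host")"
    done
}
```

Running something like check_hosts coffee tea from cron every 5 minutes would give one result per host, which could then be handed to bb.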

    I decided that each machine on the network should have a small client that measures local information and sends it back to a central location, the display server. The local client script, named bb-local.sh, runs every 5 minutes and measures disk usage (using df), CPU usage and number of users (using uptime), and whether certain important processes are running (using ps and grep). It sends all of this information to the display server using bb.
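As an illustration of the disk portion of such a client, here is a minimal sketch. It is not taken from bb-local.sh: the function names and the DISKWARN and DISKPANIC variables are hypothetical, the 90% warning level is an assumption, and the 95% alert level is borrowed from the pager error code described later in this article.

```shell
#!/bin/sh
# Hypothetical sketch of a df-based disk check; not the real bb-local.sh.
# Thresholds are assumptions: warn at 90% full, alert at 95% full.
DISKWARN=${DISKWARN:-90}
DISKPANIC=${DISKPANIC:-95}

# disk_color PERCENT: map a usage percentage to a status color.
disk_color() {
    if [ "$1" -ge "$DISKPANIC" ]; then
        echo red
    elif [ "$1" -ge "$DISKWARN" ]; then
        echo yellow
    else
        echo green
    fi
}

# disk_usage: print "mountpoint percent" for each mounted filesystem.
# "df -kP" requests the POSIX output format, one filesystem per line.
disk_usage() {
    df -kP | awk 'NR > 1 && $5 ~ /^[0-9]+%$/ { sub(/%/, "", $5); print $6, $5 }'
}

# disk_report: print "color mountpoint is N% full" for each filesystem.
disk_report() {
    disk_usage | while read mount pct; do
        echo "$(disk_color "$pct") $mount is ${pct}% full"
    done
}
```

The CPU and process checks could follow the same pattern, parsing the output of uptime and ps instead of df.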

    Finally, if any of these scripts notices trouble, it uses bb to send a numeric message to the pager server. The pager server then calls the script bb-page, which is just a wrapper for kermit that dials the pager number and transmits the numeric message. Figure 2 shows how these parts interconnect.

    The Core of Big Brother: bb, bbd, and bbnet

    Because all the elements of the system use bb to send messages and bbd to receive them, it is important to understand the format of the messages they send. Again, the emphasis was on simplicity, so single lines of data are sent and received. The format of the bb command is as follows:

    bb [ip-address] "msg-type message"
    
    ip-address	 the IP address of the machine running the bbd server
    msg-type	either status or pager

    For status messages, the message portion is in the following format:

    hostname.area  color-code  date  explanation

    hostname     the hostname where the report is from
    area         the type of report (e.g., conn, cpu, disk, msgs)
    color-code   green (OK), yellow (warning), or red (alert)
    explanation  a descriptive message like "/usr is 65% full"

    For pager messages, the message portion is in the following format:

    pager-number  error-code  ip-address

    pager-number  the pager number to call
    error-code    a numeric error code from Big Brother indicating the problem:

                  100 - Disk Error. The disk is over 95% full.
                  200 - CPU Error. CPU load average is unacceptably high.
                  300 - Process Error. An important process has died.
                  400 - The message file contains a serious error.
                  500 - Ping connectivity error, can't connect.
                  600 - Web server HTTP error - server is down.
                  911 - User Page. Message is phone number to call back.

    ip-address    the IP address of the affected system
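To make the two message formats concrete, the following sketch builds each kind of message as a single line. The helper names status_msg and pager_msg are hypothetical and do not appear in the Big Brother sources; the date field simply uses the default output of the date command, which matches the sample report shown later in this article.

```shell
#!/bin/sh
# Hypothetical helpers that assemble the one-line messages described above.

# status_msg HOST AREA COLOR EXPLANATION...: build a status message.
status_msg() {
    host=$1 area=$2 color=$3
    shift 3
    echo "status $host.$area $color $(date) $*"
}

# pager_msg PAGER_NUMBER ERROR_CODE IP_ADDRESS: build a pager message.
pager_msg() {
    echo "pager $1 $2 $3"
}

# Example of how the result might be handed to bb (DISPLAY_IP is an assumed variable):
#   bb "$DISPLAY_IP" "$(status_msg coffee disk green /usr is 65% full)"
```

Keeping each message to one line of plain words is what lets the receiving side split it with nothing more than whitespace.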

    When bbd receives a message, it decides what action to take based on whether it is a status message or a pager message. For a status message, it creates a file in the log directory named hostname.area (e.g., coffee.disk), which is then processed by the display server. For a pager message, it calls the shell script bb-page to issue the numeric message using kermit.
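The dispatch logic can be pictured with a short shell sketch. The real bbd is a C program, so this only mirrors the behavior described in the text; the function name handle_msg is hypothetical, the pager branch merely prints what it would run (the actual arguments to bb-page are not documented here), and the check that rejects names containing a slash is an added precaution, not something the article claims bbd does.

```shell
#!/bin/sh
# Hypothetical sketch of bbd's message handling, written in shell for illustration.

# handle_msg LOGDIR MESSAGE: act on one incoming line.
handle_msg() {
    logdir=$1
    msg=$2
    type=${msg%% *}     # first word: "status" or "pager"
    rest=${msg#* }      # everything after the first word
    case "$type" in
    status)
        name=${rest%% *}    # hostname.area
        body=${rest#* }     # color, date, explanation
        # Precaution (an assumption): refuse names that could escape the log directory.
        case "$name" in
        */*|"") return 1 ;;
        esac
        printf '%s\n' "$body" > "$logdir/$name"
        ;;
    pager)
        echo "would run: bb-page $rest"
        ;;
    *)
        echo "unknown message type: $type" >&2
        return 1
        ;;
    esac
}
```

With this sketch, a message of "status test.test hello world" leaves a file named test.test containing "hello world" in the log directory.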

    Included in Big Brother is bbnet, a generic server testing program. The idea is simple: open a port on a machine and see what comes back. Currently bbnet tests Web servers, but it could also be used to test ftp, telnet, or any other server. It displays the first 256 characters of the response from the server.

    bbnet [URL | machine-name|ip-address:port]

    Finally, for completeness, touchtime reports what the time was exactly 30 minutes ago, in a format compatible with the touch command. It is used with touch to create a reference file on the display server; data files are compared against this file to ensure that all reports are current.

    The Web Display

    When running, Big Brother will create lots of little status files in the log directory. These files are named using the hostname of the reporting machine and the area being reported on. So, for my machine named "coffee", the following files would be created in the log directory:

    coffee.conn Network connectivity status

    coffee.cpu CPU status

    coffee.disk Disk space status

    coffee.procs Running processes status

    coffee.msgs System files status

    coffee.http HTTP status

    The contents of these files will all resemble the following disk report:

    green Mon Nov 18 14:04:53 EST 1996 /usr is 65% full

    Because the files are named by machine and area, and given that the first word of each of these files tells us its color-coded status, this information can be displayed on a Web page as a matrix of machines and areas being monitored. Figure 1 shows what the Big Brother web display looks like. Within the matrix, colored balls correspond to the status of any area at the time the report was issued. Clicking on the colored ball will display additional information. The scripts mkbb.sh and mkbb2.sh create the Big Brother Web pages.
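The matrix idea can be sketched in a few lines of shell. This is not the code from mkbb.sh: the function name host_row is hypothetical, and the sketch prints the color name as plain text in each cell, whereas the real pages show colored balls that link to the full report.

```shell
#!/bin/sh
# Hypothetical sketch of building one row of the display matrix; not the real mkbb.sh.

# host_row LOGDIR HOST AREA...: print one HTML table row for HOST,
# with one cell per AREA holding the first word (the color) of its report.
host_row() {
    logdir=$1 host=$2
    shift 2
    row="<tr><td>$host</td>"
    for area in "$@"; do
        if [ -f "$logdir/$host.$area" ]; then
            read color rest < "$logdir/$host.$area" || true
        else
            color="-"       # no report exists for this area
        fi
        row="$row<td>$color</td>"
    done
    echo "$row</tr>"
}
```

Calling host_row once per machine, with the same list of areas each time, yields the rows of the table.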

    The only area that approaches cleverness in the Web display is checking that the reports in the log directory are fresh. If any report is over 30 minutes old, the corresponding ball in the display matrix is changed to purple. This means that reports from that host are not being received by bbd, usually because bb has stopped running on that host. As previously mentioned, touchtime creates a file exactly 30 minutes old, and all reports are measured against the timestamp on this file.
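One way to express the freshness test is with find and a reference file. The sketch below assumes the reference file has already been given a timestamp 30 minutes in the past (the job the article assigns to touchtime and touch); the function name stale_reports is hypothetical, and the actual scripts may perform the comparison differently.

```shell
#!/bin/sh
# Hypothetical sketch of the staleness check; the real scripts may differ.

# stale_reports LOGDIR REFFILE: list the reports in LOGDIR that are
# not newer than REFFILE, i.e. those that would be shown in purple.
stale_reports() {
    find "$1" -type f ! -newer "$2" -print
}

# Example (the exact touchtime and touch invocation is an assumption):
#   touch -t "$(touchtime)" /tmp/bb-ref
#   stale_reports "$BBHOME/www/logs" /tmp/bb-ref
```

Comparing file timestamps this way avoids doing any date arithmetic in the shell, which is presumably why a small helper program produces the 30-minutes-ago time.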

    An at-a-glance summary is produced by examining each of the log files in the directory to determine the most severe condition on the network at the time. The script mkbb.bkg uses the result to generate the background color of the Big Brother display page. In order of increasing severity, these background colors are:

    green All is well on the network

    yellow There is a warning somewhere on the network

    purple Host is not reporting in for some reason

    red Severe condition needs attention

    Thus, the network status is clearly visible from the background color of the screen. The downside of this simplicity is that absolutely anyone looking at a Big Brother display knows the status of the network. Everybody becomes an expert, and they'll want to know why that screen is red.
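Picking the background color amounts to finding the worst color among all reports. The following sketch shows one way to do it; the function names are hypothetical and this is not the code from mkbb.bkg. It looks only at the color recorded as the first word of each file, so the purple case applies here only if stale reports have already been marked as such.

```shell
#!/bin/sh
# Hypothetical sketch of choosing the page background color.

# severity COLOR: print a numeric rank; higher means worse.
severity() {
    case "$1" in
    green)  echo 0 ;;
    yellow) echo 1 ;;
    purple) echo 2 ;;
    red)    echo 3 ;;
    *)      echo 0 ;;   # unknown or empty: treat as harmless
    esac
}

# worst_color LOGDIR: print the most severe color found in LOGDIR's reports.
worst_color() {
    worst=green
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        read color rest < "$f" || true
        if [ "$(severity "$color")" -gt "$(severity "$worst")" ]; then
            worst=$color
        fi
    done
    echo "$worst"
}
```

An empty log directory yields green in this sketch, which may or may not be the behavior you want on a freshly installed display server.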

    Downloading the Big Brother Source Code

    The source code, demos, and additional information are available on the Big Brother home page: http://www.iti.qc.ca/iti/users/sean/bb-dnld/ and via ftp from the Miller Freeman site: ftp://ftp.mfi.com/pub/sysadmin/.

    To install the archive, decide where you want Big Brother to live. The archive will create a new directory, bb, that will house the system. This directory shall henceforth and forever be known by the environment variable BBHOME. To extract Big Brother, issue the following commands:

    gzip -d bb-src.tgz
    tar xvf bb-src.tar

    The Big Brother system is structured as follows:

    doc/ Documentation and configuration scripts

    etc/ Where all the configuration files live

    src/ bbd.c, bb.c, bbnet.c, and touchtime.c programs

    bin/ Where the ported binaries and shell scripts live

    web/ Scripts that create the Big Brother web pages

    www/ The directory that should be linked into your Web site

    www/logs The directory where bbd writes status files

    www/notes A place to put information about monitored systems

    Environment variables important to Big Brother are:

    BBHOME The top level directory where Big Brother is installed

    BBDISPLAY The machine with the Web Server, a.k.a. the Display Server

    BBNET The machine that will run the network monitor

    BBPAGER The machine that will process pager requests; needs kermit and a modem.

    And finally, the whole package is completely dependent on the information you place in the file etc/bb-hosts.

    Configuring Big Brother

    Automatic configuration is supported for SCO, FreeBSD, Solaris, Linux, HPUX 10, SunOS 4.1, NetBSD, OSF, Ultrix, and Irix. To run the automatic configuration program from the top level Big Brother directory, enter:

    cd doc

    ./bbconfig [OS-NAME]

    where OS-NAME is one of: sco, freebsd, solaris, hpux, linux, sunos, netbsd, osf, ultrix, or irix. This program just adjusts src/Makefile and copies the appropriate system definitions from etc/bbsys.OS-NAME to etc/bbsys.local.

    For security purposes, I isolated all commands used by Big Brother and placed the full pathname of each command into its own environment variable. These definitions live in the file etc/bbsys.sh. The defaults can be overridden by redefining the variables in the file etc/bbsys.local. Listing 1 shows a sample of this file for FreeBSD.
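The default-plus-override pattern can be sketched as follows. The variable names and paths here are hypothetical and are not copied from etc/bbsys.sh or Listing 1; the point is only the order of operations: set defaults first, then source the local file so that its definitions win.

```shell
#!/bin/sh
# Hypothetical sketch of the default/override pattern; names and paths are assumptions.

# load_commands ETCDIR: define command paths, letting ETCDIR/bbsys.local override them.
load_commands() {
    # Defaults: full pathnames, so a hostile PATH cannot substitute another program.
    PS=/bin/ps
    DF=/bin/df
    GREP=/bin/grep
    # Site-specific overrides, if the file exists.
    if [ -f "$1/bbsys.local" ]; then
        . "$1/bbsys.local"
    fi
    export PS DF GREP
}
```

A script would then invoke "$DF" or "$PS" rather than the bare command name.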

    If you are not running one of the automatically configured operating systems, you will have to edit the Makefile, create your own version of the bbsys.local file and define where these commands live on your system.

    Big Brother is highly configurable. You can tell Big Brother what constitutes a warning and what constitutes an urgent situation. For each area, you can also specify whether or not you want to be paged. The defaults are quite sensible, but you can adjust the preset values by editing the file etc/bbdef.sh. You will also have to tell Big Brother where to find kermit and which pager number to call. The standard version of etc/bbdef.sh is shown in Listing 2.

    Once you have autoconfigured, or edited the Makefile by hand, compile the binaries by issuing the following commands:

    cd ../src
    make

    If there are no problems, you can install the binaries:

    make install

    Next, edit the file runbb.sh. It needs to have the environment variable BBHOME set to the directory where Big Brother lives:

    BBHOME="/home/sean/bb"

    The entire system rests on the information you place in the etc/bb-hosts file. This file is really similar to your /etc/hosts file (in fact it used to be the same), but with additional information in what were the comment fields. Refer to Listing 3 for an example of a very simple bb-hosts file.

    Big Brother includes a network monitor script, which runs on the machine defined in the etc/bb-hosts file as BBNET. This script checks every host listed in the etc/bb-hosts file for connectivity via ping and, if the host's line contains a URL, uses bbnet to check for a Web server as well.

    Pager alerts are sent to the machine defined as BBPAGER in the etc/bb-hosts file, which needs kermit and a modem installed to work correctly.

    Finally, the machine you define as BBDISPLAY in the etc/bb-hosts file is the Web server. All status reports are sent here, and bbd creates the files in the www/logs directory. This information becomes the basis for the Big Brother pages bb.html and bb2.html that live in the www/ directory.

    The keywords required for configuration are:

    BBDISPLAY machine for Web display

    BBPAGER your pager server

    BBNET network monitor machine

    http:// check this URL on this box

    Note that you can use one machine for all of the above servers or any combination of the above. It may seem a little confusing, but it makes things very flexible.
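To show how keywords in the comment field might be picked out, here is a sketch that assumes a layout of "ip-address hostname # keywords", in the spirit of /etc/hosts. That layout is an assumption based on the description above, not a copy of Listing 3, and the function names host_for and url_checks are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of reading roles from a bb-hosts style file.
# Assumed line layout:  ip-address  hostname  #  KEYWORD ... [http://...]

# host_for ROLE FILE: print the hostname of the first line carrying ROLE.
host_for() {
    awk -v role="$1" '
        NF < 2 || $1 ~ /^#/ { next }    # skip blank and comment-only lines
        {
            for (i = 3; i <= NF; i++)
                if ($i == role) { print $2; exit }
        }' "$2"
}

# url_checks FILE: print "hostname url" for every URL found in FILE.
url_checks() {
    awk 'NF >= 2 && $1 !~ /^#/ {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^http:\/\//) print $2, $i
    }' "$1"
}
```

Because each role is just a word on a host's line, one machine can carry several roles, or each role can sit on a different machine.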

    Running Big Brother

    Now you're ready to test Big Brother for the first time. It helps to run the test on the BBDISPLAY machine, since you can check the results fairly easily by looking in the www/logs directory.

    Go to the directory you defined as BBHOME and issue the following command:

    ./runbb.sh &

    If there are files appearing in the www/logs directory, then things are looking good. The corresponding pages www/bb.html and www/bb2.html should also have been created, and all these files should get re-created every 5 minutes.

    If all of the above seems to be functioning, then it's time to view the pages. Since most Web servers isolate their data under DocumentRoot or the like, the easiest way to get these pages on-line is to choose where in your Web site to install them, and create a symbolic link to the BBHOME/www directory. For example, if your Web Document Root directory is /usr/www/docs, and BBHOME is set to /usr/acct/sean/bb, then issue the following commands to make the pages accessible:

    cd /usr/www/docs
    ln -s /usr/acct/sean/bb/www bb

    You should then be able to access the pages via your favorite browser using the URL:

    http://your-machine-name-here/bb/bb.html

    Password protecting this area is also highly recommended, just on principle.

    Installing the Clients

    The next thing to do is to put Big Brother on all the clients you want to monitor. You can simply replicate your Big Brother directory on the different clients and execute runbb.sh. The clients don't really need everything; it's just easier to install it this way. However, if you're running in a heterogeneous environment, you'll have to port Big Brother again for each platform. Assuming the information in the etc/bb-hosts file is correct, you should see information about the clients you are monitoring begin to arrive in the www/logs directory on the display server.

    The alternative for those running in a homogeneous environment is to do the following:

    cd BBHOME/doc
    bbclient [client-hostname]

    Assuming you've configured your etc/bb-hosts file correctly, this script will create a tar archive called bb-client-hostname.tar that you can install on the remote machine. Note that this archive is created above the BBHOME directory. Bring it across to the client, and execute runbb.sh as described earlier. Once you're really confident with it, have runbb.sh executed at system startup.

    Debugging Big Brother

    By far the most common problem is that the etc/bb-hosts file is incorrect. Machine names are case sensitive, and the BBDISPLAY and BBNET variables must be defined. Next, if no status reports are being created in the www/logs directory, you can test bb manually by setting BBHOME in your environment, then issuing the following commands:

    ./bbd
    ./bb ip-address "status test.test hello world"

    This should create a file called test.test in the www/logs directory, containing the text "hello world".

    If client data does not appear to be coming in, remember that runbb.sh must be running on each client that you want to monitor completely. The bb-local.sh script collects the local client data and uses bb to transmit it to bbd on the server you've defined as BBDISPLAY in etc/bb-hosts, where it is written to the www/logs directory.

    Syntax errors and complaints about incorrect formats for commands can usually be isolated to etc/bbsys.local, where all commands are defined. If you're not using a system that bb has been ported to, some adjustments to this file may be required.

    Conclusion

    Big Brother is a useful example of using a small client-server routine combined with HTML and a Web server to create a GUI front-end for otherwise ordinary shell scripts. It's not perfect, but it does demonstrate the flexibility of the Web to share and disseminate vital and useful information.

    Big Brother is not a replacement for a qualified systems administrator, but it is an excellent assistant. It's easy to set up and can be integrated with any tool that can execute a UNIX command. I can now go for coffee in peace, secure in the knowledge that if something goes awry, I will be paged with an error code and the IP address of the machine involved. Even if something goes wrong in the middle of the night, I can call in, check the Web page, and get concise information about the network.

    Since Big Brother's creation, many people have helped by porting it and reporting problems, too many to list here. I've received lots of email from happy users and have seen Big Brother propagate from Canada through the United States, Europe, Australia, Russia, Tahiti, and Namibia. Your feedback has been essential, exciting, and greatly appreciated.

    Big Brother has saved me lots of time and effort, and I hope it will do the same for any administrator who installs it. It's the only situation where I'm comfortable knowing that Big Brother is watching.

    About the Author

    Sean MacGuire is a consultant who has spent almost 15 years in the company of UNIX systems. He has a couple of patents pending and is publisher of the literary e-zine "It's a Bunny." His email address is sean@iti.qc.ca.

