A Simple System and Network Monitor
James Meritt
As Unix-based systems became more commonly used throughout the other-than-research community (about a decade ago), it was not unusual for a single server to have its own dedicated system administrator - or at least very few systems to one technically capable individual. This person spent a lot of time checking things like who was logged in, were there any hardware or system errors recently, is the network up, has anyone broken in (or "unauthorized use"), are any Trojan executables in place on the system, what is the weather like out the window...
There are a lot more UNIX-based systems out there now, but the number of qualified system administrators has not nearly kept up with the demand. Instead of one or two systems to watch, an administrator may have dozens, hundreds, or even thousands of systems. Although such things as NIS, automounters, and distribution software packages make things easier, there is still a need to check all of the systems, analyze logs, look for Trojans, and more.
As a "victim" of this expansion, I was faced with the impossibility of looking at all the information from a large number of systems - which was proving to be much too time consuming, even with the help of NIS and automounters.
To help me cope with this problem, I wrote a shell script, which is started nightly by cron, that collects information from many systems across the network and mails a comprehensive (though brief) summary to an identified sys admin. I have written a few other similar programs to perform additional intrusion detections, monitor performance, and so on but this is an easily modified "core" program (Listing 1). This script is C shell; however, for use on networks that are not as secure as the one I was using, I recommend the Bourne shell. There would only be a few modifications (the foreach is a csh thing; use the do for sh).
My solution is based upon everyone running rstatd. I've written other similar programs that exploit NIS (NIS+ in my case) and automounters, but they are not needed for this one. I wrote this script to be as general purpose as possible, and self-adapting to the extent that it could be used with minimal rewrite on other networks. I recommend running it from cron after 11 at night. That way the load (and it does cause a bit of load across the systems) is less likely to cause problems, and it can examine the entire day's worth of logs.
First, specify command locations to fit the local configuration and sys admin preferences. Second, create the names for temporary files using the system time, to make it unlikely that something else will conflict. Then, identify the directory that this program, as well as its associated files, will reside in. Finally, identify the preferred mailer and the email address to mail the report to. This is the only section that will require editing by the sys admin. The rest uses these variables so it would not require additional revising. This section is at the very beginning, so it is easy to find.
Next, the list of nodes responding to rup is built up. This is a dynamic process to keep the script flexible enough for use on networks on which the hosts keep changing to minimize the admin problems associated with yet another script. A remote host will only respond if it is running the rstatd, which is normally started up from inetd(1M). This enables the program to avoid wasting time attempting to examine unreachable systems and does identify those systems that someone just "put on" the network without notifying the sys admin. There are other ways to get the hosts list - the easiest was just reading them from a file, but I thought that would reduce the flexibility of the program. In other similar scripts, the "rup list" is compared to a file to identify "down" or "added" systems. This only takes a few lines.
For every system identified, get a list of who logged into that system today. This will enable you to look at a single report of the logins across the network. Watch for anything unusual, such as logins at times or places that they do not usually occur. System boots will also show up here.
Examine every system message file, generating a chronological list from each file on items identified as intrusion indicators, communications/network problems, and hardware difficulties with individual systems. This list is from every message file on every system. The comprehensive list is time-sorted to help the sys admin scan for patterns. For example, a series of NFS errors at a number of hosts at about the same time may indicate a problem with the server, not the machine having the problem. If the errors are scattered in time, it may not be the server but the client or the network.
I looked at some other products, but they assumed that there was already some central logging taking place and didn't preset what to look for. They were good for examining logs, but you had to get the logs and tell them what to look for!
In my script, the intrusion indicators contain the key phrases: LOGIN FAILURES, attempt, sendmail, connection from bad port, su:, SECURITY ALERT, failed. The network problems contain the key phrases: No carrier, not responding, NIS, nis, NFS, nfs, cannot talk, IR:, looking for, can't open port, named. The system errors scanned for are: WARNING, going down, panic, syncing, dump, full, hardware, Hardware. These key phrases are contained in files crack.error, network.error, and system.error (Listings 2-4).
The entire messages line containing one of these keys is extracted from all the systems, so that where and when are easily read. This deletes possibly millions of "who cases" lines that could obscure vital information.
To check for Trojan executables, I put a simple find on the main paths looking for any executable modified in the past day. This is also a good way to double check for any software installations, since find doesn't discriminate between "authorized" and "bogus." The paths searched are: /bin, /etc, /usr/ucb, /usr/local, /usr/bin, /usr/sbin, and /usr/ucb, but it would be very easy to add or delete from this list.
You should do a system find for any core files and erase them. This requires that the program be run with sufficient permission to actually delete the files. If you do not want to allow root rsh, simply drop this part. Nothing else requires any special privileges.
Finally, mail the report to the address given initially and clean up by deleting the temporary "scratch" files. When you come in the next morning, you can spend a relatively brief time going through the report that was mailed to you and get a fairly good picture of how everything behaved the day before instead of spending hours logging across the network to the various systems.
About the Author
Jim Meritt is currently working for Wang Government Systems, Inc as a Staff Systems Consultant, and has been involved with Unix systems and networking for more than a decade. His email address is: JWMeritt@AOL.com.
|