
Graphing Usage Statistics on the Web

Andrew Mickish

As a system administrator in an environment with limited resources, I am required to monitor performance on machines that gradually become saturated with work and must be upgraded or redistributed. To monitor load and detect usage trends that might overwhelm a server, I implemented a system to log, parse, and plot usage statistics in graphs that are updated hourly on the Web. These graphs are especially useful in persuading management of the legitimate need for upgrades.

This article describes techniques for logfile analysis and datapoint plotting that should facilitate graphing any routine, logged activity. Two different approaches to acquiring usage data are illustrated by measuring DNS requests with the standard named daemon, and measuring network congestion with a customized script. The suite of scripts listed here will generate GIF files like those in Figures 1 and 2.

Collecting Daemon Usage Logs

Typical UNIX daemons like sendmail, named, ftp, and httpd can be configured to log their activities. Turning on logging may involve changing a configuration file, adding a parameter to the daemon invocation, or possibly recompiling the daemon with a flag set. Once enabled, the daemons make calls to the special syslogd daemon each time they have something to enter in the log. syslogd itself has a configuration file that controls which messages go to which file.

The DNS daemon named is capable of logging virtually every detail of its operation. It is also one of the more difficult daemons for which to activate logging, since doing so involves recompiling. If your DNS server does not already write XSTATS entries to a system log, then you probably need to recompile named with the "extended statistics" flag enabled. Look in the BIND source code (the official home of the Berkeley Internet Name Daemon is http://www.isc.org/bind.html) for the file conf/options.h, where you can uncomment the XSTATS flag. Once recompiled with this option, named will invoke syslog() hourly to report a summary of all activity for the past hour.

syslogd must also be configured to direct the informational messages from the named process to an appropriate log file. Adding something like the following to /etc/syslog.conf, and sending the syslogd process a hangup signal, will direct informational messages from all daemons to the designated file:

daemon.info          /var/stats/named.messages
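
Two practical notes: traditional syslogd implementations require the selector and the file name to be separated by tab characters rather than spaces, and the location of the syslogd PID file varies by platform, so the path in the command below is only an assumption to be checked against your own system:

kill -HUP `cat /etc/syslog.pid`      # PID file path varies by system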

The hourly summaries from named include the number of name resolution requests submitted to the DNS, the number of queries that had to be forwarded to a higher DNS authority, the number of lame delegations from other DNS servers, etc. For counting DNS hits, the relevant field is the "RQ" field - the number of requests that have come in since the named process was restarted.

Parsing Logs into Plottable Data

Once named has written a few entries in its log, plottable X,Y data for each hour's requests must be extracted into an auxiliary table suitable for plotting with GnuPlot. The script named_log_to_stats.pl in Listing 1 performs this analysis specifically for the named log.

This parser uses Perl's split() function to break the full log entry into a complete set of variables and then uses only the interesting ones. Note that the parser's calculations assume that log entries contain cumulative counts since a fixed point in time (such as the start of the log), rather than values relative to each interval, so each hour's activity is obtained by subtracting consecutive entries. The script can easily be customized to parse other log files that follow this statistical format.
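
As a rough illustration of the approach (this is not the actual Listing 1; the field positions, input format, and output format here are assumptions), a parser that turns cumulative hourly counts into per-hour X,Y data might look like this:

#!/usr/bin/perl
# Sketch: convert cumulative hourly counts into per-hour X,Y data.
# Assumes a timestamp and a cumulative request count appear at known
# positions in each log entry -- adjust the split() indices to match
# your own log format.

$prev_count = -1;

while (<>) {
    @fields = split;                      # whitespace-separated fields
    ($time, $count) = @fields[0, 1];      # assumed positions

    if ($prev_count >= 0 && $count >= $prev_count) {
        # Requests during this interval = difference of cumulative counts.
        printf "%s %d\n", $time, $count - $prev_count;
    }
    # If the count dropped, the daemon was probably restarted; skip
    # this interval and start subtracting from the new baseline.
    $prev_count = $count;
}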

Customized Usage Logs and Data

Collecting data for your own interesting events is even easier, because you can bypass the idiosyncrasies of things like named, and write data directly into a parsable format. It is a good idea, however, to collect statistics in a consistent manner for all processes you monitor, so that you can use the same analysis and plotting tools for all of them. For example, always keep a cumulative count of events since a fixed date, instead of a relative count since the last hour.

The script netstats.pl in Listing 2 uses the standard UNIX command netstat to record the number of outbound packets and the number of packet collisions that have occurred since the host was rebooted. This allows the parsing script to compute a ratio describing congestion on the host's network. Since the absolute time is included in each entry, this script can be run at any desired frequency from a cron job to obtain the maximum desired resolution of data points.
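
A minimal sketch of this kind of collector appears below. It is not Listing 2: netstat -i output differs between platforms, so the interface name (le0) and the column positions of the output packet and collision counters are assumptions that must be checked on your own system.

#!/usr/bin/perl
# Sketch: append a timestamped, cumulative netstat sample to a log.
# The interface name and column positions below are assumptions --
# run "netstat -i" by hand and adjust them for your platform.

$logfile = "/var/stats/netstats.log";
$now = time;                                # seconds since the epoch

open(NETSTAT, "netstat -i |") || die "netstat: $!";
while (<NETSTAT>) {
    next unless /^le0/;                     # our network interface
    @f = split;
    ($opkts, $colls) = @f[6, 8];            # assumed column positions
    open(LOG, ">>$logfile") || die "$logfile: $!";
    print LOG "$now $opkts $colls\n";
    close(LOG);
    last;
}
close(NETSTAT);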

The parser script for the netstat output is netstats_log_to_stats.pl in Listing 3. This script extracts plottable X,Y data for each hour's network congestion.

Plotting the Data

GnuPlot 3.6 (http://cmpc1.phys.soton.ac.uk/gnuplot.html) accepts X,Y data files as input and produces graphs in GIF format, suitable for display on the Web. The GnuPlot command file in Listing 4 is itself a "generated" file that acts on a specific 10-day range of the X,Y data produced by the two parser scripts. It builds four graphs by selecting different columns from the two data files.
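
A generated command file is roughly of the following form, assuming GnuPlot was built with GIF support; the file names, titles, and column choices here are illustrative assumptions, not the contents of Listing 4:

# Sketch of a generated GnuPlot command file
set terminal gif
set output "dns_requests.gif"
set xlabel "Time"
set ylabel "DNS requests per hour"
plot "named.dat" using 1:2 title "DNS requests" with lines

set output "congestion.gif"
set ylabel "Collisions per output packet"
plot "netstats.dat" using 1:2 title "Congestion" with lines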

The top-level script graphlogs.pl in Listing 5 puts it all together by calling the parsing scripts to generate X,Y data files and building the GnuPlot command file, culminating in the final graphs in Figures 1 and 2.
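
A stripped-down driver might do little more than the following sketch (the script names, log locations, and output files are assumptions, and the command file is abbreviated to a single graph):

#!/usr/bin/perl
# Sketch: regenerate the X,Y data files and graphs from the latest logs.

# Parse the raw logs into plottable X,Y data files.
system("./named_log_to_stats.pl /var/stats/named.messages > named.dat");
system("./netstats_log_to_stats.pl /var/stats/netstats.log > netstats.dat");

# Write a GnuPlot command file, then run GnuPlot on it to produce the GIFs.
open(CMD, ">plot.gp") || die "plot.gp: $!";
print CMD <<'END';
set terminal gif
set output "dns_requests.gif"
plot "named.dat" using 1:2 title "DNS requests" with lines
END
close(CMD);

system("gnuplot plot.gp");

Running such a script hourly from cron (for example, a crontab entry like 0 * * * * /usr/local/stats/graphlogs.pl, with the path adjusted for your installation) keeps the GIFs on the Web page current.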

Conclusion

The ability to view continuously updated graphs reassures me that the servers I administer are up to the task. Additionally, I can spot trends before they become crises and justify requests for more resources with hard data. While DNS hits and network congestion are the two examples in this article, these techniques can be used to accumulate statistics on user logins, server load, disk space, and just about any other event occurring on a server. Tweaking the scripts is all it takes.

About the Author

Andrew Mickish graduated from Carnegie Mellon University with a degree in Computer Science. At CMU he worked on the graphical abstraction layer of early Java-like languages. He is currently an application developer and system administrator at Vanderbilt University Medical Center, supporting the delivery of patient records and other medical information directly to on-line physicians in the hospital. He can be contacted at: http://www.mc.vanderbilt.edu/~mickish/.