A Perl Package for Monitoring Traffic

John Shearer

Systems and network administrators constantly struggle to know what is happening on their networks. This job is difficult at best, and at worst, it can be downright exasperating. With the myriad of manufacturers, standards, protocols, and everyone thinking they have the tool that does it all, the situation is not improving. With this challenge in mind, I decided to turn to my old friends for help: Perl, SNMP (Simple Network Management Protocol), CGI, and good old-fashioned HTML.

The rtr-graph package described in this article is a set of Perl scripts for polling routers (or other SNMP-enabled devices) for information about traffic in and out of specified interfaces. You can set up "rtr-traff" as a cron job to poll the interface at a specified interval, then use a CGI script for a Web front end to the finished graphs. The Web interface automatically sorts results from different devices into separate drop-down lists. You can also set up multiple config files to poll different devices, change final graph specs, and set up new parameters. This concept was originally designed to check our Internet T1 interface for traffic levels during the day. It has since evolved into a versatile program that gathers statistics from any device to check problems, get baselines, or just see what's going on.

Rtr-graph allows the administrator to quickly gain access to traffic statistics in specific areas of the network. When trouble spots (or suspected trouble spots) arise, rtr-graph can be quickly adjusted to monitor a certain area. The Web interface allows the administrator to instantly view multiple graphs spanning several days to see trends and set benchmarks (Figure 1).

In the next several sections, I will cover the installation and use of the various Perl modules required for rtr-graph to function. Once all the required components are explained, I will cover the installation and use of the rtr-graph scripts. My current implementation of rtr-graph is based on Red Hat 7.0 with a basic RPM installation of Perl version 5.6.0, and Apache 1.3.12.

SNMP -- Simple?

The Simple Network Management Protocol (SNMP) can be anything but simple. It is a low-level protocol used to manage and monitor devices on a network. SNMP can provide an incredible amount of information about your devices, as long as you know how to obtain and decipher it. Most routers, many switches, and even end devices like workstations, servers, and printers can be managed using SNMP. The Perl implementation of SNMP that I am using is Net-SNMP, which offers an object-oriented interface to the Simple Network Management Protocol. Net-SNMP, by David M. Town, can be downloaded from:

http://www.perl.com/CPAN-local/modules/by-module/Net/

The current version as of this writing is 3.60. To unpack this Perl module, download it to a local directory and issue the following command:

tar -xzvf Net-SNMP-3.60.tar.gz

Next, change to the newly created directory and issue these commands:

perl Makefile.PL
make test
make install

Installing GDGraph

The graphing functionality of Perl requires several packages to work properly. Since hundreds of possible permutations exist in Linux installations, I will mention the packages needed and discuss some of the possible solutions (see the Resources section for more information).

The idea is to have a functioning installation of the Perl module GDGraph, but this module has several prerequisites. Before installing GDGraph, you must have zlib, freetype, libpng, jpeg-6b, and gd installed and functioning. Luckily, Red Hat Linux comes with all of these packages in RPM format, and they work just fine.

You will also need to download the Perl modules GD-1.32, GDTextUtil-0.80, and GDGraph-1.33 from:

http://www.perl.com/CPAN-local/modules/by-module/GD

Once you download and un-tar these archives, you can install them by going into each of the newly created directories and entering the following commands:

perl Makefile.PL
make
make install

GDGraph also has an optional "make samples" command, which creates a subdirectory containing many excellent examples of how to use the GDGraph Perl module.

The last component to install is the ImageMagick package. This package takes the individual graphs created for each interface and uses the "Montage" function to combine them into a single image. Again, the Red Hat RPM will work just fine for this install.

The rtr-graph Scripts

Once all the prerequisite modules are installed, it is time to install and look at the scripts in the rtr-graph package. Download the tar file rtr-graph-1.0.tar.gz and put it into a temporary directory. Next, issue the command:

tar -xzvf rtr-graph-1.0.tar.gz

This will create a new directory called rtr-graph and extract the scripts rtr-traff, rtr-graph, traf-lister.cgi, and the sample configuration file traffic.cfg. These files must be manually moved to their final locations: rtr-traff and rtr-graph to a bin directory such as /usr/local/bin; traf-lister.cgi to a Web cgi-bin directory; traffic.cfg to /etc. To understand how the scripts work together, let's look at the configuration file (Listing 1).

traffic.cfg is just an example and will need to be modified to suit your own needs. You will need a different configuration file for each device you want to poll. Each variable will need to be defined, but some of them can be left to their defaults. Here is a quick explanation of each of the variables:

$name -- This can be set to any name for this particular setup that will be understandable to you. This name will be used for graph names, data file names, etc. It is a good idea to name this the same as the configuration file for all features of rtr-graph to work (e.g., if the configuration file is named /etc/traffic.cfg, set this value to "traffic").

@interfaces -- This is a list of the interfaces on the device to be polled. You may need to use some other SNMP management software to know what the interface indexes are, but they usually follow the pattern of 1, 2, 3, etc. You also have the option of adding a colon-delimited value for the graph y-axis for just this interface.

$y_axis_def -- This is the default value in megabits for the graph y-axis. This value is overridden by any values specified in the @interfaces list. Again, this value is in megabits (e.g., a value of 10 will give the y-axis a value of 10,000,000).

$tmp_dir -- The directory to be used for the data files and some temporary files needed by the rtr-graph script. This directory will need to be writable by the executor of the scripts and the httpd daemon in order for all the functions of the Web interface to work.

$www_dir -- The directory that will store the finished graphs. Again, this will need to be writable by the executor of the rtr-graph scripts and the httpd daemon.

$offset -- This is the number of seconds to offset the date for which a graph is generated, if no date is given. For instance, a value of 86,400 seconds means that, if no date is given, the default is to generate a graph for 86,400 seconds ago, or "yesterday".

$comm -- The Read community string for the device being polled, usually "public".

$address -- The IP address or resolvable name of the device to be polled.

$tile -- This is the grid size of the final graph specified as "WIDTHxHEIGHT". If you were polling one interface then you would specify "1x1". If you are polling eight interfaces, then you could specify "8x1", "4x2", etc. This depends mostly on how you would like to see the final display.

$skip -- This value specifies how often the graphing program should insert an x-axis value. If you poll the device every minute, trying to show every minute as an x-axis value would be quite unreadable. I find these values to work fairly well:

1-minute poll -- $skip = 50
5-minute poll -- $skip = 10
10-minute poll -- $skip = 5

$gr_width and $gr_height -- These values specify the width and height of the individual graphs. Again, this is user preference and is highly dependent on how many interfaces you are polling and the resolution of the display where you will be viewing the graphs.
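As a rough illustration of the variables described above, a configuration file might look something like the following sketch. This is not Listing 1: the values, the paths, the example address, and the "index:y-axis" order inside @interfaces are all assumptions made for the example. The interface_settings helper is hypothetical and only shows one way a per-interface y-axis value could override the default.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical settings modeled on the variables described in the article.
our $name       = "traffic";                   # same as the file name /etc/traffic.cfg
our @interfaces = ("1:10", "2:1.6", "3:1.6");  # index, then optional y-axis in megabits (assumed order)
our $y_axis_def = 10;                          # default y-axis in megabits
our $tmp_dir    = "/var/tmp/rtr-graph";        # assumed path for data files
our $www_dir    = "/www/html/routers";         # where finished graphs are stored
our $offset     = 86400;                       # graph "yesterday" when no date is given
our $comm       = "public";                    # read community string
our $address    = "192.168.1.1";               # example address only
our $tile       = "3x1";                       # three graphs in one row
our $skip       = 10;                          # suits a 5-minute poll
our $gr_width   = 300;
our $gr_height  = 200;

# Hypothetical helper: split one @interfaces entry into the interface
# index and the y-axis maximum in bits, falling back to the default.
sub interface_settings {
    my ($entry, $default_mbits) = @_;
    my ($index, $mbits) = split /:/, $entry;
    $mbits = $default_mbits unless defined $mbits && length $mbits;
    return ($index, $mbits * 1_000_000);
}

for my $entry (@interfaces) {
    my ($index, $y_max) = interface_settings($entry, $y_axis_def);
    print "interface $index: y-axis maximum $y_max bits\n";
}
```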

rtr-traff Script

I will now explain the rtr-traff script (Listing 2). rtr-traff is the mechanism that polls the device for the number of octets sent and received from each interface. This script takes one required argument, which is the name of the configuration file to be used. This configuration file should be specified as just the file name, because the script will assume it is in the /etc directory. Here is an example cron entry that will poll the router every five minutes:

*/5 * * * * /usr/local/bin/rtr-traff --cfg traffic.cfg

Using the Net::SNMP Perl module, rtr-traff creates an object for the device to be polled as found in the configuration file. It then polls the two MIB objects specified as $inmib and $outmib with the actual interface index appended to the end of each object. These values are stored in a data file (the location specified in the configuration file) along with the time in seconds since epoch. The data file name is derived from the $name variable, the current date, and the interface index. This allows for each interface's data to be stored in its own file. Because the data file name contains the date, the files are automatically rotated every day.
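The polling step described above could be sketched roughly as follows. This is not Listing 2. The two object identifiers are the standard IF-MIB values for ifInOctets and ifOutOctets, but the data file naming scheme, the ".dat" extension, the line format, and the helper names data_file and poll_interface are assumptions made for illustration. Net::SNMP is loaded inside the polling routine so that the file-naming helper can be tried without the module or a device at hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Standard IF-MIB object identifiers for the octet counters.
my $inmib  = "1.3.6.1.2.1.2.2.1.10";   # ifInOctets
my $outmib = "1.3.6.1.2.1.2.2.1.16";   # ifOutOctets

# Hypothetical naming scheme built from the name, the date, and the
# interface index, so that each interface gets a new file every day.
sub data_file {
    my ($tmp_dir, $name, $ifindex, $time) = @_;
    my $date = strftime("%Y%m%d", localtime($time));
    return "$tmp_dir/$name-$date-$ifindex.dat";
}

# Poll one interface and append "epoch in-octets out-octets" to its file.
sub poll_interface {
    my ($address, $comm, $ifindex, $tmp_dir, $name) = @_;
    require Net::SNMP;    # loaded only when a poll is actually requested

    my ($session, $error) = Net::SNMP->session(
        -hostname  => $address,
        -community => $comm,
    );
    die "SNMP session failed: $error\n" unless defined $session;

    # Append the interface index to each object, as the article describes.
    my ($in_oid, $out_oid) = ("$inmib.$ifindex", "$outmib.$ifindex");
    my $result = $session->get_request(-varbindlist => [$in_oid, $out_oid]);
    unless (defined $result) {
        my $message = $session->error;
        $session->close;
        die "SNMP get failed: $message\n";
    }
    $session->close;

    my $now  = time;
    my $file = data_file($tmp_dir, $name, $ifindex, $now);
    open my $fh, '>>', $file or die "Cannot append to $file: $!\n";
    print {$fh} join(" ", $now, $result->{$in_oid}, $result->{$out_oid}), "\n";
    close $fh or die "Cannot close $file: $!\n";
    return $file;
}

# Example call (needs a reachable SNMP device, so it is left commented out):
# poll_interface("192.168.1.1", "public", 1, "/var/tmp/rtr-graph", "traffic");
```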

rtr-graph Script

Next comes the rtr-graph script (Listing 3), which is the real workhorse of the system. This script can be broken down into four basic components: 1) startup and setting of variables; 2) processing the data collected into a more usable format; 3) making the small graphs for the individual interfaces; 4) generating the final picture and doing some cleanup.

Let's consider these components one at a time. Most straightforward is the setup where rtr-graph sets variables and offers help if requested. It also checks for command-line arguments, such as the required configuration file. We also have an option of specifying a date for the graph. This gives us the option of re-creating graphs for previous days (as long as we still have the data files) and creating graphs on the fly for the current day. There is a built-in option for specifying "today" as a valid date, so you don't have to figure out today's date yourself. This makes it a little easier to use "rtr-graph" in other scripts without having to write code for date calculation.
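The date handling described above might be sketched like this. The YYYYMMDD date format and the graph_date helper are assumptions for the example; the real script may accept and format dates differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Decide which day to graph:
#   "today"        -> the current date
#   explicit date  -> used as given
#   nothing        -> the current time minus $offset seconds
sub graph_date {
    my ($arg, $offset, $now) = @_;
    $now = time unless defined $now;
    if (defined $arg) {
        return strftime("%Y%m%d", localtime($now)) if lc($arg) eq "today";
        return $arg;
    }
    return strftime("%Y%m%d", localtime($now - $offset));
}

print "default: ", graph_date(undef,   86400), "\n";   # yesterday with an offset of 86,400
print "today:   ", graph_date("today", 86400), "\n";
```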

Next, rtr-graph parses through each data file and puts the polled information into a graphable format. One interesting SNMP problem is that the values for ifInOctets and ifOutOctets are stored in octets (bytes) instead of bits, which is what most people are used to. This is handled by the $scale variable, which is set to .125 (or 1/8) to give a more readable value in bits. This could also be set to 1 for bytes, 128 for kilobits, 1024 for kilobytes, etc. Just remember to change your graph labels to reflect this! Another SNMP issue to overcome is the fact that integers stored in a MIB counter have a 4-byte maximum value, or 4,294,967,295. Even on a 1.5-Mb T1, this counter can roll over several times a day. To handle this, the script checks whether the current value it is processing is less than the last value it processed (meaning a rollover has occurred) and uses this maximum to calculate the actual data throughput during that period.
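A minimal sketch of the rollover check and the unit conversion follows. It assumes at most one rollover between two samples, and it assumes that the octet count is divided by $scale, which is what the example values (.125 for bits, 128 for kilobits, 1024 for kilobytes) suggest. The helper names octet_delta and scaled are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A 32-bit counter holds values from 0 through 4,294,967,295,
# so it wraps back to 0 after 2**32 octets.
my $counter_span = 2**32;

# Octets transferred between two readings of the same counter.
sub octet_delta {
    my ($previous, $current) = @_;
    return $current - $previous if $current >= $previous;
    # The counter wrapped: count up to the wrap point, then add the rest.
    return ($counter_span - $previous) + $current;
}

# Convert octets to the display unit (assumed: divide by $scale).
sub scaled {
    my ($octets, $scale) = @_;
    return $octets / $scale;
}

my $octets = octet_delta(4_294_967_000, 500);   # reading taken after a rollover
print "octets: $octets, bits: ", scaled($octets, .125), "\n";
```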

rtr-graph is also able to accurately graph using variable sample periods. It uses the time stamps to see how many seconds passed between samples, and then divides the traffic by that value. This solves two potential problems. First, the administrator can set different sample rates for different devices, or change the sample rate on a device, while being assured of an accurate bits-per-second reading. Second, if the rtr-traff script misses a reading (which can happen for any number of reasons), the graphing script will just treat the gap as a long sample and still give an accurate reading.
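The idea of dividing by the elapsed time can be illustrated with a short sketch. The sample layout (epoch seconds paired with an octet counter) and the bits_per_second helper are assumptions for the example, and counter rollover is ignored here to keep the arithmetic visible. The third sample below is 600 seconds after the second, as if one 5-minute poll had been missed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each sample is [epoch seconds, octet counter]. The rate for a period is
# the octets moved, converted to bits, divided by the seconds that passed.
sub bits_per_second {
    my ($scale, @samples) = @_;
    my @rates;
    for my $i (1 .. $#samples) {
        my ($t0, $c0) = @{ $samples[$i - 1] };
        my ($t1, $c1) = @{ $samples[$i] };
        my $seconds = $t1 - $t0;
        next if $seconds <= 0;    # skip duplicate or out-of-order stamps
        push @rates, [ $t1, ($c1 - $c0) / $scale / $seconds ];
    }
    return @rates;
}

# Two periods of different lengths carrying traffic at the same rate.
my @samples = ([0, 0], [300, 37_500], [900, 112_500]);
for my $rate (bits_per_second(.125, @samples)) {
    printf "t=%d  %.0f bits per second\n", @$rate;
}
```

Because each period is divided by its own length, the 300-second and 600-second periods report the same rate even though twice as many octets moved in the second one.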

Now I will use the functions of GDGraph to make the individual graphs for the interfaces. The documentation that comes with GDGraph is fairly clear about how to create a variety of different graphs, and the samples provided are excellent. I simply feed the newly created graph object the data just compiled, along with a few pre-set variables for labels, graph size, etc. These individual graphs are stored in PNG format in the temp directory specified in the configuration file. I used to use the GIF format for my graphs, but during a re-install some time ago, I found this excerpt from the GD package README file:

===> This version of GD no longer supports GIF output because of 
===> threats from the legal department at Unisys. Source code
===> that calls $image->gif will have to be changed to call either 
===> $image->jpg or $image->png to output in JPEG or PNG formats. 
===> The last version of GD that supported GIF output was version 
===> 1.19.
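The graphing step described above might look roughly like the following sketch, which writes PNG output as the README excerpt requires. It is not Listing 3: the plot_data and write_graph helpers, the axis labels, and the option names passed in are assumptions for illustration. The GD::Graph calls used (new, set, set_legend, plot) are the module's documented interface, and the module is loaded inside write_graph so the data-shaping helper can be tried on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# GD::Graph expects one array of x-axis labels followed by one array
# per data set. Each input row here is [label, bits in, bits out].
sub plot_data {
    my (@rows) = @_;
    my (@labels, @in, @out);
    for my $row (@rows) {
        push @labels, $row->[0];
        push @in,     $row->[1];
        push @out,    $row->[2];
    }
    return [ \@labels, \@in, \@out ];
}

# Draw one interface graph and save it as a PNG file.
sub write_graph {
    my ($file, $data, %opt) = @_;
    require GD::Graph::lines;    # loaded only when a graph is drawn

    my $graph = GD::Graph::lines->new($opt{width}, $opt{height});
    $graph->set(
        title        => $opt{title},
        x_label      => "Time",
        y_label      => "Bits per second",
        y_max_value  => $opt{y_max},    # e.g. 10,000,000 for a 10-megabit axis
        x_label_skip => $opt{skip},     # print only every Nth x-axis label
    ) or die $graph->error;
    $graph->set_legend("Traffic In", "Traffic Out");

    my $image = $graph->plot($data) or die $graph->error;
    open my $fh, '>', $file or die "Cannot write $file: $!\n";
    binmode $fh;
    print {$fh} $image->png;
    close $fh or die "Cannot close $file: $!\n";
}

my $data = plot_data(["00:00", 1000, 400], ["00:05", 1200, 450], ["00:10", 900, 380]);
print scalar(@{ $data->[0] }), " points prepared\n";

# Example call (needs GD::Graph installed, so it is left commented out):
# write_graph("if1.png", $data, width => 300, height => 200,
#             title => "Interface 1", y_max => 10_000_000, skip => 1);
```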

In the final step, rtr-graph uses the Montage function of ImageMagick to compile the individual graphs into a single graphic. Here the $tile setting from the configuration file comes into play. If you are polling one, two, or even three interfaces, a single row of graphs will be okay. I pushed the potential of this system and polled all 32 interfaces of a high-end router we were using. It worked like a champ, but that's a lot of graphs to try to look at on the screen! In this case, I actually opted for a $tile value of "32x1". Even though this created a final image of 9600 pixels across, I found it easier to get comparisons over several days by viewing one day per row. YMMV.
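One way to perform this step is to call ImageMagick's montage command-line tool from Perl, as sketched below. Whether rtr-graph uses the command-line tool or another ImageMagick interface is not stated here, so treat this as an assumption; the file names and the montage_command helper are also hypothetical. The -tile and -geometry options are standard montage options, and "+0+0" asks for no border around each tile.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for ImageMagick's montage tool:
# the $tile grid, no spacing between tiles, inputs first, output last.
sub montage_command {
    my ($tile, $output, @graphs) = @_;
    return ("montage", "-tile", $tile, "-geometry", "+0+0", @graphs, $output);
}

my @cmd = montage_command("3x1", "traffic-20010901.jpg",
                          "if1.png", "if2.png", "if3.png");
print join(" ", @cmd), "\n";

# To actually build the image (ImageMagick must be installed):
# system(@cmd) == 0 or die "montage failed: $?\n";
```

Passing the command to system as a list avoids the shell, so unusual characters in file names are not reinterpreted.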

As part of the final step, rtr-graph removes the temporary graphs and sets the owner and group permissions of the final image to root. This is necessary because the Web interface can generate the graphs on its own, which will give the image the owner and group permissions of the httpd daemon. Even if rtr-graph re-generates the graphs during its regular execution, the ownership will not automatically change to root (or whoever the cron job runs as) if the image already exists. This step is not absolutely necessary, but I feel better knowing that the Web daemon (or rogue Web processes) couldn't alter or remove the images once they were written.

Finally, this script needs to be run at a regular interval just like the rtr-traff script. An example cron entry to execute this script every night at 12:30 a.m. might be:

30 00 * * * /usr/local/bin/rtr-graph --cfg traffic.cfg

Web Interface

The final piece to this system is the Web front-end for viewing the graphs. This script (traf-lister.cgi -- Listing 4) needs to be put in a cgi-bin directory on your Web server. I put this script in a separate directory from my public scripts in order to maintain some security. I'm not sure how this information could necessarily be used maliciously, but the directory was already set up for other administrative scripts, so it seemed like the logical place to put it.

Because traf-lister.cgi is not necessarily dependent on any specific rtr-graph configuration file, some variables must be hard-coded directly into the script. Those variables are:

$www_dir -- This is the full path to the directory where the graphs are stored.

$cgi_dir -- The directory where this script resides. This is most likely an alias that is defined in your Web server configuration files (e.g., httpd.conf in Apache).

$bin_dir -- The directory where the rtr-graph script is stored. This is necessary if you want to be able to generate current graphs from the Web interface.

$url -- This is the full URL to the directory where the traffic graphs are stored. Again, this may be an alias defined in your Web server configuration.

I tried to make the Web interface as intuitive as possible without cluttering it up with a lot of unnecessary text and instructions (Figure 1). Basically, each graph that is stored in the $www_dir directory (designated by the .jpg extension) is placed in a list based on its prefix, which is defined by the $name variable in the configuration file. Radio buttons correspond to each list, so you can generate a new graph for that device. If you try to graph too many devices with too many interfaces, you may encounter a time-out situation on your browser. This has never happened to me, but the possibility exists.
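The sorting of graphs into per-device lists can be sketched as follows. This is not Listing 4, and the "name-date.jpg" file naming and the group_graphs helper are assumptions for the example; the real script may name and match files differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group graph files by the prefix that comes from $name in each
# configuration file. Assumed file naming: "<name>-<date>.jpg".
sub group_graphs {
    my (@files) = @_;
    my %lists;
    for my $file (sort @files) {
        next unless $file =~ /^(.+)-(\d+)\.jpg$/;   # ignore anything else
        push @{ $lists{$1} }, $file;
    }
    return %lists;
}

my %lists = group_graphs(qw(
    traffic-20010901.jpg traffic-20010902.jpg printers-20010901.jpg notes.txt
));
for my $name (sort keys %lists) {
    print "$name: @{ $lists{$name} }\n";
}
```

Each key of the resulting hash would correspond to one drop-down list in the Web interface.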

To view the finished graphs, simply select as many images from as many lists as you want and hit the SUBMIT button. Most browsers support the [CTRL]-Click and [SHIFT]-Click options for selecting multiple items in a list. The traf-lister.cgi script will also remember your last selections; so, to remove them from your next SUBMIT, select the auto-generated "No Graph" entry.

All the scripts in this package were designed with configuration simplicity in mind. This allows an administrator to make a logical separation in the devices monitored for easier viewing. For instance, a number of printers can be polled and the graphs stored in /www/html/printers, and a number of routers can be polled and stored in /www/html/routers. Then, two different viewer scripts can be configured, so you can quickly jump to just the devices you are looking for.

Conclusion

While rtr-graph cannot replace all commercial network analysis products, it does offer an easy, automated way to view particular devices that are potential trouble spots. With its ease of configuration, an administrator can easily customize the output. Figure 1 shows an Internet router with one Ethernet interface that is being graphed with a 10-Mb scale. It also shows two T1 interfaces that are graphed with a 1.6-Mb scale (a T1 actually has a maximum of ~1.54 Mb, but 1.6 Mb makes the graph easier to view). With the separation of "Traffic In" and "Traffic Out", we can also see exactly how the data is moving through this device.

A nice advantage rtr-graph has over some other products is the ability to view the statistics of several interfaces at the same time. Not long after we put our load-balanced T1s into production, we saw that even though they looked like they were balanced, the Ethernet interface on the router showed we were still only getting single T1 performance and that the traffic on the T1s was actually mirrored instead of balanced. Without being able to see the T1 and Ethernet interfaces at the same time, the problem might have gone unnoticed for quite a while. (Figure 1 shows an example of this discrepancy.)

There's one last bit of maintenance to perform. Your temp directory can quickly fill up with data files if you're not careful. I like to keep these files around for a few days just in case, but I have never needed them beyond that. A useful program for handling this problem is the tmpwatch utility, which can remove old files from any directory you specify. Red Hat systems automatically run this utility daily to remove files from the /tmp directory that are more than 10 days old.
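If tmpwatch is not available, the same cleanup can be approximated in a few lines of Perl. The sketch below is an alternative to tmpwatch, not a description of how it works: it selects files by modification time, measured from when the script started, and the old_files helper, the directory argument, and the three-day limit are assumptions for the example. It only lists candidates; the unlink call is left commented out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the plain files in $dir last modified more than $days days ago.
# -M gives the file's age in days relative to the script's start time.
sub old_files {
    my ($dir, $days) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!\n";
    my @old = grep { -f $_ && (-M $_) > $days }
              map  { "$dir/$_" } readdir $dh;
    closedir $dh;
    @old = sort @old;
    return @old;
}

# List candidates in the directory given on the command line
# (the current directory if none is given).
my $dir = @ARGV ? $ARGV[0] : ".";
for my $file (old_files($dir, 3)) {
    print "$file\n";
    # unlink $file or warn "Cannot remove $file: $!\n";
}
```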

Where Do We Go from Here?

A logical step for future implementation would be to use an SQL database to store all the polled information rather than cluttering up the system with text data files. As long as you're doing that, you might as well poll for other types of information like errors, broadcasts, etc. Most of the work has already been done; you would just need to add the new MIBs to the polling script and send the collected data to your database instead of a text file. Then, you could make your Web interface so you could select only the information and time period you want and have it create a custom graph on the fly!

Resources

Net-SNMP -- http://www.perl.com/CPAN-local/modules/by-module/Net/

zlib -- http://www.info-zip.org/pub/infozip/zlib/

freetype -- http://freetype.sourceforge.net/

libpng -- http://freesoftware.com/pub/png/src/

jpeg-6b -- ftp://ftp.uu.net/graphics/jpeg/

gd -- http://www.boutell.com/gd/

GD, GDTextUtil, GDGraph -- http://www.perl.com/CPAN-local/modules/by-module/GD/

ImageMagick -- http://www.imagemagick.org/

John Shearer helps manage about 2000 laptops and desktops at a boarding high school in Massachusetts. Although he comes from a primarily DOS/Windows background, he has learned of the wonderful world of Linux and devours anything Linux-related that he can get his hands on. When he's not putting the final touches on a new script, he loves to chase around his two-year-old son. He can be reached at: jshearer@nmhschool.org.