Beyond MRTG

Bill Kramp

It's hard to tell where you should be going if you don't know where you are right now. Without reliable data about your network, it's hard to estimate your upgrade needs, and you may miss the warning signs of impending doom as a section of the network begins to saturate with traffic. In the past, when Ethernet networks were a shared medium, placing a network sniffer anywhere on the network would have allowed you to view any problem that was occurring. However, with the shift to switched 100 Mbps and Gigabit networks, placing a sniffer between two switches can be very difficult, if not impossible. We must turn to the switches themselves to provide that information.

For these purposes, the Multi Router Traffic Grapher (MRTG) has been a great tool. It's simple to configure and can be collecting SNMP data and generating graphs less than an hour after downloading. It has some limitations, however: it was really created to collect data showing traffic in and out of an interface. MRTG also regenerates the graphs after each data collection, which is CPU intensive and not always necessary. Tobias Oetiker, the creator of MRTG, recognized these limitations and created the Round Robin Database (RRD).

RRD focuses on just the storage of the data. The data is stored in pre-allocated storage areas in a binary format, which improves performance over MRTG's text-based files. Each piece of data is defined as a datasource. Each datasource can have several storage resolutions defined, to record the averages and maximums over periods ranging from days to years. These resolutions can be defined per datasource and need not be imposed uniformly.

A big improvement over MRTG is the ability to differentiate between data entries of zero value and data that is not valid (unknown). The unknown value can be entered for several reasons. A heartbeat can be defined that sets the maximum number of seconds allowed between updates before the “unknown” value is entered into the datasource. Not having this feature in MRTG causes “flat-lining” on graphs of the data: if data was not collected from a device for a period of time, or MRTG was stopped for some reason, MRTG would take the last valid values collected and reuse them for the subsequent sample periods, drawing a flat line on the graph. With RRD, minimum and maximum values can also be defined for the collected data, and the unknown value is used if the data falls outside the expected range. When the data is displayed, unknown values are shown as gaps, not as a zero or flat-line value.
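As a rough illustration of these ideas, the sketch below wraps a single rrdtool create call in a shell function. The function name, the datasource name (temp), the step, the heartbeat, the limits, and the archive sizes are all assumptions chosen for the example; none of them come from Cricket or from the listings in this article.

```shell
#!/bin/sh
# Sketch only: create an RRD file holding one GAUGE datasource.
# Every name and number below is an illustrative assumption.
#   --step 300               expect one sample every 300 seconds
#   DS:temp:GAUGE:600:0:100  heartbeat of 600 seconds; samples below 0
#                            or above 100 are stored as unknown
#   RRA:AVERAGE:0.5:1:600    600 rows of 5-minute averages (about 2 days)
#   RRA:AVERAGE:0.5:288:732  732 rows of daily averages (about 2 years)
#   RRA:MAX:0.5:288:732      732 rows of daily maximums
# Usage: create_temp_rrd FILE
create_temp_rrd() {
    rrdtool create "$1" --step 300 \
        DS:temp:GAUGE:600:0:100 \
        RRA:AVERAGE:0.5:1:600 \
        RRA:AVERAGE:0.5:288:732 \
        RRA:MAX:0.5:288:732
}
```

Calling create_temp_rrd /tmp/temp.rrd would then allocate the whole file up front. The file does not grow afterward, because each archive simply overwrites its oldest row.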

Other tools are used to collect and feed the data to RRD. These can be shell scripts, Perl, or some other language that can interface with RRD. RRD can provide graphing capability through the use of scripts as well, but I recommend using a front-end for RRD called Cricket.
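A minimal sketch of such a feeder script follows, assuming an RRD file with a single datasource already exists. The function name feed_sample is hypothetical; the only rrdtool feature relied on is the update command, where N stands for the current time and U stands for an unknown value.

```shell
#!/bin/sh
# Sketch only: feed one sample to an existing single-datasource RRD file.
# Usage: feed_sample RRDFILE VALUE
# If VALUE is empty (for example, the collection step failed), the sample
# is recorded as "U" so that RRD stores it as unknown.
feed_sample() {
    rrdfile=$1
    value=${2:-U}
    # "N" tells rrdtool to timestamp the sample with the current time.
    rrdtool update "$rrdfile" "N:$value"
}
```

Run from cron at the same interval as the step of the RRD file, a script like this is all that is needed to keep a simple datasource up to date.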

Jeff R. Allen, of WebTV Networks, Inc, wrote Cricket for monitoring their network equipment. It uses several Perl scripts to collect and display the data stored in RRD. Configuring Cricket is completely different from configuring MRTG, just as the storage processes for MRTG and RRD are different. Cricket uses a tree-like architecture of directories to define all the devices and variables to monitor. Common information can be entered at a higher level, and devices in sub-directories will inherit that information. Once the information is defined, it is compiled into a hash file that the collector program uses.

The collector is a Perl script that uses the hash file to tell it what devices and variables to retrieve data for, and how they should be stored in RRD. If the datasource is an SNMP device, the collector will poll the device to retrieve the data. It will then send the data to RRD, which will process it according to any specifications for minimum, maximum, or heartbeat values. Cricket and RRD can also monitor other information like disk and CPU utilization. While MRTG could monitor this type of data as well, these new tools eliminate much of the hoop jumping in scaling and displaying that type of data.

Where Cricket really shines is in the flexibility in how the data is displayed. MRTG only allowed two datasources to be displayed, but Cricket allows any number of datasources to be graphed at the same time. (However, it's a little hard to view when there are more than four or five datasources.) Each datasource can be color coded to your specification, or Cricket can automatically assign colors. Text can be defined to describe a title and y-axis for each graph, as well as legend and unit type for each datasource. The grapher CGI script, which is written in Perl, uses the same hash file as the collector to tell it how to display the data stored within the RRD database. Cricket displays a summary of the current, average, and maximum values for each datasource at the top of the HTML page. It then displays an hourly graph and daily graph of the data. It can also show longer duration graphs for the weeks and months. Using both RRD and Cricket provides flexibility, performance, and organization beyond that of MRTG. The tools work well together, and can be easily customized to fit the needs of your company.
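To give a feel for the kind of graph involved, here is a sketch that calls rrdtool graph directly from a shell function, outside of Cricket. It is not a description of how Cricket builds its graphs. It assumes a hypothetical RRD file whose datasources are literally named ifInOctets and ifOutOctets (files created by Cricket may use different internal names), and the title, labels, and colors are arbitrary.

```shell
#!/bin/sh
# Sketch only: draw two datasources from one RRD file on a single graph.
# Usage: graph_octets RRDFILE IMAGEFILE
# Assumes RRDFILE contains datasources named ifInOctets and ifOutOctets.
#   --start -86400   graph the last 24 hours
#   DEF:...          bind a graph variable to a datasource and an archive
#   AREA / LINE2     draw one variable filled, the other as a 2-pixel line
graph_octets() {
    rrdfile=$1
    image=$2
    rrdtool graph "$image" --start -86400 \
        --title "Port traffic" --vertical-label "bytes/sec" \
        DEF:in="$rrdfile":ifInOctets:AVERAGE \
        DEF:out="$rrdfile":ifOutOctets:AVERAGE \
        AREA:in#00CC00:"Average bytes in" \
        LINE2:out#0000FF:"Average bytes out"
}
```

Adding a third or fourth datasource is a matter of adding another DEF and another LINE or AREA argument, which is why graphing more than two values at once is no longer a special case.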

Software

RRD: http://ee-staff.ethz.ch/~oetiker/webtools/rrdtool

Cricket: http://www.munitions.com/~jra/cricket

Apache: http://www.apache.org

Perl >= 5.004: http://www.perl.com

SNMP_Session: http://www.switch.ch/misc/leinen/snmp/perl

Perl Modules from CPAN at http://search.cpan.org:

TimeDate
Time::HiRes
MD5 (Digest-MD5)
Net::FTP (Libnet)
HTML-Parser
MIME-Base64
URI
LWP (libwww-perl)
DB_File

Red Hat Linux 6.0 already has Apache, Perl, and the Perl module DB_File installed. The following installation steps should be similar for other versions of Linux and UNIX.

Installing Perl Modules

The first step is to install the modules from CPAN in the order that they are listed. As root, use gunzip and tar to decompress and un-archive each of the files. Make sure to read the README file for each module to check on any dependencies for other modules not listed:

# gunzip -c FILENAME.tar.gz | tar xvf -

Change into the directory for each Perl module. Make sure the Perl executable is in your path, then type the command:

# perl Makefile.PL

This command will check your system for required components and create a Makefile in the current directory. Next, compile the module using:

# make

The next step is optional, but I recommend that you test the module using the command:

# make test

If everything passes the test, install the module with the command:

# make install

Perform these last four commands for all CPAN modules.

Next, install the SNMP_Session module. It will provide an interface between Cricket and the SNMP device from which the data is being collected. It uses the same commands as the other Perl Modules listed above, but does not do any testing.

Installing RRD

Installation of RRD is next. After using gunzip and tar, create a softlink to the new rrdtool directory (in this case rrdtool-1.0.7) with the command:

# ln -s rrdtool-1.0.7 rrdtool

cd to the new rrdtool directory and use the following commands to configure, compile, and install RRD:

# sh configure
# make
# make install

Then, cd into the RRD subdirectory perl-shared. Use the following commands to complete the installation of the RRD Perl Modules that Cricket will use.

# perl Makefile.PL
# make
# make test
# make install

Installing Cricket

The code for Cricket is the last piece to install. Cricket, RRD, and the other modules do not require special privileges and can be run from any account. Create a dedicated user account named “cricket”. Log in using the new cricket account, and uncompress and un-archive the Cricket file in cricket's home directory:

# gunzip -c FILENAME.tar.gz | tar xvf -

Using the name of the new Cricket directory, create a softlink with the command:

# ln -s cricket-directory cricket

cd to the directory containing the Cricket files and run the configure script to modify the files to run on the current platform:

# cd cricket
# sh configure

Configuring Cricket

The first step in configuring Cricket is to make a copy of the configuration directory while logged in as the user cricket:

# cp -r $HOME/cricket/sample-config \
	$HOME/cricket-config
# cd $HOME/cricket-config

This cricket-config directory uses subdirectories for organizing the different types of network devices to be monitored. The tree-type arrangement also allows configuration settings to be inherited by devices deeper in the directory structure. These inherited values can be overridden if required. The first file read in each directory is the “Defaults” file. It contains settings for that directory and any sub-directories. After the Defaults file is read, Cricket reads any other files in that directory. This continues on down through the directory tree. A condensed example of a config tree is shown in Figure 1, with the directories noted by “(dir)”.
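The inheritance rule can be modeled in a few lines of shell. The sketch below is only a model: it assumes Defaults files made of plain “key = value” lines, which is far simpler than Cricket's real syntax, and the function name lookup_setting is hypothetical. It walks upward from a directory to the root of the tree, so the setting closest to the target wins, which gives the same answer as reading the tree top-down and letting deeper files override.

```shell
#!/bin/sh
# Sketch only: resolve a setting the way an inheriting config tree would.
# Usage: lookup_setting ROOT SUBDIR KEY
# Assumes hypothetical Defaults files containing "key = value" lines.
lookup_setting() {
    root=$1
    dir=$root
    if [ -n "$2" ]; then
        dir=$root/$2
    fi
    key=$3
    while :; do
        if [ -f "$dir/Defaults" ]; then
            # Strip "key =" from matching lines; keep the last match.
            value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" \
                "$dir/Defaults" | tail -n 1)
            if [ -n "$value" ]; then
                echo "$value"
                return 0
            fi
        fi
        # Stop at the top of the tree (or at / as a safety net).
        if [ "$dir" = "$root" ] || [ "$dir" = "/" ]; then
            return 1
        fi
        dir=$(dirname "$dir")
    done
}
```

With a heartbeat set in both the top-level Defaults file and routers/Defaults, a lookup from routers/lab returns the value from routers/Defaults, while a key defined only at the top level is inherited unchanged.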

The cricket-config directory contains many examples for using Cricket with different types of devices. The examples tend to place the generic data of a device (such as temperature, memory, and CPU loads) in directories like routers, bridges, and switches. The other directories, which reference interfaces or ports, contain the individual statistics for each interface on a device. All of this can be reorganized to fit the unique needs of any site.

Configuring Routers

Listing 1 shows the Defaults file under the routers directory for monitoring the Signal to Noise Ratio (SNR) of a Wireless 2.4 GHz KarlRouter. The Target Dictionary is the first part of the file and defines the devices that are to be monitored. The variable snmp-host must be provided with the name of the network device from which to collect the data -- in this case, a reference to another variable called “router”. The router variable can be hard-coded to one specific router, or obtained automatically from other files using %auto-target-name%, which substitutes the target names found in other files of the current directory or its sub-directories. The object identifiers (OIDs) needed for this device are then defined. This example lists the three OIDs for low, good, and excellent SNR.

The Datasource Dictionary defines the information that will be used to create the Round Robin Database files. The COUNTER type informs RRD that the data will increase to a fixed point before wrapping to zero to start over. If the value to be collected were a temperature or a percentage, the GAUGE type would be used. A minimum and maximum limit could then be set using the rrd-min and rrd-max variables to prevent values outside that range from skewing the data. The heartbeat definition sets the amount of time, in seconds, that can pass without data being collected before the datasource is marked unknown.
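The arithmetic behind a COUNTER datasource can be sketched as follows. This is a simplified model that assumes a 32-bit counter and a shell with at least 64-bit integer arithmetic; the function name counter_rate is hypothetical, and the result is truncated to a whole number, which RRD itself does not do.

```shell
#!/bin/sh
# Sketch only: per-second rate between two readings of a 32-bit counter.
# Usage: counter_rate PREVIOUS CURRENT SECONDS
# Assumes the shell's integer arithmetic is at least 64 bits wide.
counter_rate() {
    prev=$1
    cur=$2
    secs=$3
    delta=$((cur - prev))
    # A reading smaller than the previous one is taken to mean that the
    # counter wrapped past 2^32 and started over from zero.
    if [ $((cur < prev)) -eq 1 ]; then
        delta=$((delta + 4294967296))
    fi
    # Integer division: the fractional part of the rate is dropped.
    echo $((delta / secs))
}
```

For example, readings of 1000 and 301000 taken 300 seconds apart give a rate of 1000 per second, and a reading of 2000 that follows 4294966296 is treated as a wrap, giving a rate of 10 per second rather than a huge negative number.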

The TargetType Dictionary defines which datasources will be fetched for a type of device and how the information should be viewed. Multiple TargetType Dictionaries can be defined to support different types of devices. The ds variable tells Cricket to get the data for the low, good, and excellent SNR counters; a comma separates each of the datasources. The view variable then defines how the data should be displayed. The view name, “SNR Levels”, describes the datasources to be viewed. To display all three datasources on the same graph, the datasources are listed with just a space between them. Within the view variable, commas are used only to separate one view from the next, so that different subsets of datasources can be graphed.

The Graph Dictionary extends control over how the data is presented. It allows colors, legends, and unit types to be assigned to each datasource. If this part is not defined, Cricket will automatically assign colors to each datasource and use default information for the other fields.

Listing 2 is the “Targets” file under the routers directory. It specifies the target for the SNMP request, which is obtained by %auto-target-name% in the Target Dictionary of Listing 1. It also defines the target type to reference in the TargetType Dictionary of the “Defaults” file and a short description to be displayed.

Accessing Interfaces or Ports

The major difference between the routers and router-interfaces directories is the use of instances and interface names. A known deficiency of MRTG was that it could only map to the instance. Many devices use the same instance every time for each interface, but some devices change the instance number mapping when updates are made to the device. Cricket allows the interface name to be defined, and then matches it to the correct instance number.
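The idea of matching a name to an instance can be shown with a small lookup. This sketch does not query a device: it reads a hypothetical table of “instance, tab, interface name” lines from standard input, standing in for the interface table a device would report over SNMP. The function name instance_for and the instance numbers used with it are made up for the example.

```shell
#!/bin/sh
# Sketch only: find the instance number that belongs to an interface name.
# Usage: instance_for NAME < table
# Each input line is assumed to be "INSTANCE<TAB>INTERFACE NAME".
# Prints the instance and returns 0, or returns 1 if the name is absent.
instance_for() {
    awk -F '\t' -v name="$1" '
        $2 == name { print $1; found = 1; exit }
        END { exit !found }
    '
}
```

Because the lookup is repeated against the current table, a target defined by name keeps pointing at the same physical port even if the device renumbers its instances after an update.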

There is a Perl script in $HOME/cricket/util called “listInterfaces”. Running it with the name of an SNMP device lists that device's interfaces, with a target name for each one:

# perl listInterfaces 3com3300a  > \
	$HOME/cricket-config/switch-ports/interfaces

The file interfaces, in the directory cricket-config/switch-ports, should look something like Listing 3, with targets and interface-name definitions.

At this point, you can edit out any targets you don't want to collect data on, and change the target device name from “router” to “switch”. It is also a good time to add a description. This will appear in the right column of the Web pages that are used to view the data:

target RMON_10_100_Port_6_on_Unit_1
      interface-name  =       "RMON:10/100 Port 6 on Unit 1"
      short-desc      =       "NT Server"

target RMON_GE_Port_25_on_Unit_1
      interface-name  =       "RMON:GE Port 25 on Unit 1"
      short-desc      =       "Gigabit Fiber Uplink"

To properly monitor the network traffic of a 100/1000 Mbps switch, you need to collect more data than just bytes in and out. Collecting data on unicast and multicast traffic will help provide information on the uses of the various ports of the switches. Also, collecting data for errors on each port will help you catch problems sooner.

The Defaults file located in the switch-ports subtree only has datasources for octets (bytes) in and out. Instead of using the existing Defaults file, copy the Defaults file from the router-interfaces directory:

# cp $HOME/cricket-config/router-interfaces/Defaults \
	$HOME/cricket-config/switch-ports/.

In the new Defaults file of the switch-ports directory, the Target Dictionary converts the interface-name to an instance value and stores it in a variable that the Datasource Dictionary then uses:

inst    =    map(interface-name)

The snmp-host variable will need to be changed to “switch”, instead of “router”.

snmp-host       =       %switch%

The new Datasource Dictionary will be fetching additional data for errors in and out, as well as the number of unicast packets. Because most switches will also be handling multicast and broadcast packets, this information must also be collected. By defining two new datasources (ifInNUcastPackets and ifOutNUcastPackets) broadcast and multicast packets will be summed together by the switch and fetched by Cricket for RRD:

datasource ifInErrors
      ds-source       =       snmp://%snmp%/ifInErrors.%inst%
datasource ifOutErrors
      ds-source       =       snmp://%snmp%/ifOutErrors.%inst%
datasource ifInUcastPackets
      ds-source       =       snmp://%snmp%/ifInUcastPkts.%inst%
datasource ifOutUcastPackets
      ds-source       =       snmp://%snmp%/ifOutUcastPkts.%inst%
datasource ifInNUcastPackets
      ds-source       =       snmp://%snmp%/ifInNUcastPkts.%inst%
datasource ifOutNUcastPackets
      ds-source       =       snmp://%snmp%/ifOutNUcastPkts.%inst%

Multicasting is used for various functions -- the main use at our campus is for loading multiple systems with a single transmission using Norton Ghost. Usually, there is no multicast traffic and minimal broadcast traffic (Figure 2). A Ghost load will place a large spike on a normally quiet graph (Figure 3). This can also be useful for tracking down devices that are emitting a large number of broadcast packets. If the device supports the RMON MIB, the broadcast and multicast packet counters can be collected and viewed separately for each port. Figure 4 shows the network traffic in bytes over the same time periods as Figures 2 and 3. While the multicast spikes at 8:30, 10:00, and 11:00 AM correspond with Figure 3, the other spikes were multicast traffic sent to different PC labs, not displayed by Figures 2 and 3.

To add support for the NUcastPacket traffic, edit the TargetType Dictionary for both the ds and view fields:

targetType  standard-interface
            ds      =    "ifInOctets, ifOutOctets, ifInErrors,
                          ifOutErrors, ifInUcastPackets,
                          ifOutUcastPackets, ifInNUcastPackets,
                          ifOutNUcastPackets"
            view    =    "Octets: ifInOctets ifOutOctets,
                          UcastPackets: ifInUcastPackets
                          ifOutUcastPackets,
                          NUcastPackets: ifInNUcastPackets
                          ifOutNUcastPackets,
                          Errors: ifInErrors ifOutErrors"

The last step is to edit the Graph Dictionary section for the NUcastPacket traffic:

graph   ifInNUcastPackets
            color      =   dark-green
            draw-as    =   AREA
            y-axis     =   "Broad/Multicast packets per second"
            units      =   "pkt/sec"
            legend     =   "Average num Broad/Multicast Packets In"

graph   ifOutNUcastPackets
            color      =   blue
            units      =   "pkt/sec"
            legend     =   "Average num Broad/Multicast Packets Out"

The configuration files provide information for the data collection and for displaying the results of the data collected. The three main tools of Cricket are the Perl scripts: $HOME/cricket/compile for compiling the config-tree information, $HOME/cricket/collector for collecting the SNMP data, and $HOME/cricket/grapher.cgi for displaying the data. When changes are made to the config-tree, it must be recompiled. Both the collector and grapher.cgi scripts use the compiled information of the config-tree. Compile the config tree with the command:

# $HOME/cricket/compile

The next step is to run the collector to fetch the data. This is usually where errors in the cricket-config directory will show up. If a subtree is specified, such as /switch-ports, only that subtree will be processed. This reduces the amount of information generated and makes the process run faster while debugging. To get more information about any problems, include -logLevel debug on the command line:

# $HOME/cricket/collector  /switch-ports  -logLevel debug

Automating the Collection and Logging Process

Cricket provides a wrapper for the data collection. It allows you to organize directories of the config-tree into groups, which can be run by cron. It also handles the rotation of the log files created each time the collector runs and emails any error messages to the owner of the cricket account:

# mkdir $HOME/cricket-logs
# cp $HOME/cricket/subtree-sets $HOME/.

Edit the subtree-sets file so that it lists only the directories that have been correctly configured. Specify each directory by its name under cricket-config, starting with a forward slash (/):

File: subtree-sets
    # $HOME is prepended unless the directories for the base and
    # logdir start with a slash
    # Location of configuration files
    base: cricket-config
    # Location for log files
    logdir: cricket-logs

    set normal:
        /routers
        /router-interfaces
    set test:
        /switches
        /switch-ports

Add the following command to crontab to start the collection of data every five minutes using the wrapper utility:

# crontab -e
*/5 * * * * $HOME/cricket/collect-subtrees -cf $HOME/subtree-sets normal

The crontab entry tells cron to run the wrapper program collect-subtrees, using the file subtree-sets. The wrapper program then uses only the directories listed under “normal” within the file subtree-sets. If you have many devices to monitor, it is wise to have multiple instances of the collector running. By adding entries to crontab, the single subtree-sets file still allows central control of the collection process:

# crontab -e
*/5 * * * * $HOME/cricket/collect-subtrees -cf $HOME/subtree-sets normal
*/5 * * * * $HOME/cricket/collect-subtrees -cf $HOME/subtree-sets test

Allowing Web Access

To allow Web access to the data collected by the cricket account, complete the following steps:

# mkdir $HOME/public_html
# chmod o+x $HOME/.
# chmod o+rx $HOME/public_html
# cd $HOME/public_html
# ln -s $HOME/cricket/VERSION VERSION
# ln -s $HOME/cricket/grapher.cgi grapher.cgi
# ln -s $HOME/cricket/mini-graph.cgi mini-graph.cgi
# ln -s $HOME/cricket/lib lib
# ln -s $HOME/cricket/images images

The next step for allowing Web access using Apache is to use Linuxconf as root to allow access to the cricket directory public_html. Click the “add” button and add the full path to the cricket account's public_html directory to the sub-directory specs for the Apache Web server (/home/users/cricket/public_html in this case):

Linuxconf
|---Config
    |---Networking
        |---Server tasks
            |---Apache web server
                |---Sub-directory specs:
                    /home/users/cricket/public_html

The directory also needs the features “may execute CGI” and “may follow Symlinks” enabled.

The last step to get the Web interface to Cricket running is to edit the srm.conf file for Apache. Find the line that adds the handler for the .cgi extension and uncomment it. This will allow Apache to execute the grapher.cgi script to display the data:

/etc/httpd/conf/srm.conf:
AddHandler cgi-script .cgi

Viewing the Web Data

Now the collector should be retrieving data every five minutes and storing it with RRD. Using a graphical browser, connect to the Web server on the system running Cricket and open the CGI script grapher.cgi:

http://server/~cricket/grapher.cgi

The main page should reflect the names of the directories located under cricket-config. Clicking on a subtree that has been configured should display the various targets and datasources that were defined. The top of the page should provide a summary of the current, average, and maximum data levels for the datasources that day. The default is to display two graphs for the hourly and daily data patterns. The long-term time ranges will display weekly and monthly graph averages of the data.

Conclusion

I hope this article has explained how to collect and analyze data from your switches and routers. Be sure to establish a baseline for your network by collecting data while you are sure the network is operating properly. This will allow you to diagnose problems on your network by referring to the data collected when the network was functioning properly. This will also provide documentation of any increases in traffic on the network and provide a basis for projecting future network upgrades.

The documentation provided with both RRD and Cricket details many more features and capabilities than are covered in this article. Playing with the RRD tools will help you better understand how RRD works and provide insight into how it interacts with Cricket.

About the Author

William Kramp is the network administrator at the Finger Lakes Community College in Canandaigua, New York. More information on RRD can be found at: http://paws.flcc.edu/~krampwd/RRD/. He currently uses NT and UNIX to perform network management and provide DNS, DHCP, firewall, and Web services. He can be reached at: krampwd@fingerlakes.edu.