Cover V11, I10

Article
Figure 1

oct2002.tar

Log Management Using Apache::DBILogger

Chip Castle

Many log-analyzer applications are available on the Web, but most are limited to specific predefined types of functionality. Many of these applications gather file transfer specifics from Apache's flat-file access log, which can confine the end user to a narrow scope of data-mining activities. I needed a much more flexible solution, which would allow me to add extra functionality in a scalable, customizable fashion. Thus, I installed the Apache::DBILogger module, which logs all incoming http requests on your server to a MySQL database.

Apache::DBILogger Configuration

I installed Ask Bjoern Hansen's Apache::DBILogger module, available from CPAN (http://cpan.perl.org), which stores all incoming http requests that Apache handles into a MySQL database rather than the conventional Apache access log. For this solution to work seamlessly, be sure Apache is configured with the mod_perl module (available at: http://www.apache.org). I usually use the script available at: http://www.apachetoolbox.org to help me configure Apache with all the modules I've selected, including downloading all necessary files to complete the setup.

Once Apache has been configured with mod_perl, you can install Apache::DBILogger, which can easily be done as root by issuing the following command at the shell:

# perl -MCPAN -e 'install Apache::DBILogger'
The module and all of its dependencies, which include Date::Format and DBI, should install without a hitch. If they don't, simply download them from CPAN and install them by hand.

Database Prep Work

The next step is to create a table for storing data. You'll need MySQL, which is available at http://www.mysql.com. Retrieve the latest stable RPM or tarball, depending on your system and installation preference. On my system, I can start the MySQL server by becoming root and issuing the following command:

# /usr/bin/safe_mysqld &
Next, log into the MySQL client console and create the httpdlog database and the requests table as follows:

mysql> CREATE DATABASE httpdlog;
mysql> \u httpdlog;
mysql>  CREATE TABLE requests (
server varchar(127) DEFAULT '' NOT NULL,
bytes mediumint(9) DEFAULT '0' NOT NULL,
user varchar(15) DEFAULT '' NOT NULL,
filename varchar(200) DEFAULT '' NOT NULL,
remotehost varchar(150) DEFAULT '' NOT NULL,
remoteip varchar(15) DEFAULT '' NOT NULL,
status smallint(6) DEFAULT '0' NOT NULL,
timeserved datetime DEFAULT '0000-00-00 00:00:00' NOT
NULL,
contenttype varchar(50) DEFAULT '' NOT NULL,
urlpath varchar(200) DEFAULT '' NOT NULL,
referer varchar(250) DEFAULT '' NOT NULL,
useragent varchar(250) DEFAULT '' NOT NULL,
usertrack varchar(100) DEFAULT '' NOT NULL,
KEY server_idx (server),
KEY timeserved_idx (timeserved) );
Whew! That was a lot of work and could certainly be an error-prone process, so I recommend using the example SQL table definition in the POD documentation for Apache::DBILogger, found by issuing the following command:

$ perldoc Apache::DBILogger
You can store the table definition in a file for later use and run a command from the shell to create the database and table in MySQL. For example, I stored the commands in a file called httpdlog.sql and issued the following command as root:

# mysql httpdlog -p < httpdlog.sql
<enter password>
Once you have the proper table definition for storing data, you can configure Apache to send the data there instead of the access log.

Apache Configuration

On my system, Apache is located in /usr/local/apache/, so I would edit /usr/local/apache/conf/httpd.conf and add the following directives at the beginning of the file:

PerlModule Apache::DBI
PerlModule DBI
PerlModule Apache::DBILogger

# Place this in your Apache's httpd.conf file
PerlLogHandler Apache::DBILogger

PerlSetVar DBILogger_data_source    DBI:mysql:httpdlog
PerlSetVar DBILogger_username       httpduser
PerlSetVar DBILogger_password       secret
PerlSetvar DBILogger_table          requests
For the configuration changes to take effect, restart Apache like so:

# /usr/local/apache/bin/apachectl graceful
I prefer passing the "graceful" argument to apachectl because it will clearly restart Apache if it's already running, or effectively start Apache if no processes currently exist. You can test this setup thus far by firing up your favorite browser and accessing various documents on your site. First, make several accesses to ensure you have data in the new table. Second, log into MySQL and query the database as follows to confirm data is being stored as expected:

# mysql httpdlog -p
<enter password>
mysql> SELECT * FROM requests;
If you see data, congratulations! You can now start using the script described below. In addition to the modules in the script, the DBD::mysql module is necessary for communicating with the MySQL server. This module can easily be installed by issuing the following command as root:

# perl -MCPAN -e 'install Bundle::DBD::mysql
Code Talk

The next step involves the script responsible for performing the querying and reporting duties. Although the script could've been written in any language capable of connecting to MySQL, such as PHP, Python, or many others, I chose to write it in Perl, a long-time personal favorite for this type of task:

  1 #!/usr/bin/perl
  2 #
  3 # Author:
  4 # Chip Castle
  5 # Chip Castle Dot Com, Inc.
  6 # http://www.chipcastle.com
  7 # chip@chipcastle.com
  8 #
  9 $|=1;
 10
 11 #use strict;
 12 #use diagnostics;
 13 use DBI ();
 14 use CGI qw(:standard center end_table);
 15 use CGI::Carp qw(fatalsToBrowser);
 16 use URI::Escape;
 17
 18 use constant DSN              => 'dbi:mysql:httpdlog';
 19 use constant DB_TABLE         => 'requests';
 20 use constant DB_AUTH          => 'httpduser:secret';
 21 use constant BG_COLOR         => 'white';
 22 use constant TABLE_BG_COLOR   => '#cccc99';
 23 use constant TABLE_DEF        => '<TABLE BORDER="1" CELLPADDING="7" CELLSPACING="1">';
 24 use constant LIMIT            => 10;
 25 use constant TITLE            => 'Access Log Viewer';
 26
 27 print header;
 28
 29 &display_form;
 30 &display_report;
 31 print('</center>' . end_form() . end_table() . end_html());
 32
 33 sub display_form {
 34     my %labels = (
 35       "Choose a report"       => "",
 36       "Total hits"            => "SELECT COUNT(*) FROM requests",
 37       "Number of bytes transferred"  => "SELECT SUM(bytes) FROM requests",
 38       "Most popular url's"    => "SELECT filename,COUNT(*) as count FROM requests GROUP BY filename 
                                      ORDER BY count DESC",
 39       "Broken links"          => "SELECT filename,COUNT(filename) FROM requests WHERE status=404 
                                      GROUP BY filename",
 40       "Most popular MIME type served" => "SELECT contenttype,COUNT(*) as count FROM requests GROUP 
                                               BY contenttype ORDER BY count DESC",
 41       "Largest documents served"      => "SELECT filename,bytes FROM requests ORDER BY bytes DESC",
 42       "Most recently served url's"    => "SELECT filename, timeserved FROM requests ORDER BY 
                                               timeserved DESC",
 43     );
 44     my %reversed_labels = reverse %labels;
 45     print start_html(-bgcolor=>"${\BG_COLOR}", -title=>"${\TITLE}"),
 46           '<center>',
 47           br,
 48           br,
 49           b(font({-size=>"+2", -face=>"arial,helvetica"}, "${\TITLE}")),
 50           br,
 51           br,
 52           br,
 53           br,
 54           start_form(-action=>url()),
 55           font({-size=>"+1", -face=>"arial,helvetica"}, "Select Report:"),
 56           &spaces(3),
 57           popup_menu(
 58             -name    => 'canned_report',
 59             -values  => [ values %labels ],
 60             -default => "",
 61             -labels  => \%reversed_labels,
 62           ),
 63           &spaces(3),
 64           font({-size=>"+1", -face=>"arial,helvetica"}, "or"),
 65           br,
 66           br,
 67           font({-size=>"+1", -face=>"arial,helvetica"}, "Enter SQL Query:"),
 68           &spaces(3),
 69           textfield(-name=>'sql', -size=>75),
 70           &spaces(3),
 71           br,
 72           br,
 73           submit(-name=>'submit', value=>'Display Results'),
 74           br,
 75           br;
 76 }
 77
 78
 79 sub display_report {
 80     my $sql = param('canned_report') ne '' ? param('canned_report') :
 81               param('sql') ne '' ? param('sql') : "SELECT * FROM requests";
 82
 83     my $num = defined param('num') ? param('num') : 0;
 84     my $limit_sql = $sql . " LIMIT $num, ${\LIMIT}";
 85
 86     my ($dbh, $ra, $row_count, $results, $sth);
 87
 88     $dbh = DBI->connect(DSN, split ':', DB_AUTH, {-RaiseError=>1}) or die("ERROR: $DBI::errstr");
 89     eval {
 90         $ra = $dbh->selectall_arrayref($sql);
 91         $row_count = scalar(@$ra);
 92         $sth = $dbh->prepare($limit_sql);
 93         $sth->execute();
 94     };
 95     die($@) if ($@);
 96
 97     $results = "result" . ($row_count > 1 ? 's' : '');
 98
 99     print hr({-width=>"80%"}),
100           br,
101           br,
102           font({-face=>"arial,helvetica"},
103           "Found $row_count $results for the following query: <BR><BR>$limit_sql"),
104           br,
105           br,
106           ("${\TABLE_DEF}");
107
108     my $url = url() . '?sql=' . uri_escape($sql);
109     my $prev_url = $url . '?m=' . ($num - ${\LIMIT} > 0 ? ($num - ${\LIMIT}) : 0);
110     my $next_url = $url . '?m=' . ($num + ${\LIMIT});
111     $prev_url = ($num + 1 - ${\LIMIT} > 0) ?
112                 a({-href=>$prev_url}, "<< Previous") . &spaces(10) : '';
113     $next_url = ($row_count - $num > ${\LIMIT}) ?
114                 a({-href=>$next_url}, "Next >>") : '';
115     print &get_nav_table($prev_url, $next_url);
116
117     print br, "${\TABLE_DEF}";
118     my $rows = 0;
119     while (my $rh = $sth->fetchrow_hashref()) {
120          print(Tr({-bgcolor=>"${\BG_COLOR}"}, th([keys %$rh]) )) if $rows == 0;
121          $rows++;
122          my $curr_cell_color = ($rows % 2 == 1) ? "${\TABLE_BG_COLOR}" : "${\BG_COLOR}";
123          print(Tr({-bgcolor=>$curr_cell_color}, td({-nowrap=>1}, [values %$rh]) ));
124     }
125     print end_table, br, &get_nav_table($prev_url, $next_url);
126
127     $sth->finish;
128     $dbh->disconnect if defined $dbh;
129 }
130
131 sub get_nav_table {
132     my ($prev_url, $next_url) = @_;
133
134     my $nav = '<table border = "0" width = "80%">' .
135               Tr(td({-align=>'left'}, [ font({-face=>"arial,helvetica"}, $prev_url)]),
136                  td({-align=>'right'}, [ font({-face=>"arial,helvetica"}, $next_url)])) .
137               end_table();
138     return $nav;
139 }
140
141 sub spaces {
142     my $num = shift or 1;
143     my $spaces;
144     for (1..$num) { $spaces .= ' '; }
145     return $spaces;
146 }
147
Lines 1-16 cover pointing to the correct location of the Perl interpreter on my system, turning off output buffering, and pulling in various modules necessary to complete the task at hand. For performance reasons, both strict and diagnostics modules should be commented out during production use. However, they are an excellent resource for all of my development tasks that aren't of the quick-and-dirty variety. The DBI module allows the script to connect to MySQL and pass queries to the database with ease. The CGI module provides various subroutines for accessing variables via HTTP GET and POST methods, header() for printing the proper "text/html" MIME content-type, and numerous convenient HTML shortcut routines. CGI::Carp's fatalToBrowser() subroutine provides the ability to redirect STDERR to STDOUT, so any time the script calls die(), an error message is displayed in the browser. URI::Escape, the last module pulled in, is necessary for encoding the navigational hyperlinks to display next and previous pages.

Lines 18-25 define several constants for the customization of the script and the database connection. Note the DSN, DB_TABLE, and DB_AUTH variables because they describe the name of the database and table in addition to the username and password for accessing them. If you created the httpdlog database and requests table as in the above example, the only remaining task is to set the proper username and password combination for your system in DB_AUTH.

Lines 27-31 are for printing the MIME header, which defaults to "Content-type: text/html", as well as calling the subroutines for displaying the HTML form and the report results. Since display_form() and display_report() are only called in one place, I could have eliminated the calls and declarations of these subroutines to run in standard, sequential shell scripting style.

On lines 34-43 of display_form(), the %labels hash is set to map the SQL commands to the report names. The first item in the hash is set to an empty string for the "Choose a report" label to ensure that the canned_report popup_menu() doesn't have a predefined value set. As shown in Figure 1, the user has the option to enter his own query in the "Enter SQL Query" text field if none of the canned reports suit his needs. The next item in the %labels hash, SELECT COUNT(*) FROM requests, will display the total number of hits the site has received since installing the Apache::DBILogger module. Remaining items in the hash cover other notable reports such as "Number of bytes transferred", "Most popular URLs", "Broken links", "Most popular MIME type served", "Largest documents served", and "Most recently served URLs". Of course, systems administrators will have different needs for the sites they manage, so assembling additional queries for your site will require some ingenuity.

After the queries have been defined, it is time to print out the various form elements that make up the access log front-end Web interface. In line 44, the entries in %labels are reversed to ensure popup_menu() displays them in the proper fashion. Line 45 starts a lengthy print statement for printing out the form. A few globals such as BG_COLOR and TITLE, as well as various HTML shortcuts, are used for cosmetic purposes. In general, the form's layout can be customized to suit the end user's desired look and feel. On line 54, the start_form() subroutine's action attribute is a call to url(), which is the name of the CGI script, so it can call itself upon subsequent form submissions.

Note on line 56 the spaces() subroutine, which is merely a concise way of returning white space suitable for HTML, and will default to only one space if no argument is given. Lines 57-62 cover the canned_report variable's popup_menu(), which is passed a reference to an array of the SQL queries listed in the %labels hash above. The default in popup_menu() is not set so the menu will initially display the "Choose a report" label. This will allow the end user to fire up this script and execute his or her own custom SQL query without being forced to select a predefined report. The rest of this routine does more HTML formatting and then the sql form textfield variable is created, which will be used to pass custom queries. If you're entering the same queries into the sql textfield often, you can edit the script and add the query to the canned_report popup_menu(). Finally, the subroutine ends with more formatting and a form submit button.

The display_report() subroutine, starting on line 79, is where the real work is done in this script, though. Lines 80-81 set $sql, which is done by checking for a canned_report variable being passed in. If the variable is not empty, $sql is set to its value. Otherwise, the sql textfield variable is checked and $sql is set to its value if it is not empty. As a default, $sql is set to display all of the records in the requests table in the event that neither form variables contain values.

Lines 83-84 set the $num variable, which is used for navigational purposes. However, if no value is present, it is set to 0. The $limit_sql variable is used to limit the number of rows of a query result set being displayed, which is done by using SQL's LIMIT keyword and then providing a row number and offset indicating the number of records to be displayed. I set the LIMIT global variable to 10, but this can be changed to suit specific needs.

Lines 86-95 is where the majority of the back-end database operations take place. Several variables are created to handle database queries. We then attempt to connect to the database using the DSN (data source name) and DB_AUTH settings, with the latter being specific to your site's database configuration. The script dies if no database connection can be made, since we can't query the database without a connection. Note that the connect method of DBI passes in a RaiseError attribute which is set to 1. This will cause the script to die if any problems occur during the querying operations and also alleviates the need to place a call to die() after every DBI-related method. The queries are wrapped up in a call to eval() to trap any errors that may occur. Line 95 checks the $@ variable, which will be set if any errors occur during the call to eval(). If this happens, the script subsequently calls die().

Lines 97-106 display a message describing the query that was passed to the server with several of CGI.pm's HTML shortcut functions. Lines 108-115 perform the navigational duties of the script by assembling a query string that is made up of the SQL to be passed to the server and the offset within the query's result set that should be displayed. The $next_url and $prev_url variables are only set if there are records that can be displayed on pages other than the current one. They are passed to get_nav_table() on lines 131-139, which returns a nice HTML table containing the navigational links, and is subsequently printed above and below the rows containing the results of the query. Lines 118-124 fetch the rows of the query result set and print them into a nicely formatted table using CGI.pm's HTML shortcuts. For a nice visual effect, the color of the table rows alternate between white and tan. Lastly, the table is finished and the database connection is dropped.

Conclusion

Now that the pieces are in place, we can see how this application could benefit systems administrators wanting to gauge their Web server activity more clearly. Using tools such as sed, awk, and Perl to parse the flat-file implementation of the access log and extracting useful information is not as necessary as it once was. More importantly, this application could be extended to include more elaborate queries for gathering site-specific information. It could include a plethora of other features, such as results graphing, log rotation, and report storage capabilities. My hope is that you can use Apache::DBILogger and the script provided for managing your server data more effectively!

All code for this article is available for download at: http://www.sysadminmag.com/code/.

Chip Castle is a Web/database consultant specializing in Open Source software. He can be contacted at: chip@chipcastle.com.