Web Cache Proxy Effectiveness
The effectiveness of a Web cache proxy is measured by its cache
hit rate and by the resulting savings in traffic and time. Numerous
tools are available to analyze the information in Web cache proxy
logs (e.g., Bradford Barrett's highly recommended Webalizer).
However, for our purposes, we wrote a Perl script utilizing Thomas
Boutell's GD graphics library (via Lincoln Stein's GD
Perl module). Our interest was in the number of accesses, the volume
of the accesses, and the elapsed time of accesses. An example of
the output for the month of October 2001 is shown in Figure 6.
The proxy logs (in this instance from squid) provide information
such as the timestamp of the access, the client IP address, the client
user id (if you are using some type of authentication), the URL
accessed, the HTTP return code, the proxy return code, the bytes
downloaded, and the transaction time in milliseconds. From these logs,
we calculate the total number of accesses, the volume of the accesses,
and the transaction time of the accesses.
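
As a rough illustration of this tallying (a Python sketch, not the
Perl script in Listing 1), the code below totals accesses, bytes, and
elapsed time, split into cache hits and misses. It assumes squid's
native access.log layout (timestamp, elapsed ms, client, proxy
code/HTTP code, bytes, method, URL, ...). The function names are
hypothetical, and treating any proxy code containing "HIT" as a hit
is a simplifying assumption.

```python
def parse_squid_line(line):
    """Return a dict for one native-format access.log line, or None if malformed."""
    fields = line.split()
    if len(fields) < 7:
        return None
    # Field 4 looks like "TCP_MISS/200": proxy result code, then HTTP status.
    proxy_code, _, http_code = fields[3].partition("/")
    try:
        return {
            "timestamp": float(fields[0]),
            "elapsed_ms": int(fields[1]),
            "client": fields[2],
            "proxy_code": proxy_code,
            "http_code": http_code,
            "bytes": int(fields[4]),
            "url": fields[6],
        }
    except ValueError:
        return None


def summarize(lines):
    """Accumulate access count, byte volume, and elapsed time for hits and misses."""
    totals = {kind: {"count": 0, "bytes": 0, "ms": 0} for kind in ("hit", "miss")}
    for line in lines:
        rec = parse_squid_line(line)
        if rec is None:
            continue  # skip lines that do not parse
        # Assumption: any result code containing "HIT" counts as a cache hit.
        kind = "hit" if "HIT" in rec["proxy_code"] else "miss"
        totals[kind]["count"] += 1
        totals[kind]["bytes"] += rec["bytes"]
        totals[kind]["ms"] += rec["elapsed_ms"]
    return totals


def hit_rate(totals, measure):
    """Fraction of the given measure ("count", "bytes", or "ms") served from cache."""
    whole = totals["hit"][measure] + totals["miss"][measure]
    return totals["hit"][measure] / whole if whole else 0.0
```

Calling summarize(sys.stdin) would mirror the way the article's script
reads its input; hit_rate(totals, "bytes") then gives the share of the
volume served from the cache.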
The report can also be customized to your requirements to include
further breakdowns by other criteria, such as the URL, country, TLD
type, HTTP content type, and user id. This information can be used
to improve your redirection or pre-fetching tables and so further
increase the cache hit rate and the savings.
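
A breakdown of this kind might be sketched as follows in Python (again,
not the article's Perl code). It assumes squid's native access.log
layout, with the byte count in field 5, the URL in field 7, and the
content type in field 10. The names tld_of and bytes_by are
hypothetical, and URLs without an alphabetic final host label (for
example, bare IP addresses) are simply grouped as "(other)".

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last label of the URL's host name, or "(other)"."""
    host = urlsplit(url).hostname or ""
    label = host.rsplit(".", 1)[-1].lower()
    return label if label.isalpha() else "(other)"


def bytes_by(lines, key):
    """Total bytes grouped by "tld" or, for any other key, by content type."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 10 or not fields[4].isdigit():
            continue  # skip short or malformed lines
        group = tld_of(fields[6]) if key == "tld" else fields[9]
        totals[group] += int(fields[4])
    return totals
```

Because the result is a Counter, bytes_by(lines, "tld").most_common(10)
lists the ten heaviest TLDs by volume.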
The Perl script (Listing 1) reads squid format log files from
standard input and writes the HTML report to standard output.
(Listings for this article are available from the Sys Admin
Web site: http://www.sysadminmag.com.) Additionally, the
PNG image files are generated in the current directory.
Hence, to use this script, change to the Web server directory,
pipe the squid access logs to the script, and redirect the output
into a .html file. For example:
$ cd /opt/squid/htdocs
$ cat ../logs/access.log | ../bin/PerfStat.pl > perf.html
The above command produces a report file (perf.html) that references
the PNG image files CHpie.png, CVpie.png, DHpie.png, DVpie.png,
PApie.png, PVpie.png, THpie.png, TVpie.png, RThist.png, and VDhist.png,
which are generated in the current working directory.
In our case, we run a cron job (Listing 2) that feeds the proxy
logs through the program each day to generate cumulative
statistics, and rolls them over at the end of each month. The cron job
shell script renames the files (adding the yyyymm extension) and
edits the .html report file, replacing the image file
references with the new file names.
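
The month-end rollover could look roughly like the Python sketch below.
This is not Listing 2, which is a shell script. The naming scheme
(yyyymm inserted before the file suffix, e.g., CHpie.200110.png) and
the function names are assumptions made for illustration; the
article's script may name the files differently.

```python
from pathlib import Path

# Image files referenced by the report, as named in the article.
IMAGES = [
    "CHpie.png", "CVpie.png", "DHpie.png", "DVpie.png", "PApie.png",
    "PVpie.png", "THpie.png", "TVpie.png", "RThist.png", "VDhist.png",
]


def archived_name(name, yyyymm):
    """Assumed scheme: insert yyyymm before the suffix (perf.html -> perf.200110.html)."""
    p = Path(name)
    return f"{p.stem}.{yyyymm}{p.suffix}"


def roll_over(directory, yyyymm, report="perf.html"):
    """Rename the images and the report, rewriting image references in the report."""
    directory = Path(directory)
    html = (directory / report).read_text()
    for image in IMAGES:
        src = directory / image
        if src.exists():
            new_name = archived_name(image, yyyymm)
            src.rename(directory / new_name)
            # Point the report at the renamed image file.
            html = html.replace(image, new_name)
    target = directory / archived_name(report, yyyymm)
    target.write_text(html)
    (directory / report).unlink()  # the next month starts with a fresh report
    return target
```

A call such as roll_over("/opt/squid/htdocs", "200110") would archive
the October 2001 report under these assumptions.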
The first part of the Perl script reads the squid log format records
from standard input and accumulates, in various variables,
arrays, and hashes, the access counts, byte counts, and time counts,
categorized in a number of different ways. After all the input has
been read, the script writes out the calculations as HTML tables
to standard output. References (embedded links) to the PNG image
files are also included in this output. The last part of the Perl
script loads the graph data into arrays and calls the GD module
to write the graphs into the nominated files. Before running the
Perl script, the reader may need to install the GD Perl module. The
GD Perl module in turn requires the GD graphics library, which
itself requires both the libpng and zlib libraries.
Alternatively, the graphing code can be commented out or stripped
out, and the Perl script will simply produce the information in
HTML table format.
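
To illustrate the table-only output stage, here is a minimal Python
sketch (not taken from Listing 1) that turns accumulated totals into
an HTML table. The function name html_table and the border attribute
are assumptions chosen for illustration.

```python
from html import escape


def html_table(headers, rows):
    """Render a header row and data rows as a simple HTML table string."""
    out = ['<table border="1">']
    out.append("<tr>" + "".join(f"<th>{escape(str(h))}</th>" for h in headers) + "</tr>")
    for row in rows:
        # Escape each cell so URLs or user ids cannot break the markup.
        out.append("<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>")
    out.append("</table>")
    return "\n".join(out)
```

Printing the returned string to standard output matches the way the
article's script delivers its report.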