Web Cache Proxy Effectiveness

Web cache proxy effectiveness is measured by the cache hit rate and the resulting savings in traffic and time. Numerous tools are available to analyze the information in Web cache proxy logs (e.g., Bradford Barrett's highly recommended Webalizer). For our purposes, however, we wrote a Perl script that uses Thomas Boutell's GD graphics library (via Lincoln Stein's GD Perl module). We were interested in the number of accesses, the volume of the accesses, and the elapsed time of the accesses. An example of the output for October 2001 is shown in Figure 6.

The proxy logs (in this instance, from squid) provide information such as the access timestamp, the client IP address, the client user ID (if you use some form of authentication), the URL accessed, the HTTP return code, the proxy return code, the bytes downloaded, and the transaction time in milliseconds. From these logs, we calculate the total number of accesses, the volume of the accesses, and the transaction time of the accesses.
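As a rough illustration of how these fields can be tallied, the following Python sketch (not the Perl script from Listing 1) parses lines in squid's native access.log layout and accumulates access counts, bytes, and elapsed time for hits and misses. The function names, the sample log lines, and the rule that any proxy result code containing "HIT" counts as a cache hit are assumptions made for this example.

```python
def parse_line(line):
    """Split one native-format squid access.log line into a dict, or return None."""
    fields = line.split()
    if len(fields) < 10:
        return None  # skip truncated or malformed records
    # Field 4 looks like TCP_HIT/200: proxy result code, then HTTP status.
    result_code, _, http_status = fields[3].partition("/")
    try:
        return {
            "timestamp": float(fields[0]),
            "elapsed_ms": int(fields[1]),
            "client": fields[2],
            "result": result_code,
            "status": http_status,
            "bytes": int(fields[4]),
            "method": fields[5],
            "url": fields[6],
            "user": fields[7],
            "content_type": fields[9],
        }
    except ValueError:
        return None  # non-numeric timestamp, elapsed time, or size


def summarize(lines):
    """Accumulate access count, bytes, and milliseconds for hits and misses."""
    totals = {kind: {"count": 0, "bytes": 0, "ms": 0} for kind in ("hit", "miss")}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        # Assumption: every result code containing "HIT" is a cache hit;
        # everything else (including denied requests) is counted as a miss.
        kind = "hit" if "HIT" in rec["result"] else "miss"
        totals[kind]["count"] += 1
        totals[kind]["bytes"] += rec["bytes"]
        totals[kind]["ms"] += rec["elapsed_ms"]
    return totals


def hit_rate(totals, key):
    """Percentage of the given measure (count, bytes, or ms) served from cache."""
    total = totals["hit"][key] + totals["miss"][key]
    return 100.0 * totals["hit"][key] / total if total else 0.0


# Invented sample records, for illustration only.
SAMPLE_LOG = [
    "1003872000.123 20 192.168.1.10 TCP_HIT/200 4000 GET http://www.example.com/index.html - NONE/- text/html",
    "1003872001.456 1200 192.168.1.11 TCP_MISS/200 12000 GET http://www.example.org/logo.png - DIRECT/192.0.2.1 image/png",
    "1003872002.789 5 192.168.1.10 TCP_MEM_HIT/200 1000 GET http://www.example.com/style.css - NONE/- text/css",
    "1003872003.012 800 192.168.1.12 TCP_MISS/404 3000 GET http://www.example.net/missing.html - DIRECT/192.0.2.2 text/html",
]

totals = summarize(SAMPLE_LOG)
for key in ("count", "bytes", "ms"):
    print(f"{key}: {hit_rate(totals, key):.1f}% from cache")
```

To process a real log, the same `summarize` function could be given `sys.stdin` instead of the sample list.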

Other information, which can be customized to your requirements, includes further breakdowns by criteria such as URL, country, TLD type, HTTP content type, and user ID. You can use this information to refine your redirection or pre-fetching tables and further increase the cache hit rate and the savings.
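A breakdown of this kind amounts to grouping the same counters under a different key. The Python sketch below is an illustration rather than the authors' code: it groups already-parsed records by TLD or by content type. The record layout, the helper names `tld_of` and `breakdown`, and the sample data are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last label of the host name, or "(other)" for IP addresses."""
    if "://" not in url:
        url = "//" + url  # CONNECT requests are logged as host:port
    host = urlsplit(url).hostname or ""
    last = host.rsplit(".", 1)[-1]
    if not last or not last[0].isalpha():
        return "(other)"  # numeric address or missing host
    return last


def breakdown(records, key_func):
    """Group request, hit, and byte counters by whatever key_func returns."""
    table = defaultdict(lambda: {"requests": 0, "hits": 0, "bytes": 0})
    for rec in records:
        row = table[key_func(rec)]
        row["requests"] += 1
        row["hits"] += 1 if rec["hit"] else 0
        row["bytes"] += rec["bytes"]
    return dict(table)


# Hypothetical parsed records, for illustration only.
records = [
    {"url": "http://www.example.com/a.html", "content_type": "text/html", "bytes": 4000, "hit": True},
    {"url": "http://www.example.com/b.png", "content_type": "image/png", "bytes": 9000, "hit": False},
    {"url": "http://www.example.org/", "content_type": "text/html", "bytes": 2000, "hit": True},
    {"url": "http://192.0.2.7/data", "content_type": "text/plain", "bytes": 500, "hit": False},
    {"url": "secure.example.net:443", "content_type": "-", "bytes": 700, "hit": False},
]

by_tld = breakdown(records, lambda rec: tld_of(rec["url"]))
by_type = breakdown(records, lambda rec: rec["content_type"])

for tld, row in sorted(by_tld.items()):
    print(tld, row)
```

Passing the grouping rule as a function keeps the accumulation loop unchanged when a new criterion, such as user ID, is added.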

The Perl script (Listing 1) reads squid-format log files from standard input and writes the HTML report to standard output. (Listings for this article are available from the Sys Admin Web site: http://www.sysadminmag.com.) In addition, it generates the PNG graphics files in the current directory. Hence, to use the script, change to the Web server directory, pipe the squid access logs to the script, and redirect the output into a .html file. For example:

$ cd /opt/squid/htdocs
$ cat ../logs/access.log | ../bin/PerfStat.pl > perf.html

The above command produces a report file (perf.html) that references the PNG image files CHpie.png, CVpie.png, DHpie.png, DVpie.png, PApie.png, PVpie.png, THpie.png, TVpie.png, RThist.png, and VDhist.png, which are generated in the current working directory.

In our case, a cron job (Listing 2) feeds the proxy logs through the program each day to generate cumulative statistics, which are rolled over at the end of each month. The cron job's shell script renames the files (adding a yyyymm extension) and edits the .html report file, replacing the image file references with the new file names.
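The month-end rollover can be pictured with the Python sketch below. It is not Listing 2, which is a shell script, and the exact naming scheme used there is not shown here, so the sketch assumes the yyyymm stamp is inserted before the file suffix (for example, CHpie.200110.png). The function names are hypothetical, and the demonstration runs in a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path

# Image names as listed in the article.
IMAGE_NAMES = [
    "CHpie.png", "CVpie.png", "DHpie.png", "DVpie.png", "PApie.png",
    "PVpie.png", "THpie.png", "TVpie.png", "RThist.png", "VDhist.png",
]


def archive_name(name, yyyymm):
    """Assumed scheme: perf.html becomes perf.200110.html."""
    p = Path(name)
    return f"{p.stem}.{yyyymm}{p.suffix}"


def roll_over(directory, yyyymm, report="perf.html"):
    """Rename the month's images and report, fixing image links in the report."""
    directory = Path(directory)
    report_path = directory / report
    html = report_path.read_text()
    for name in IMAGE_NAMES:
        src = directory / name
        if src.exists():
            new_name = archive_name(name, yyyymm)
            src.rename(directory / new_name)
            html = html.replace(name, new_name)  # update the reference
    archived = directory / archive_name(report, yyyymm)
    archived.write_text(html)
    report_path.unlink()
    return archived


# Demonstration with placeholder files in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "perf.html").write_text('<img src="CHpie.png"><img src="RThist.png">')
    (Path(tmp) / "CHpie.png").write_bytes(b"")
    (Path(tmp) / "RThist.png").write_bytes(b"")
    archived = roll_over(tmp, "200110")
    result_html = archived.read_text()
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(remaining)
print(result_html)
```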

The first part of the Perl script reads the squid log format records from standard input and holds the access counts, byte counts, and time counts, categorized in a number of different ways, in various variables, arrays, and hashes. After all the input has been read, the script writes the calculations to standard output as HTML tables. References (embedded links) to the PNG image files are also included in this output. The last part of the script loads the graph data into arrays and calls the GD module to write each graph into its nominated file. Before running the script, you may need to install the GD Perl module. The GD Perl module in turn requires the GD graphics library, which requires both the libpng and zlib libraries. Alternatively, the graphing code can be commented out or stripped, and the script will simply produce the information in HTML table format.
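The table-writing stage described above can be sketched in Python as follows. This is an assumption-laden illustration, not the output format of Listing 1: the column layout, the `html_table` helper, the sample figures, and the pairing of this table with CHpie.png are all invented for the example.

```python
from html import escape


def html_table(title, rows, image=None):
    """Build an HTML fragment from (label, hits, total) rows.

    If image is given, an <img> link to that file is appended, in the same
    spirit as the report's embedded references to the PNG graphs.
    """
    out = [
        f"<h2>{escape(title)}</h2>",
        '<table border="1">',
        "<tr><th>Measure</th><th>Hits</th><th>Total</th><th>Hit %</th></tr>",
    ]
    for label, hits, total in rows:
        pct = 100.0 * hits / total if total else 0.0
        out.append(
            f"<tr><td>{escape(label)}</td><td>{hits}</td>"
            f"<td>{total}</td><td>{pct:.1f}</td></tr>"
        )
    out.append("</table>")
    if image:
        out.append(f'<img src="{escape(image)}" alt="{escape(title)}">')
    return "\n".join(out)


# Invented figures, for illustration only.
report = html_table(
    "Cache hits",
    [("Accesses", 2, 4), ("Bytes", 5000, 20000), ("Time (ms)", 0, 0)],
    image="CHpie.png",
)
print(report)
```

Keeping the table output separate from the graphing step mirrors the point made above: the report remains usable even when the graphs are not generated.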