Cover V11, I02

feb2002.tar

How Web Cache Proxies Do and Don't Save on Internet Costs

Stanley Wong

Web cache proxies are often promoted as one means to save network download costs. In my experience, they have had only a small influence on the network volume and, at first glance, the return on investment (ROI) is not worthwhile. However, after further analysis, the major contribution of Web cache proxies to the balance sheet is in productivity costs to the organization. The total cost of ownership (TCO) of Internet services is adversely affected in the absence of a Web cache proxy. I've provided a sample Perl script that can be used to analyze Web cache proxy effectiveness (as opposed to the cache proxy hardware/software performance).

Network Traffic and Costs

The Internet is a freely available resource for information, databases, and communication. Most medium to large commercial organizations have connected their corporate LANs to the Internet in some way. They usually provide Internet access for their employees, too. This access comes at a cost and, within Australia, it is dependent on the volume of traffic downloaded. It is in the interests of the businesses to control this expense and one of the most popular means to do so is via use of a Web cache proxy server. In this article, I examine the way Web cache proxy servers do and don't reduce the expense of Internet connection.

When admins think about Internet access and services, they think of services like telnet, FTP, NNTP, and NTP. However, the majority of Internet users think of the Web and email and, not surprisingly, these two services represent the majority of traffic to and from the Internet. For example, in our organization, typically 50% of the inbound traffic is Web traffic while a third is email traffic. Unfortunately, there is currently not much you can do with inbound email traffic, but correctly configured Web cache proxy servers are said to be able to save 30-60% of the Web traffic. In my company, we could potentially save 15-30% of our Internet traffic.

Our organization has a high-speed link to a major ISP and we have 1500 employees, the majority of whom have access to the Internet for Web browsing. Typically, we download 60 GB per month at a cost between $110 and $170 per GB, and the bill from our ISP totals around $16,000 per month (including fixed service fee and GST). If we were to believe the figures above, it would mean that without our cache proxy we would be spending an extra $1,500 - $4,000 per month (10-25 GB) in usage charges. This is more than enough to justify the purchase and maintenance of one Web cache proxy. In practice, though, we could not download that much extra because our link would saturate and limit throughput. See Figure 1.

Web Cache Proxy

Web proxies log each access from every client. The cache proxy first attempts to serve each Web page from those it has cached locally in memory or storage (disk); otherwise, it retrieves the page from the original source on the Internet. When these logs are examined, we find that approximately 45% of the accesses were served from the cache and, thus, only 55% of accesses needed to go to the Internet. (See Figure 2.) However, ISPs don't charge on the number of accesses; they charge on volume downloaded.

When the proxy logs are examined and the byte counts are collated, we find that the 45% of accesses served from the cache account for only 15% of the volume. (See Figure 3.) In other words, 85% of the Web download volume is not cached. Because Web traffic is only about half of our inbound traffic, the saving in network usage charges with Web cache proxies represents only 7.5% of the total (5 GB or $750 per month, in our case). With current figures, the cost savings from having a Web cache proxy server mean the ROI is four years (assuming $22,000 server/appliance and $200 per month maintenance). If we were to roll back a few years, when our network costs were $5,000 per month and total traffic volume was only 20 GB per month, our savings would have been 1.7 GB or $250 per month, spinning out the ROI over 20 years. This is a little outside the timeframe in which most businesses work, especially considering that the Web has only been with us for about 10 years.
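The gap between hits counted by access and hits counted by volume can be sketched in a few lines of Python. This is a minimal sketch, not the Perl script that accompanies the article. It assumes a Squid-style native access.log layout (timestamp, duration, client, result code/status, bytes, and so on) and treats any result code containing HIT as a cache hit. The sample log lines and the name hit_ratios are invented for illustration.

```python
# Sketch: request hit ratio vs. byte hit ratio from Squid-style log lines.
# The sample lines below are made up; they are not the article's data.
SAMPLE_LOG = """\
1012345601.100    12 10.0.0.5 TCP_HIT/200 2000 GET http://example.com/logo.gif - NONE/- image/gif
1012345602.200   950 10.0.0.6 TCP_MISS/200 48000 GET http://example.com/report.pdf - DIRECT/192.0.2.10 application/pdf
1012345603.300     8 10.0.0.7 TCP_MEM_HIT/200 1000 GET http://example.com/style.css - NONE/- text/css
1012345604.400   700 10.0.0.5 TCP_MISS/200 9000 GET http://example.com/index.html - DIRECT/192.0.2.10 text/html
"""


def hit_ratios(lines):
    """Return (request_hit_ratio, byte_hit_ratio) for an iterable of log lines."""
    hits = total = hit_bytes = total_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        result_code = fields[3].split("/")[0]  # e.g. TCP_HIT
        size = int(fields[4])                  # bytes delivered to the client
        total += 1
        total_bytes += size
        if "HIT" in result_code:  # assumption: any *HIT* code counts as cached
            hits += 1
            hit_bytes += size
    if total == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total, hit_bytes / total_bytes


req_ratio, byte_ratio = hit_ratios(SAMPLE_LOG.splitlines())
print(f"hits by access: {req_ratio:.0%}, hits by volume: {byte_ratio:.0%}")
```

In the sample, half of the accesses are hits but they carry only a small share of the bytes, because the cached items are small. This is the same pattern described above.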

Perhaps more configuration changes can improve our caching statistics. A larger cache size, for example, may improve caching. In our experience, a 2-GB cache was 70% utilized and a 4-GB cache was 90% utilized. However, there was little change in observed cache hits. We have not instituted pre-fetching. Although this may enrich the user experience, it wouldn't help our network charges. We are redirecting some sites, predominantly banner advertising images. However, this is only tinkering at the edges, as we would require a major change to make any impact on the download volume. If we were able to roughly triple the cached proportion of the volume to, for example, 50% (15 GB, or $2,000 per month in network charges), then the ROI would be a little more than one year.
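The article does not spell out how the ROI periods are calculated. One simple reading, shown here as an assumption rather than the author's method, is a payback period: the purchase price divided by the monthly saving net of maintenance. The function name payback_months is hypothetical, and the results may differ somewhat from the rounded figures quoted in the text.

```python
def payback_months(purchase_cost, monthly_saving, monthly_maintenance):
    """Months until net monthly savings repay the purchase cost.

    Returns None when maintenance eats the whole saving (no payback).
    """
    net = monthly_saving - monthly_maintenance
    if net <= 0:
        return None
    return purchase_cost / net


# Figures from the article: $22,000 server/appliance, $200/month maintenance,
# and monthly network savings of $250, $750, and $2,000.
for saving in (250, 750, 2000):
    months = payback_months(22_000, saving, 200)
    print(f"${saving}/month saved: about {months:.0f} months ({months / 12:.1f} years)")
```

The point of the sketch is the sensitivity: when the monthly saving is only slightly above the maintenance cost, the payback period stretches out to decades.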

Thus far, we haven't made much of a saving in network costs because the download volume savings were quite small with the Web cache proxy. Nonetheless, the Web cache proxy is still a worthwhile investment. To understand how, we once again examine the logs. For every access, there is a log parameter (duration), which is the number of milliseconds between the client requesting the item and the proxy responding with the result. Naturally, cache hits are faster than the cache misses and we can calculate the average (mean) response time with and without the cache proxy (Figure 4).
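As a rough illustration of the averaging step, the sketch below groups accesses into hits and misses and takes the mean duration of each. It assumes a Squid-style native log in which the second field is the duration in milliseconds and the fourth is the result code. It also assumes that the mean over misses is a fair stand-in for response time "without the cache", which is my reading and not something the article states. The sample lines and the name mean_elapsed_ms are invented.

```python
# Sketch: mean response time for cache hits, misses, and all accesses.
# The sample lines are made up for illustration.
SAMPLE_LOG = """\
1012345601.100    12 10.0.0.5 TCP_HIT/200 2000 GET http://example.com/logo.gif - NONE/- image/gif
1012345602.200   950 10.0.0.6 TCP_MISS/200 48000 GET http://example.com/report.pdf - DIRECT/192.0.2.10 application/pdf
1012345603.300     8 10.0.0.7 TCP_MEM_HIT/200 1000 GET http://example.com/style.css - NONE/- text/css
1012345604.400   700 10.0.0.5 TCP_MISS/200 9000 GET http://example.com/index.html - DIRECT/192.0.2.10 text/html
"""


def mean_elapsed_ms(lines):
    """Return mean duration in ms for 'hit', 'miss', and 'all' accesses."""
    sums = {"hit": 0, "miss": 0}
    counts = {"hit": 0, "miss": 0}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        kind = "hit" if "HIT" in fields[3].split("/")[0] else "miss"
        sums[kind] += int(fields[1])  # duration in milliseconds
        counts[kind] += 1
    total = counts["hit"] + counts["miss"]
    return {
        "hit": sums["hit"] / counts["hit"] if counts["hit"] else None,
        "miss": sums["miss"] / counts["miss"] if counts["miss"] else None,
        "all": (sums["hit"] + sums["miss"]) / total if total else None,
    }


means = mean_elapsed_ms(SAMPLE_LOG.splitlines())
print(means)
```

Here "all" approximates the experience with the cache in place, while "miss" approximates what every access would cost if nothing were cached.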

From the above information, we can then multiply the access time by the total number of accesses to estimate how long users needed to wait for a Web page request response. Each Web page may constitute one or several items and each item is counted as one access. We can now compare the time users need to spend waiting for Web pages, with and without the cache, and calculate their productivity costs to the organization. However, we are assuming that users don't abandon their workstations during this time to perform some other task (Figure 5).

In our experience, the current average download time per access is 2.1 seconds without caching. The average download time per access with caching is 1.1 seconds, which is a 1.0-second average improvement over non-cached accesses. One second doesn't sound like a lot, but with 7.0 million accesses per month, users spent 1,950 fewer hours per month waiting for their downloads. This represents $46,700 per month in productivity savings (assuming $50,000 per annum per person = $24 per hour). Even with caching, though, users still wait: at 1.1 seconds per access, the minimum productivity cost is $51,300 per month while users wait for their Web access downloads.
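The arithmetic behind these figures is simple enough to check in a short sketch. The inputs are the article's own numbers (7.0 million accesses per month, 1.0 second saved, 1.1 seconds still spent per access, $24 per hour); the function name waiting_cost is hypothetical. The results round to the figures quoted above.

```python
HOURLY_RATE = 24  # dollars per hour, the article's figure for $50,000 per annum


def waiting_cost(accesses, seconds_per_access, hourly_rate=HOURLY_RATE):
    """Return (hours, dollars) spent waiting for the given number of accesses."""
    hours = accesses * seconds_per_access / 3600
    return hours, hours * hourly_rate


# Time saved by the cache: 1.0 second on each of 7.0 million accesses.
saved_hours, saved_dollars = waiting_cost(7_000_000, 1.0)
# Time still spent waiting with the cache in place: 1.1 seconds per access.
spent_hours, spent_dollars = waiting_cost(7_000_000, 1.1)

print(f"saved: {saved_hours:,.0f} hours, ${saved_dollars:,.0f} per month")
print(f"still spent: {spent_hours:,.0f} hours, ${spent_dollars:,.0f} per month")
```

As the text notes, this treats every second of waiting as lost work, so it is best read as an upper bound on the saving.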

One year ago, the average download time per access (with caching) was 2.1 seconds, an improvement of 1.4 seconds over non-cache accesses. With 4.5 million accesses per month (or 1,750 hours per month), that translates to $42,000 in productivity savings per month. Alternatively, a minimum of $63,000 per month in productivity costs was spent waiting for Web downloads. Our link speed at that time was 512/1024 kb/s CIR Frame Relay, and the average volume was 40 GB per month and cost $10,000 per month in usage charges.

Two years ago, the average download time per access (with caching) was 3.3 seconds, an improvement of 1.2 seconds over non-cache accesses. With 1.7 million accesses per month, or 560 hours per month, that means $13,500 in productivity savings per month. Otherwise, it cost a minimum of $37,500 per month in productivity. The link speed at that time was 128/256 kb/s CIR Frame Relay, and the average volume was 25 GB per month and cost $5,000 per month in usage charges.

The overall average time saving over the years was 40% over non-cache Web accesses. This provided substantial gains in user productivity regardless of the network link speed.
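The percentage saving for each period can be derived from the figures already given, taking the non-cached time as the cached time plus the quoted improvement. The sketch below is my own working from those numbers, not the author's calculation, and the names PERIODS and percent_saving are hypothetical. The savings vary from period to period, and their simple average comes out close to the 40% quoted.

```python
# (cached seconds per access, improvement over non-cached) from the article
PERIODS = {
    "current": (1.1, 1.0),
    "one year ago": (2.1, 1.4),
    "two years ago": (3.3, 1.2),
}


def percent_saving(cached_s, improvement_s):
    """Saving as a percentage of the non-cached access time."""
    uncached_s = cached_s + improvement_s
    return 100 * improvement_s / uncached_s


savings = {name: percent_saving(*times) for name, times in PERIODS.items()}
average = sum(savings.values()) / len(savings)

for name, pct in savings.items():
    print(f"{name}: {pct:.0f}% faster than non-cached access")
print(f"simple average: {average:.0f}%")
```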

Additionally, the increase in link speed (from 1 Mb/s to 5 Mb/s) and in cost (by $6,000 per month) from March 2001 decreased Web site access time by approximately one second, which represents a productivity saving of $40,000 per month (assuming six million site accesses). Therefore, there was a net gain of $34,000 per month after increasing the network link speed.

Conclusion

Our analysis indicates that initial savings from a Web cache proxy might be marginal if we only examine the download or network charges. However, remember that one of the original purposes of Web cache proxies was to save time as multiple users utilized the limited available bandwidth. Although the access time may now be less with high-speed links, the use of cache proxies represented savings of around 40% in download time regardless of link speed. This value may be small for an individual or instance, but multiplied over a large number of users and periods, significant productivity gains occur.

In the previous year with the Web cache proxy, we saved potentially more than $500,000 per annum in productivity costs. In the first six months of this year, we have potentially saved $280,000. This analysis therefore demonstrates that the cost of purchase and maintenance of a Web cache proxy can generate an ROI of one month. Furthermore, the productivity savings from the faster accesses can potentially offset the costs of higher speed network links.

Resources

PNG Graphics Library (libpng) -- ftp://ftp.uu.net/graphics/png

Data Compression Library (zlib) -- http://www.cdrom.com/pub/infozip/zlib

Squid cache proxy -- http://www.squid-cache.org

Squid redirector software (jesred) -- http://ivs.cs.uni-magdeburg.de/~elkner/webtools/jesred/

Squid authentication software (msntauth) -- http://stellarx.tripod.com

Perl GD module -- http://www-genome.wi.mit.edu/~lstein/

GD Graphic Library -- http://www.boutell.com

Search CPAN.org -- http://search.cpan.org

Webalizer Web log analyzer -- http://www.mrunix.net/webalizer/

Stanley Wong has performed statistical and database programming at the University of Sydney Computing Centre and was the systems administrator for VAX/VMS and Digital Ultrix systems. He has since worked as a systems administrator with AIX, HP-UX, and Solaris. Stanley currently supports the Internet services for a financial services organization. He may be contacted at: stanley.wong@zurich.com.au.