Cache Proxies Do and Don't Save on Internet Costs
Web cache proxies are often promoted as one means to save network
download costs. In my experience, they have had only a small influence
on the network volume and, at first glance, the return on investment
(ROI) is not worthwhile. However, after further analysis, the major
contribution of Web cache proxies to the balance sheet is in productivity
costs to the organization. The total cost of ownership (TCO) of
Internet services is adversely affected in the absence of a Web
cache proxy. I've provided a sample Perl script that can be
used to analyze Web cache proxy effectiveness (as opposed to the
cache proxy hardware/software performance).
Network Traffic and Costs
The Internet is a freely available resource for information, databases,
and communication. Most medium to large commercial organizations
have connected their corporate LANs to the Internet in some way.
They usually provide Internet access for their employees, too. This
access comes at a cost and, within Australia, that cost depends on
the volume of traffic downloaded. It is in the interests of
businesses to control this expense, and one of the most popular ways
to do so is to use a Web cache proxy server. In this article,
I examine the ways in which Web cache proxy servers do and don't
reduce the expense of an Internet connection.
When admins think about Internet access and services, they think
of services like telnet, FTP, NNTP, and NTP. However, the majority
of Internet users think of the Web and email and, not surprisingly,
these two services represent the majority of traffic to and from
the Internet. For example, in our organization, typically 50% of
the inbound traffic is Web traffic while a third is email traffic.
Unfortunately, there is currently not much you can do about inbound
email traffic, but correctly configured Web cache proxy servers
are said to save 30-60% of Web traffic. Because Web traffic is about
half of our inbound traffic, my company could potentially save 15-30%
of its total Internet traffic.
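The 15-30% estimate is simple arithmetic, sketched here in Python. The function name is hypothetical, and the sketch assumes the saving applies only to the Web share of inbound traffic:

```python
def overall_saving(web_share, web_cache_saving):
    """Fraction of total inbound traffic saved when the cache removes
    `web_cache_saving` of the Web traffic and Web traffic makes up
    `web_share` of all inbound traffic."""
    return web_share * web_cache_saving


# Figures quoted in the text: 50% Web share, 30-60% claimed cache saving.
low = overall_saving(0.50, 0.30)
high = overall_saving(0.50, 0.60)
print(f"Potential saving: {low:.0%} to {high:.0%} of inbound traffic")
```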
Our organization has a high-speed link to a major ISP and we have
1500 employees, the majority of whom have access to the Internet
for Web browsing. Typically, we download 60 GB per month at a cost
between $110 and $170 per GB, and the bill from our ISP totals around
$16,000 per month (including fixed service fee and GST). If we were
to believe the figures above, then without our cache
proxy we would be spending an extra $1,500 - $4,000 per month (10-25
GB) in usage charges. This is more than enough to justify the purchase
and maintenance of one Web cache proxy. In practice, though, we could
not actually download that much extra traffic, because our link would
be saturated and would limit the throughput. See Figure 1.
Web Cache Proxy
Web proxies log each access from every client. For each request, the
cache proxy first attempts to serve the Web page from the copies it has
cached locally in memory or in storage (disk); otherwise, it retrieves
the page from the original source on the Internet. When these logs
are examined, we find that approximately 45% of the accesses were
served from the cache and, thus, only 55% of accesses needed to go to
the Internet. (See Figure 2.) However, ISPs don't charge on the number
of accesses; they charge on volume downloaded.
When the proxy logs are examined and the byte counts are collated,
we find that the 45% of cache accesses only account for 15% of the
volume. (See Figure 3.) In other words, 85% of the download volume
is not cached. Thus the saving in network usage charges with Web
cache proxies represents only 7.5% of the total (15% of the Web
traffic, which is itself half of the total), or 5 GB or $750 per
month in our case. With current figures, these cost savings mean
that a Web cache proxy server takes four years to pay for itself
(assuming a $22,000 server/appliance and $200 per month maintenance).
If we were to roll back a few years, when our network costs were $5,000
per month and total traffic volume was only 20 GB per month, our savings
would have been 1.7 GB or $250 per month, stretching the payback period
to 20 years. This is a little outside the timeframe in which most
businesses work, especially considering that the Web has only been with
us for about 10 years.
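The gap between hits counted by access and hits counted by volume can be measured directly from the proxy log. The following is a minimal Python sketch, not the Perl script mentioned earlier. It assumes Squid's native access.log layout (result code in the fourth field, byte count in the fifth) and treats any result code containing "HIT" as a cache hit, which is a simplification. The sample lines are invented for illustration:

```python
def hit_ratios(lines):
    """Return (request_hit_ratio, byte_hit_ratio) for Squid-style log lines."""
    requests = hits = total_bytes = hit_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[4].isdigit():
            continue  # skip blank or malformed lines
        size = int(fields[4])
        requests += 1
        total_bytes += size
        # Assumption: any result code containing "HIT" was served from cache.
        if "HIT" in fields[3].split("/")[0]:
            hits += 1
            hit_bytes += size
    if requests == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / requests, hit_bytes / total_bytes


# Invented sample lines in the assumed layout:
# time elapsed_ms client code/status bytes method URL ident peer type
sample = [
    "1000000000.1  100 10.0.0.1 TCP_HIT/200 1000 GET http://example.com/a.gif - NONE/- image/gif",
    "1000000000.2  300 10.0.0.2 TCP_MEM_HIT/200 2000 GET http://example.com/b.gif - NONE/- image/gif",
    "1000000000.3 2000 10.0.0.1 TCP_MISS/200 9000 GET http://example.com/c.zip - DIRECT/192.0.2.1 application/zip",
    "1000000000.4 2200 10.0.0.3 TCP_MISS/200 8000 GET http://example.com/d.zip - DIRECT/192.0.2.1 application/zip",
]

request_ratio, byte_ratio = hit_ratios(sample)
print(f"Hits by access: {request_ratio:.0%}, hits by volume: {byte_ratio:.0%}")
```

Because the function accepts any iterable of lines, an open log file can be passed in place of the sample list.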
Further configuration changes might improve our caching statistics.
A larger cache size, for example, may improve caching. In our experience,
a 2-GB cache was 70% utilized and a 4-GB cache was 90% utilized.
However, there was little change in observed cache hits. We have
not instituted pre-fetching. Although this may enrich the user experience,
it wouldn't help our network charges. We are redirecting some
sites, predominantly banner advertising images. However, this is
only tinkering at the edges, as we would require a major change
to make any impact on the download volume. If we were able to triple
the cache volume proportion to, for example, 50% (15 GB) or $2,000
per month in network charges, then the ROI is a little more than
Thus far, we haven't made much of a saving in network costs
because the download volume savings were quite small with the Web
cache proxy. Nonetheless, the Web cache proxy is still a worthwhile
investment. To understand how, we once again examine the logs. For
every access, there is a log parameter (duration), which is the
number of milliseconds between the client requesting the item and
the proxy responding with the result. Naturally, cache hits are
faster than the cache misses and we can calculate the average (mean)
response time with and without the cache proxy (Figure 4).
From the above information, we can then multiply the access time
by the total number of accesses to estimate how long users needed
to wait for a Web page request response. Each Web page may constitute
one or several items and each item is counted as one access. We
can now compare the time users need to spend waiting for Web pages,
with and without the cache, and calculate the productivity cost
to the organization. This calculation assumes that users don't
abandon their workstations during this time to perform some other
task (Figure 5).
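The mean response times can be derived from the same logs. This Python sketch again assumes Squid's native access.log layout, where the second field is the elapsed time in milliseconds. It also assumes that the "without cache" figure is the mean over cache misses only and the "with cache" figure is the mean over all accesses; that is my reading of the method, not a confirmed detail. The sample lines are invented:

```python
def mean_response_seconds(lines):
    """Return (mean_with_cache, mean_without_cache) in seconds."""
    all_ms = []
    miss_ms = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        elapsed = int(fields[1])
        all_ms.append(elapsed)
        # Assumption: a result code without "HIT" went to the origin server.
        if "HIT" not in fields[3].split("/")[0]:
            miss_ms.append(elapsed)
    if not all_ms or not miss_ms:
        raise ValueError("need at least one access and one cache miss")
    with_cache = sum(all_ms) / len(all_ms) / 1000
    without_cache = sum(miss_ms) / len(miss_ms) / 1000
    return with_cache, without_cache


# Invented sample lines: two fast hits and two slow misses.
sample_log = [
    "1000000000.1  100 10.0.0.1 TCP_HIT/200 1000 GET http://example.com/a.gif - NONE/- image/gif",
    "1000000000.2  300 10.0.0.2 TCP_HIT/200 2000 GET http://example.com/b.gif - NONE/- image/gif",
    "1000000000.3 2000 10.0.0.1 TCP_MISS/200 9000 GET http://example.com/c.zip - DIRECT/192.0.2.1 application/zip",
    "1000000000.4 2200 10.0.0.3 TCP_MISS/200 8000 GET http://example.com/d.zip - DIRECT/192.0.2.1 application/zip",
]

mean_with, mean_without = mean_response_seconds(sample_log)
print(f"Mean per access: {mean_with:.2f} s with cache, {mean_without:.2f} s without")
```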
In our experience, the current average download time per access
is 2.1 seconds without caching. The average download time per access
with caching is 1.1 seconds, which is a 1.0-second average improvement
over non-cached accesses. One second doesn't sound like a lot,
but consider that there were 7.0 million accesses per month, which
is 1,950 hours less per month that users had to wait for their downloads.
This represents $46,700 per month in productivity savings (assuming
$50,000 per annum per person = $24 per hour). Even with the cache,
users still spend at least $51,300 per month of paid time
waiting for their Web access downloads.
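The productivity figures for the current period can be reproduced with a few lines of Python. The function name is hypothetical; the inputs (7.0 million accesses, 1.0 second saved, 1.1 seconds still spent per access, $24 per hour) are the ones quoted above:

```python
HOURLY_RATE = 24.0  # dollars per hour, from $50,000 per annum per person


def waiting_cost(accesses_per_month, seconds_per_access, hourly_rate=HOURLY_RATE):
    """Return (hours, dollars) spent on the given per-access time each month."""
    hours = accesses_per_month * seconds_per_access / 3600
    return hours, hours * hourly_rate


# Time saved by the cache: 1.0 second on each of 7.0 million accesses.
saved_hours, saved_dollars = waiting_cost(7_000_000, 1.0)
# Time still spent waiting with the cache: 1.1 seconds per access.
wait_hours, wait_dollars = waiting_cost(7_000_000, 1.1)

print(f"Saved: {saved_hours:,.0f} hours, ${saved_dollars:,.0f} per month")
print(f"Still spent waiting: ${wait_dollars:,.0f} per month")
```

The results come out slightly below the rounded figures in the text (about 1,944 hours and $46,667 saved, and about $51,333 still spent waiting).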
One year ago the average download time per access (with caching)
was 2.1 seconds, an improvement of 1.4 seconds over non-cache accesses.
With 4.5 million accesses per month (or 1,750 hours per month),
that translates to $42,000 in productivity savings per month. Even so,
a minimum of $63,000 per month in productivity costs was spent waiting
for Web downloads. Our link at that time was 512/1024 kb/s
CIR Frame Relay, the average volume was 40 GB per month, and usage
charges were $10,000 per month.
Two years ago, the average download time per access (with caching)
was 3.3 seconds, an improvement of 1.2 seconds over non-cache accesses.
With 1.7 million accesses per month, or 560 hours per month, that
means $13,500 in productivity savings per month. Even so, waiting for
Web downloads cost a minimum of $37,500 per month in productivity. The
link at that time was 128/256 kb/s CIR Frame Relay, the average volume
was 25 GB per month, and usage charges were $5,000 per month.
The overall average time saving over the years was 40% over non-cache
Web accesses. This provided substantial gains in user productivity
regardless of the network link speed.
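One way to arrive at a figure close to 40% is to weight each period by its number of accesses. The Python sketch below does this with the numbers quoted for the three periods. The weighting is my reconstruction and not necessarily how the original figure was computed:

```python
# (label, accesses per month, seconds per access with cache, seconds saved)
periods = [
    ("current", 7_000_000, 1.1, 1.0),
    ("one year ago", 4_500_000, 2.1, 1.4),
    ("two years ago", 1_700_000, 3.3, 1.2),
]

total_saved = 0.0
total_uncached = 0.0
for label, accesses, with_cache, saved in periods:
    without_cache = with_cache + saved  # time per access had there been no cache
    print(f"{label}: {saved / without_cache:.0%} less waiting per access")
    total_saved += accesses * saved
    total_uncached += accesses * without_cache

# Total seconds saved divided by total seconds that uncached access would take.
weighted_saving = total_saved / total_uncached
print(f"Access-weighted saving: {weighted_saving:.0%}")
```

The per-period savings vary considerably, while the access-weighted value comes out at roughly 40%.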
Additionally, from March 2001 we increased the link speed (from 1 Mb/s
to 5 Mb/s) at an extra cost of $6,000 per month. This decreased
Web site access time by approximately one second, which represents
a productivity saving of $40,000 per month (assuming six million
accesses per month). Therefore, there was a net gain of $34,000 per
month after increasing the network link speed.
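The net gain from the link upgrade follows from the same per-hour cost. A short Python check, using the $24 hourly rate assumed earlier in the article and variable names of my own choosing:

```python
HOURLY_RATE = 24.0                # dollars per hour, as assumed in the text
accesses_per_month = 6_000_000    # assumed number of accesses
seconds_saved_per_access = 1.0    # approximate improvement after the upgrade
extra_link_cost = 6_000.0         # dollars per month for the faster link

hours_saved = accesses_per_month * seconds_saved_per_access / 3600
productivity_saving = hours_saved * HOURLY_RATE
net_gain = productivity_saving - extra_link_cost

print(f"Productivity saving: ${productivity_saving:,.0f} per month")
print(f"Net gain after link cost: ${net_gain:,.0f} per month")
```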
Our analysis indicates that initial savings from a Web cache proxy
might be marginal if we only examine the download or network charges.
However, remember that one of the original purposes of Web cache
proxies was to save time as multiple users utilized the limited
available bandwidth. Although access times are now shorter with
high-speed links, the use of cache proxies represented savings of
around 40% in download time regardless of link speed. This saving
may be small for an individual or a single access, but multiplied over
a large number of users and periods, it yields significant productivity
gains.
In the previous year with the Web cache proxy, we potentially saved
more than $500,000 in productivity costs. In the first
six months of this year, we have potentially saved $280,000. This
analysis therefore demonstrates that a Web cache proxy can repay the
cost of its purchase and maintenance within one month. Furthermore,
the productivity savings from faster access can potentially
offset the costs of higher speed network links.
Resources
PNG Graphics Library (libpng) -- ftp://ftp.uu.net/graphics/png
Data Compression Library (zlib) -- http://www.cdrom.com/pub/infozip/zlib
Squid cache proxy -- http://www.squid-cache.org
Squid redirector software (jesred) -- http://ivs.cs.uni-magdeburg.de/~elkner/webtools/jesred/
Squid authentication software (msntauth) -- http://stellarx.tripod.com
Perl GD module -- http://www-genome.wi.mit.edu/~lstein/
GD Graphic Library -- http://www.boutell.com
Search CPAN.org -- http://search.cpan.org
Webalizer Web log analyzer -- http://www.mrunix.net/webalizer/
Stanley Wong has performed statistical and database programming
at the University of Sydney Computing Centre and was the systems
administrator for VAX/VMS and Digital Ultrix systems. He has since
worked as a systems administrator with AIX, HP-UX, and Solaris.
Stanley currently supports the Internet services for a financial
services organization. He may be contacted at: email@example.com.