Web Capacity Planning: How to Plan for Server Growth
There is surprisingly little relevant literature on capacity planning, and having the latest and the fastest equipment doesn't always guarantee the best performance, nor is always economically feasible. As a result, sys admins rely on a mixture of experience and "magic" in right-sizing their systems. In this article, I provide some practical advice on selecting a system and planning for growth, as well as getting the most of your system for Web server purposes.
HTTP over TCP/IP
Web clients (browsers) and servers communicate via the Hyper Text Transfer Protocol (HTTP). A typical Web client makes a connection to a Web server, requests and receives one HTTP object (e.g., an HTML file or a JPEG image) or an error message, then closes the connection. (For an example of what actually happens, telnet to port 80 of www.samag.com and type "GET /" followed by a return.) A typical HTML page contains several references to other HTTP objects (like in-line images and sounds), so the client makes separate connections for each of the remaining objects before the page is properly displayed on the screen. (Netscape Navigator by default makes up to four connections to the server in rapid succession, so several objects may be loading "simultaneously".)
A relatively small amount of data is transferred over each such connection, and it is not uncommon for a Web server to receive millions of connections a day. Thus, current out-of-the-box TCP/IP implementations may not be the best match for the HTTP world, since they are usually optimized for a few long-lived connections with substantial amounts of data transferred. In fact, the overhead for HTTP transfers reaches 30% due to constant opening and closing of TCP/IP connections.
While you may not be able to eradicate this considerable overhead, you should be able to find tips for optimizing TCP/IP performance from the OS vendor of your choice. (Please refer to the sidebar "Solaris Network Performance Tuning Tips" for Solaris-specific recommendations.) In addition, HTTP version 1.1 (also known as "keepalive") allows a Web client to request and receive several HTTP objects over one persistent connection, all but eliminating the problem. Most newer browsers and servers support HTTP1.1, but with lots of legacy clients still in use, HTTP 1.0 is still the de facto standard.
It is helpful to calculate the typical size of an HTTP object that your server will be providing. This is also known as the "hit" size. There are several commonly used data formats, with widely varying sizes:
HTML and Plain Text Files 0-15 Kb, seldom more than 25 Kb
Graphics Files (JPEG, GIF) 1-100 Kb, sometimes up to 1 Mb
Java Programs 20-250 Kb, sometimes a few Mb
Multimedia (Audio, Video,ShockWave) 100 Kb+, sometimes up to 10 Mb
If you already have a server, you will be able to come up with a very accurate average "hit" size by examining your server's logs (the transfer size is typically logged for each request, and is the last field in the Common Log Format file). At the very least, you can determine the average "hit" size by averaging the sizes of all of your static files. This number, however, doesn't reflect files that are rarely accessed and completely disregards dynamic data (more on this later).
Besides knowing the average "hit" size, it helps to categorize the files by their sizes. For most Web servers, the majority of requests are for small files. A smaller number of requests are for larger files, and there is progressively less interest in very large files (not many are willing to download a 10 Mb video, especially over a 28.8 kbps connection). This is the basis for the SPECweb96 benchmarking software, which uses a good mix of files of different sizes to rate HTTP server performance on various platforms. Although a bit outdated, SPECweb96 performance data does give a reasonable estimation of the type of performance to expect from a particular combination of hardware/OS and Web server software, and how that applies to your particular mix of files. This information can be used as the starting point for selecting a server. Again, be advised that most systems may not be optimized for the Web out of the box, so contact your vendor for specific fine-tuning information.
Bandwidth may be the determining factor in sizing your server. Clearly, there is no reason to get a server capable of millions of transactions per day when your network connection can only support a few thousand. For instance, if your average hit-size is 10 K, and there is no other competing traffic on the network, you will be able to support a maximum of about 7,700,000 hits per day on a 10baseT LAN, and roughly 960,000 on a dedicated T1.
In reality, there is usually plenty of other traffic on the network, so you will have to account for congestion in your calculations. Additionally, there are most likely peak periods during the day, when several times the typical number of connections is experienced. Thus, the actual number of connections that the server is able to consistently sustain during the day will be reduced by a factor of 3 to 4.
There are two basic types of servers in use today - forking and threaded. A typical forking server forks (creates) a process to handle each request. Just like out-of-the-box TCP/IP implementations, most UNIX systems are optimized for a few long-lived processes. On a busy Web site, the overhead of forking a process for each request can be tremendous. (On earlier Solaris systems, twice the CPU time was spent creating and destroying server processes than on processing the actual request.)
A few servers attempt to solve this problem by pre-forking several server processes. However, if more requests arrive than there are running servers, more processes need to be forked. Threaded servers are much more efficient, because they allow a single server process to manage multiple connections. Both the CPU and memory requirements are reduced, and the number of hits that this type of server can support increases. As a result, threaded servers combined with HTTP 1.1 offer several times the performance of older forking servers supporting HTTP 1.0.
A few years ago, most of the information on the Web was static, however, the paradigm has now shifted toward dynamic content. Unlike static pages, which required almost no processing from the Web server, dynamic pages put a considerable burden on it.
One way to create dynamic content is through CGI (Common Gateway Interface) programs. This model has become increasingly popular due to its simplicity of implementation and portability. When a Web server receives a request for a CGI program, it forks a process and runs the requested program. It sends the browser's input to the program and redirects its output directly to the browser.
The CGI model is extremely flexible in allowing any language to be used for CGI programs. The one most widely used on the Web right now is Perl. Although Perl is extremely efficient and very well suited for the Web (among other things), it is not be the best choice for frequently accessed CGI scripts. Perl is an interpreted language, and there is considerable overhead associated with interpreting the same program each time it's accessed. A compiled program can often run 10-20 times faster, because it makes fewer run-time checks. Thus, a language like C would be a better fit for a heavily accessed program. (Note that Perl5.005 will include a compiler.)
Considerable overhead also is associated with forking a process for each CGI program. One way around this is to modify the Web server to do the dynamic processing, that is to write programs for each particular server's API (Application Program Interface). Server-side includes (i.e., a way to embed instructions for the server in HTML code) are implemented in this way. The disadvantage is that the programs are generally not portable from one type of server to another. Also, each modified server's process will require more memory. However, the speed with which you are able to service your requests, and thus the overall amount of requests that you can handle, will increase.
An even better way to implement CGI functionality is through the emerging FastCGI standard. While maintaining the basic structure of the CGI model, FastCGI creates persistent processes for FastCGI programs, and they are reused to handle multiple requests. This method solves the CGI performance problem of creating a new process for each request. FastCGI programs also are portable and isolated from the server; thus, they are less likely to cause server problems than API programs.
Besides CGI and server-side includes, a number of things can compete for the server's limited resources. SSL (Secure Sockets Layer), which encrypts the traffic between the server and the client, is known to impose up to a 30% overhead. More obvious and widely used culprits are search engines and database back-ends. The configuration of search engines and database gateways is beyond the scope of this article, but should be very seriously considered when sizing a Web server.
Web servers with mostly static pages are usually not very demanding on the CPU or the storage subsystem, even with several million hits per day. Without the additional load of search engines and other server-side processing, the CPU utilization is usually well under 10%. Additional memory will allow for even better performance via caching of frequently used pages. Configure memory according to OS and hardware vendor recommendations plus at least 2 MB per Web server processes (please check for actual requirements of your particular server of choice). 128 Mb will be sufficient for most "static" servers; 256 Mb or more will be required for a server with dynamic content and heavily used search engines.
In the static scenario, disk will hardly be the bottleneck, especially if there is enough memory for effective OS-level I/O caching. For instance, the AltaVista search engine is known to serve most of the frequently requested information from its cache. However, for a large, heavily accessed server, when it's impossible to put all of the most popular pages into cache, consider putting system and Web files on a separate disk or spreading the Web files across several disks.
1+0 RAID (for instance, with DiskSuite on Solaris) offers both the performance gain of striping across multiple disks and safety of mirroring with negligible overhead, while controller-based RAID 5 may be a more economical solution. In either case, try to make the stripe size a multiple of the average "hit" size; this will allow you to balance the load between disks efficiently.
Bandwidth - not hardware - is often the determining factor in actual server performance, especially in Internet installations. A dedicated T1 line can realistically support about a million 10 Kb hits a day, which translates to a measly 11 hits/sec. Corporate Internet connections are often also used for incoming and outgoing mail, Web surfing, etc., decreasing the actual number of possible hits even further. On the intranet, where speeds are usually 10/100 Mbs, bandwidth is rarely an issue, especially considering the decrease in potential audience size.
A 133 MHz Pentium or a Sparcstation 5 with 64 Mb RAM and Solaris 2.5.1-SISS is capable of handling 8.2 million hits per day. An Ultra-1/170 or a Pentium Pro machine with 128 Mb will practically handle 26.8 million hits, provided a dedicated network. For other platforms, the SPECweb96 results can serve as useful guidelines for sizing "static" servers.
Dynamic content (and other server-side processing) place an entirely different load on the system. SPECweb97 is going to take into account dynamic content and HTTP 1.1. Meanwhile, the best strategy is to size each additional component separately, then add the requirements to those of a static scenario. Components that need to be taken into account are back-end database systems, CGI scripts, search engines, and API programs (like php and perlscript).
Selecting Server OS/Software
The best server combination would be a threaded HTTP 1.1 server (such as Netscape or Apache) running a late UNIX release (such as Solaris 2.6, for which the TCP/IP stack has been optimized for HTTP). In general, NT networking (among other things) does not yet scale or perform as well as that of UNIX, so if you anticipate several million hits per day or more, I would recommend a flavor of UNIX.
If you are going to use CGI programs, I recommend writing them in a language you can compile (like C) and using FastCGI. You could also write for the server's API, but the former is the preferred method.
Good Web design goes a long way in increasing the server's performance. For instance, try to make your images smaller and consolidate groups of them into image maps. With an older HTTP 1.0 browser, five connections will be made for a page with four images; whereas only two connections will be made if the images are combined into one. Web design is outside the scope of this article, but should not be overlooked.
Selecting the Client Software
Planning for Growth
The Internet is known for sudden bursts of traffic, with tremendous increases in the number of hits overnight (this is less of a phenomenon on the intranet). An initial investment in a multi-processor capable server and OS will pay off in such a situation, since it will allow for a quick increase in raw processing power. In many cases, it is more economical to purchase several smaller servers as they are needed. These can be used to set up round-robin DNS, which will cycle through IP addresses of servers, thus distributing the load between them.
This method does not take into account the fact that the load on one server may already be higher than the other. A variation on round robin is offered by, among others, Firewall-1, whereby you can set up a logical group of servers to be cycled through based on load and direct traffic to the less utilized one. Round robin variations are used by several large sites including home.netscape.com and bigyellow.com. This setup is often more effective than purchasing one extremely large machine, especially since system performance scales linearly only up to a point.
Obtaining top performance is not synonymous to having the fastest hardware available. Tremendous improvements in performance can be gained by selecting the right combination of Web server and browser software, along with careful optimization of server-side processing and tuning of TCP/IP parameters. In this regard, tuning a machine for Web service is similar to tuning for other types of applications. You must understand the application requirments and tune accordingly. Keeping records of such activities and the resulting system capabilities will help provide the information needed for future capacity planning.
Cockcroft, Adrian. 1994. Sun Performance and Tuning: Sparc & Solaris. Prentice Hall.
Open Market, Inc. FastCGI: A High-Performance Web Server Interface. (http://www.fastcgi.com/kit/doc/ \
O'Reilly, Tim, and Smith, Ben. The Importance of Perl. (http://perl. \
Ousterhout, John K. Scripting: Higher Level Programming for the 21st Century. (http://www.sunlabs.com/~ouster/ \
Stevens, W. Richard. 1994. TCP/IP Illustrated Volume I. Addison-Wesley Publishing Company, Inc.
The System Performance Evaluation Cooperative (SPEC). SPECweb96. (http://www.spec.org/osg/web96/).
Wong, Brian L. 1997. Configuration and Capacity Planning for Solaris (TM) Servers. Prentice Hall.
About the Author
Dave D. Zwieback is the Technical Director of InkCom (www.inkcom.com), a New Jersey-based consultancy specializing is Solaris/SunOS administration and Internet/Intranet development. He is also the Sr. sys admin at Timeplex Group, in Woodcliff Lake, NJ. He can be reached at: email@example.com.