Ah, performance, that elusive objective that we all strive for within our individual systems and our system architectures. In concert with its lofty position as an ideal, performance is much like beauty: difficult to describe in precise terms, but immediately recognizable when we see it. Just as writers and artists have struggled through the ages to define beauty in their works, performance experts have long attempted to arrive at the perfect benchmark.
Early efforts to define performance in terms of millions of instructions per second (MIPS) quickly became passé as systems, even PCs, churned out more MIPS than were meaningful to measure. Since then, various benchmark authors have taken aim at the elusive target. Numerous tongue-in-cheek benchmark names have come into and gone out of favor over the years. Dhrystones and Whetstones, while popular for several years, were quickly optimized out of significance by compiler wizards. I would not be surprised to see a Java-oriented benchmark appear with a name derived in the same tradition, something along the lines of javaperks or beansprouts.
Three factors control the usefulness of a benchmark. First, technical relevance: the benchmark must accurately measure one or more significant elements of a system's performance. Next, accessibility: the benchmark source code must be widely available, either free or for a nominal license fee. Finally, industry relevance: the benchmark must be supported by a majority of vendors so that product specifications include ratings for that benchmark.

Currently, only a handful of benchmarks satisfy all three criteria. The SPEC suites of integer and floating-point benchmarks, published by the Standard Performance Evaluation Corporation, do a good job of reflecting CPU performance and are widely supported within the industry. As a result, SPEC benchmark data for most workstations and servers appears on the SPEC Web site (http://specbench.org) soon after their introduction. The license fees for the suites, however, are on the stiff side, making it difficult for noncommercial users to take advantage of the measures. Even further out of reach of the average user are the various TPC benchmarks from the Transaction Processing Performance Council. While used almost exclusively as the measure of server performance for transaction processing, running a TPC benchmark is far beyond the capability of most labs, even if a lab could obtain the code.

People who run benchmarks regularly, however, recognize that CPU benchmarking does not always accurately reflect the performance of a system in real user environments. All too often a system that appears mediocre in CPU benchmarks turns out to be a stellar performer when running the applications for which it was designed. Thus, CPU performance figures must be put in the appropriate context when evaluating systems. Application performance may prove to be a more useful measure.
So far, however, no single application-oriented benchmark has surfaced with broad industry support. The OpenGL benchmark viewperf comes close for measuring graphics performance, but doesn't speak to general application performance. SPEC is attempting to solve that problem: under its sponsorship, work is underway to create an application benchmark that will be endorsed by the SPEC membership. Once completed, that work should give us another, more meaningful metric for system performance comparisons. I'll be interested to see whether the system vendors can get beyond their marketing self-interest to support the result of SPEC's efforts.