comp.benchmarks Frequently Asked Questions, With Answers
Version 1.0, Sat Mar 16 12:12:48 1996
Copyright 1993-96 Dave Sill. Not-for-profit redistribution permitted provided this notice is included.

NOTE: Many of the answers to these questions were derived from articles posted to comp.benchmarks throughout the years. I have generally made no effort to attribute them to their sources because I didn't have time to get everyone's approval--or even the time to include the attributions :-). If you recognize something you wrote, and you'd like it attributed, let me know.

CONTENTS

SECTION 1 - General Q/A
 1.1. What is comp.benchmarks?
 1.2. What is a benchmark?
 1.3. How are benchmarks used?
 1.4. What kinds of performance do benchmarks measure?
 1.5. What are all these strange messages from Eugene Miya?
 1.6. What are the most common benchmarks?
 1.7. Where can I get benchmark results and source code?
 1.8. How can one safely interpret benchmark results?
 1.9. What are the pitfalls involved with running benchmarks?

SECTION 2 - Common Benchmarks
 2.1. 007 (ODBMS)
 2.2. AIM
 2.3. Dhrystone
 2.4. Khornerstone
 2.5. LFK (Livermore Loops)
 2.6. LINPACK
 2.7. MUSBUS
 2.8. NAS Kernels
 2.9. Nhfsstone
 2.10. PERFECT
 2.11. RhosettaStone
 2.12. SLALOM
 2.13. SPEC
 2.14. SSBA
 2.15. Sieve of Eratosthenes
 2.16. TPC
 2.17. WPI Benchmark Suite
 2.18. Whetstone
 2.19. Xstone
 2.20. bc
 2.21. SYSmark
 2.22. Stanford
 2.23. Bonnie
 2.24. IOBENCH
 2.25. IOZONE
 2.26. Byte
 2.27. Netperf
 2.28. Nettest
 2.29. ttcp
 2.30. CPU2
 2.31. Hartstone
 2.32. EuroBen
 2.33. PC Bench/WinBench/NetBench
 2.34. Sim
 2.35. Fhourstones
 2.36. Heapsort
 2.37. Hanoi
 2.38. Flops
 2.39. C LINPACK
 2.40. TFFTDP
 2.41. Matrix Multiply (MM)
 2.42. Digital Review
 2.43. Nullstone
 2.44. Rendermark
 2.45. Bench++
 2.46. Stream

SECTION 3 - Terminology
 3.1. Configuration
 3.2. MFLOPS
 3.3. MIPS
 3.4. Representative
 3.5. Single Figure of Merit
 3.6. KAP

SECTION 4 - Other Sources of Information
 4.1. WORLD WIDE WEB
 4.2. FTP
 4.3. COMMERCIAL BENCHMARKS
 4.4. PUBLICATIONS
 4.5. OTHER NETWORK SERVICES

SECTION 1 - General Q/A

1.1. What is comp.benchmarks?

Comp.benchmarks is a USENET newsgroup for discussing computer benchmarks and publishing benchmark results and source code. If it's about benchmarks, this is the place to post or crosspost it.

1.2. What is a benchmark?

A benchmark is a test that measures the performance of a system or subsystem on a well-defined task or set of tasks.

1.3. How are benchmarks used?

Benchmarks are commonly used to predict the performance of an unknown system on a known, or at least well-defined, task or workload. Benchmarks can also be used as monitoring and diagnostic tools. By running a benchmark and comparing the results against a known configuration, one can potentially pinpoint the cause of poor performance. Similarly, a developer can run a benchmark after making a change that might impact performance to determine the extent of the impact.

Benchmarks are frequently used to ensure a minimum level of performance in a procurement specification. Rarely is performance the most important factor in a purchase, though. One must never forget that it's more important to be able to do the job correctly than it is to get the wrong answer in half the time.

1.4. What kinds of performance do benchmarks measure?

Benchmarks are often used to measure general kinds of performance--graphics, I/O, compute (integer and floating point), and so on--but most measure more specific tasks like rendering polygons, reading and writing files, or performing operations on matrices. Any aspect of computer performance that matters to the user can be benchmarked.
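To make that concrete, here is a minimal (and deliberately naive) benchmark in C. It is not one of the benchmarks described in Section 2; it just illustrates the basic pattern every benchmark follows: perform a well-defined task, time it, and report a rate. The array size and repetition count are arbitrary choices for this sketch.

    #include <stdio.h>
    #include <time.h>

    #define N     100000L
    #define REPS  1000L

    int main(void)
    {
        static unsigned long data[N];
        unsigned long sum = 0;
        long i, r;
        clock_t start, stop;

        for (i = 0; i < N; i++)
            data[i] = (unsigned long)i;

        start = clock();                 /* CPU time, not wall-clock time */
        for (r = 0; r < REPS; r++)
            for (i = 0; i < N; i++)
                sum += data[i];
        stop = clock();

        /* Printing the result keeps the compiler from discarding the loop;
           the value itself is unimportant. */
        printf("sum = %lu\n", sum);
        {
            double secs = (double)(stop - start) / CLOCKS_PER_SEC;
            if (secs > 0.0)
                printf("%.2f million additions/sec\n",
                       (double)N * REPS / secs / 1e6);
        }
        return 0;
    }

Even this toy raises the questions discussed below: which compiler options were used, whether the data fits in cache, and whether "additions per second" predicts anything about a real workload.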
1.5. What are all these strange messages from Eugene Miya?

They are automatically-posted sections of a multi-part Frequently Asked Questions (FAQ) list for comp.benchmarks. The subject headers look like:

    Subject: [l/m 3/17/92] WPI Benchmark Suite (1.1) (19/28) c.be.FAQ

which means: this is part 19 of the 28-part comp.benchmarks FAQ, the topic of this section is "WPI Benchmark Suite (1.1)", it was last modified on March 17, 1992, and it is automatically posted on the 17th day of each month.

The body of these articles starts with an index of the sections (panels) of the multipart FAQ, with the current section rotated to the top. For example:

    19 WPI Benchmark
    20 Equivalence
    21 TPC
    22
    23
    24
    25 Ridiculously short benchmarks
    26 Other miscellaneous benchmarks
    27
    28 References
     1 Introduction to the FAQ chain and netiquette
     2 Benchmarking concepts
     3 PERFECT Club/Suite
     4
     5 Performance Metrics
     6
     7 Music to benchmark by
     8 Benchmark types
     9 Linpack
    10
    11 NIST source and .orgs
    12 Benchmark Environments
    13 SLALOM
    14
    15 12 Ways to Fool the Masses with Benchmarks
    16 SPEC
    17 Benchmark invalidation methods
    18

Notice that there are some unused sections (4, 6, 10, 14, 18, 22-24, 27). Some people find these FAQs confusing and configure their newsreader to automatically "kill" them (mark them as read).

1.6. What are the most common benchmarks?

See Section 2 - Common Benchmarks.

1.7. Where can I get benchmark results and source code?

See Section 2 - Common Benchmarks.

1.8. How can one safely interpret benchmark results?

There are many dangers involved in correctly understanding and interpreting benchmark results, whether they come from your own locally generated test or are supplied by the vendor of a commercial system. Here are some things to take into account:

1) Which benchmark was run? You'll need the name, version, and details of any changes made, whether for portability or to improve performance.

2) Exactly what configuration was the benchmark run on?
   a) processor model, speed, cache, number of CPUs
   b) memory
   c) software versions (operating system, compilers, relevant applications, etc.)
   d) compiler/loader options and flags used to build the executables
   e) state of the system (single user, multiuser, active, inactive, etc.)
   f) peripherals, e.g., hard disk drives

3) How does performance on the benchmark relate to my workload? This is really the key question. Without knowing what a benchmark measures, one can't begin to determine whether systems that perform better on it will perform better on their own workload.

1.9. What are the pitfalls involved with running benchmarks?

They pretty much mirror the difficulties in interpreting results. First, you need to document what you're running. If it's a well-known program, record the name, version number, and any changes you've made. It's a good idea to use some kind of version control system (e.g., RCS or SCCS on UNIX) to keep track of changes and make it possible to backtrack to previous versions. If it's a locally-written benchmark, it's especially important to use a version control system.

Second, record as much information about the system configuration as you can. Sometimes the most seemingly insignificant thing can have a profound effect on the performance of a system, and it's very hard to repeat results without being able to recreate the original configuration.
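One way to make the configuration record harder to forget is to have the test harness print it automatically alongside the results. The fragment below is only an illustration of that idea (it is not part of any benchmark in this FAQ); it uses the POSIX uname() call plus a couple of compiler-provided macros, so the exact output will vary by platform and compiler.

    #include <stdio.h>
    #include <sys/utsname.h>    /* POSIX uname() */

    int main(void)
    {
        struct utsname u;

        if (uname(&u) == 0) {
            printf("os:       %s %s\n", u.sysname, u.release);
            printf("machine:  %s\n", u.machine);
            printf("hostname: %s\n", u.nodename);
        }
    #ifdef __VERSION__
        printf("compiler: %s\n", __VERSION__);  /* compiler version string, where defined */
    #endif
        printf("built:    %s %s\n", __DATE__, __TIME__);
        return 0;
    }

Compiler flags, cache sizes, and the load on the machine still have to be recorded by hand (or by your run scripts), but even this much makes stray results far easier to reproduce later.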
SECTION 2 - Common Benchmarks

2.1. 007 (ODBMS)

Description: Designed to simulate a CAD/CAM environment.
Tests:
 - pointer traversals over cached data, disk-resident data, sparse traversals, and dense traversals
 - updates: indexed and unindexed object fields, repeated updates, sparse updates, updates of cached data, and creation and deletion of objects
 - queries: exact-match lookup, ranges, collection scan, path-join, ad-hoc join, and single-level make
Originator: University of Wisconsin
Versions: unknown
Availability of Source: free from ftp.cs.wisc.edu:/007
Availability of Results: free from ftp.cs.wisc.edu:/007
Entry Last Updated: Thu Apr 15 15:08:07 1993

2.2. AIM

- 1989
- AIM Technology, Palo Alto
- C
- 2 suites (Suite III and Suite V)

Suite III: simulation of applications (task- or device-specific)
 - task-specific routines (word processing, database management, accounting)
 - device-specific routines (memory, disk, MFlop/s, IO/s)
 - all measurements represent a percentage of VAX 11/780 performance (100%)
 - the second figure is user support (maximum concurrent users); VAX 11/780 == 12 users
In general, AIM Suite III gives an overall performance indication.

Suite V: measures throughput in a multitasking workstation environment
 - incremental system loading
 - testing multiple aspects of system performance
The graphically displayed results plot the workload level versus time. Several different models characterize various user environments (financial, publishing, software engineering). The published reports are copyrighted.

See: Walter J. Price, A Benchmark Tutorial, IEEE Micro, Oct. 1989 (28-43)

2.3. Dhrystone

Description: Short synthetic benchmark program intended to be representative of system (integer) programming. Based on published statistics on use of programming language features; see the original publication in CACM 27,10 (Oct. 1984), 1013-1030. Originally published in Ada, now mostly used in C. Version 2 (in C) was published in SIGPLAN Notices 23,8 (Aug. 1988), 49-62, together with measurement rules. Version 1 is no longer recommended since state-of-the-art compilers can eliminate too much "dead code" from the benchmark (however, quoted MIPS numbers are often based on version 1).
Problems: Due to its small size (100 HLL statements, 1-1.5 KB code), the memory system outside the cache is not tested; compilers can too easily optimize for Dhrystone; string operations are somewhat overrepresented.
Recommendation: Use it for controlled experiments only; don't blindly trust single Dhrystone MIPS numbers quoted somewhere (don't do this for any benchmark).
Originator: Reinhold Weicker, Siemens Nixdorf (weicker.muc@sni.de)
Versions in C: 1.0, 1.1, 2.0, 2.1 (final version, minor corrections compared with 2.0)
See also: R.P. Weicker, A Detailed Look ... (see Publications, 4.4)
Availability of Source: netlib@ornl.gov, ftp.nosc.mil:pub/aburto
Availability of Results (no guarantee of correctness): same as above
Entry Last Updated: Dec. 30, 1993, Reinhold Weicker
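The "dead code" problem mentioned in the Dhrystone entry is easy to demonstrate with a made-up fragment like the one below (this is not code from Dhrystone itself). If a computed value is never used anywhere the compiler can see, a good optimizer is entitled to delete the computation, and a small benchmark can then appear to run almost infinitely fast.

    #include <stdio.h>

    static int kernel(long n)
    {
        long i;
        int a = 0, b = 1;

        for (i = 0; i < n; i++) {
            a += (int)i;        /* used below, so it must be computed       */
            b = b * 2 + 7;      /* if b were never read afterwards, an
                                   optimizer could drop this line entirely  */
        }
        return a + b;           /* returning both keeps them "live"         */
    }

    int main(void)
    {
        /* Printing the result makes the work observable, so it cannot be
           discarded; Dhrystone 2.x's measurement rules serve the same end. */
        printf("%d\n", kernel(1000000L));
        return 0;
    }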
2.4. Khornerstone

Description: Multipurpose benchmark used in various periodicals.
Originator: Workstation Labs
Versions: unknown
Availability of Source: not free
Availability of Results: UNIX Review
Entry Last Updated: Thu Apr 15 15:22:10 1993

2.5. LFK (Livermore Loops)

netlib.att.com:/netlib/benchmark/livermore*

2.6. LINPACK

Description: Kernel benchmark developed from the "LINPACK" package of linear algebra routines. Originally written and commonly used in Fortran; a C version also exists. Almost all of the benchmark's time is spent in a subroutine ("saxpy" in the single-precision version, "daxpy" in the double-precision version) doing the inner loop for frequent matrix operations:

    y(i) = y(i) + a * x(i)

The standard version operates on 100x100 matrices; there are also versions for sizes 300x300 and 1000x1000, with different optimization rules.
Problems: The code is representative only of this type of computation. LINPACK is easily vectorizable on most systems.
Originator: Jack Dongarra, Comp. Sci. Dept., Univ. of Tennessee, dongarra@cs.utk.edu
See also: R.P. Weicker, A Detailed Look ... (see Publications, 4.4)
Entry Last Updated: Dec. 30, 1993, Reinhold Weicker
Availability of Source and Results:
 netlib@ornl.gov: source, results
 netlib.att.com:/netlib/benchmark/linpack*: source
See also: C LINPACK
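In C, the daxpy-style inner loop quoted above is just the following (a sketch of the kernel only; the actual benchmark factors and solves a linear system, checks the result, and derives its MFLOPS figure from a count of the floating-point operations performed):

    /* y := y + a*x for vectors of length n.  Each iteration performs one
       multiply and one add, i.e. two floating-point operations, which is
       how a MFLOPS rate can be computed from the measured time. */
    static void daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = y[i] + a * x[i];
    }

Because the loop is so regular, vectorizing compilers and hand-tuned libraries can speed it up dramatically, which is exactly why LINPACK results say little about less regular workloads.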
2.7. MUSBUS

monu1.cc.monash.edu.au:/pub/musbus.sh

2.8. NAS Kernels

The sequential C versions of the NAS CFD (computational fluid dynamics) benchmarks - appbt and appsp - are now available from the anonymous ftp site ftp.cs.wisc.edu:/wwt/Misc/NAS. This distribution contains the C version of the Fortran CFD codes written at NASA Ames Research Center by Sisira Weeratunga. These codes were converted to C by a team of students for their class project for CS 838-3/ChE 562, offered in Spring 1993 by Mark D. Hill, Sangtae Kim, and Mary Vernon at the University of Wisconsin at Madison. CS 838-3/ChE 562 was an experimental course that brought computer scientists and computational scientists together to promote interdisciplinary research.

You should have a NAS license for the original Fortran code. NASA has given us permission to distribute our ``significant'' changes freely. You can obtain the sequential Fortran codes and a license by writing to:

    ATTN: NAS Parallel Benchmark Codes
    NAS Systems Division
    Mail Stop 2588
    NASA Ames Research Center
    Moffett Field CA 94035

THIS SOFTWARE IS PROVIDED "AS IS". WE MAKE NO WARRANTIES ABOUT ITS CORRECTNESS OR PERFORMANCE.

    Douglas C. Burger (dburger@cs.wisc.edu)
    Shubhendu S. Mukherjee (shubu@cs.wisc.edu)
    Computer Sciences Department
    University of Wisconsin at Madison

2.9. Nhfsstone

Benchmark intended to measure the performance of file servers that follow the NFS protocol. The work in this area continued within the LADDIS group and finally within SPEC. The SPEC benchmark 097.LADDIS (SFS benchmark suite, see the separate FAQ file on SPEC) is intended to replace Nhfsstone; it is superior to Nhfsstone in several respects (multi-client capability, less client sensitivity). --Reinhold Weicker

2.10. PERFECT

See Miya panel #3

2.11. RhosettaStone

See Miya panel #26
eos.arc.nasa.gov

2.12. SLALOM

See Miya panel #13
tantalus.al.iastate.edu:/pub/Slalom/ [129.186.200.15]

2.13. SPEC

SPEC stands for "Standard Performance Evaluation Corporation", a non-profit organization with the goal to "establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers" (from SPEC's bylaws).

The SPEC benchmarks and more information can be obtained from:

    SPEC [Standard Performance Evaluation Corporation]
    10754 Ambassador Drive, Suite 201
    Manassas, VA 22110, USA
    Phone: +1-703-331-0180
    Fax:   +1-703-331-0181
    E-Mail: spec-ncga@cup.portal.com

The current SPEC benchmark suites are:

    CINT95  CPU intensive integer benchmarks        ) together:
    CFP95   CPU intensive floating point benchmarks )  SPEC95
    SDM     UNIX Software Development Workloads
    SFS     System level file server (NFS) workload

The old CPU benchmark suites

    CINT92  CPU intensive integer benchmarks        ) together:
    CFP92   CPU intensive floating point benchmarks )  SPEC92

will cease to be supported by SPEC in 1996. See the separate FAQ file on SPEC benchmarks. --Reinhold Weicker

The following data files are available from ftp.nosc.mil/pub/aburto: specin89.tbl, specfp89.tbl, speccorr.tbl, specin92.tbl, specft92.tbl. --Alfred Aburto

2.14. SSBA

The SSBA is the result of the studies of the AFUU (French Association of Unix Users) Benchmark Working Group. This group, consisting of some 30 active members of varied origins (universities, public and private research, manufacturers, end users), has set itself the goal of studying the problem of assessing the performance of data processing systems: collecting as many of the tests available throughout the world as possible, dissecting the codes and results, discussing their utility, fixing versions, and supplying them in the form of a magnetic tape with various comments and procedures. This tape is therefore a simple and coherent tool both for end users and for specialists, providing a clear and pertinent first approximation of performance, and could also become a "standard" in the Unix (R) world. In this way the SSBA (Synthetic Suite of Benchmarks from the AFUU) originated, and here you find release 1.21E.

    athene.uni-paderborn.de:/doc/magazin/ix/tools/ssba1.22.tar
    ftp.germany.eu.net:/pub/sysadmin/benchmark/ssba/ssba.shar.Z
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-1.22English.tar.gz
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-1.22French.tar.gz
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-2.0F.tar.gz
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-synthesis.tar.gz
    ftp.inria.fr:/system/benchmark/SSBA/ssba1.22E.tar.Z
    ftp.inria.fr:/system/benchmark/SSBA/ssba1.22F.tar.Z
    ftp.inria.fr:/system/benchmark/SSBA/ssba-syntheses.tar.Z

2.15. Sieve of Eratosthenes

An integer program that generates prime numbers using a method known as the Sieve of Eratosthenes.

    otis.stanford.edu:/pub/benchmarks/c/small/sieve.c
    ftp.nosc.mil:pub/aburto/nsieve.c
    ftp.nosc.mil:pub/aburto/nsieve.tbl
    sunic.sunet.se:/SRC/sec8/bench-dry/sieve.c
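The algorithm itself fits in a few lines of C (this is an illustration of the method only; the benchmark sources listed above differ in array sizes, repetition counts, and timing code):

    #include <stdio.h>
    #include <string.h>

    #define LIMIT 8192          /* sieve size chosen arbitrarily for this sketch */

    int main(void)
    {
        static char composite[LIMIT + 1];
        int i, j, count = 0;

        memset(composite, 0, sizeof composite);

        /* Cross off the multiples of each prime; whatever remains unmarked
           is prime.  The work is almost entirely loops, array indexing, and
           byte stores, which is why sieves are popular as tiny integer tests. */
        for (i = 2; i <= LIMIT; i++) {
            if (!composite[i]) {
                count++;
                for (j = 2 * i; j <= LIMIT; j += i)
                    composite[j] = 1;
            }
        }
        printf("%d primes up to %d\n", count, LIMIT);
        return 0;
    }

Like the other "ridiculously short" benchmarks, a sieve exercises only a sliver of a machine and fits entirely in cache, so treat its numbers accordingly.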
2.16. TPC

TPC-A is a standardization of the Debit/Credit benchmark which was first published in DATAMATION in 1985. It is based on a single, simple, update-intensive transaction which performs three updates and one insert across four tables. Transactions originate from terminals, with a requirement of 100 bytes in and 200 bytes out. There is a fixed scaling between tps rate, terminals, and database size. TPC-A requires an external RTE (remote terminal emulator) to drive the SUT (system under test).

TPC-B uses the same transaction profile and database schema as TPC-A, but eliminates the terminals and reduces the amount of disk capacity which must be priced with the system. TPC-B is significantly easier to run because an RTE is not required.

TPC-C is completely unrelated to either TPC-A or B. TPC-C tries to model a moderate to complex OLTP system. The benchmark is conceptually based on an order entry system. The database consists of nine tables which contain information on customers, warehouses, districts, orders, items, and stock. The system performs five kinds of transactions: entering a new order, delivering orders, posting customer payments, retrieving a customer's most recent order, and monitoring the inventory level of recently ordered items. Transactions are submitted from terminals providing a full-screen user interface. (The spec defines the exact layout for each transaction.)

TPC-C was specifically designed to address many of the shortcomings of TPC-A. I believe it does this in many areas. It exercises a much broader cross-section of database functionality than TPC-A. Also, the implementation rules are much stricter in critical areas such as database transparency and transaction isolation. Overall, TPC-C results will be a much better indicator of RDBMS and OLTP system performance than previous TPC benchmarks.

- 1988
- Transaction Processing Performance Council, San Jose
- non-profit corporation of 44 software and hardware companies formed to define transaction processing and database benchmarks
- Cobol
- contents: basic system OLTP, business application services, databases, complex data and real time applications
- 4 benchmark components (data, database, system, and query model)
- 4 suites: TPC-A, ..., TPC-D

Suite TPC-A: on-line transaction processing in a database environment
 - measures performance in an update-intensive database environment (OLTP)
 - result in transactions/sec
 - 2 metrics (local and wide area networks)
Suite TPC-B: database benchmark
 - database throughput (transactions/sec)
 - no OLTP
Suite TPC-C: order-entry benchmark (OLTP environment)
 - business application services (order entry, inventory control, customer support, accounting)
Suite TPC-D: decision-support benchmark (OLTP environment)
 - database stress test, simulation of a large database, complex queries

See: Miya panel #21
Shanley Public Relations, Complete TPC Results, Performance Evaluation Review, Vol. 19 #2, Aug. 91 (14-23)
Shanley Public Relations, Complete TPC Results, Performance Evaluation Review, Vol. 19 #3, Feb. 93 (32-35)
DG.COM

2.17. WPI Benchmark Suite

See Miya panel #19
wpi.wpi.edu

2.18. Whetstone

Description: The first major synthetic benchmark program, intended to be representative of numerical (floating-point intensive) programming. Based on statistics gathered at the National Physical Laboratory in England, using an Algol 60 compiler which translated Algol into instructions for the imaginary Whetstone machine. The compilation system was named after the small town of Whetstone outside the City of Leicester, England, where it was designed.
Problems: Due to the small size of its modules, the memory system outside the cache is not tested; compilers can too easily optimize for Whetstone; mathematical library functions are overrepresented.
Originator: Brian Wichmann, NPL (baw@seg.npl.co.uk) [E-mail address as of fall 1990, not re-verified. Reinhold]
Original publication: H.J. Curnow and B.A. Wichmann: A Synthetic Benchmark. The Computer Journal 19,1 (1976), 43-49
See also: R.P. Weicker, A Detailed Look ... (see Publications, 4.4) --Reinhold Weicker

    cnam.cnam.fr:/pub/Ada/Repository/benchmarks/benwhet.com.Z
    draci.cs.uow.edu.au:/netlib/benchmark/whetstone*
    ftp.germany.eu.net:/pub/sysadmin/benchmark/whetston/whetstone.tar.Z
    lth.se:/pub/benchmark/whetstone.tar.gz
    netlib.att.com:/netlib/benchmark/whetstone*
2.19. Xstone

    netcom.com:/pub/micromed/uploads/xstones.summary.z
    pith.uoregon.edu:/pub/src/X11/xbench/scripts/xstones.awk
    alf.uib.no:/pub/Linux/BETA/X_S3/801.xstones

2.20. bc

2.21. SYSmark

July 28, 1993, Santa Clara, Calif.--The Business Applications Performance Corp. (BAPCo) announces SYSmark93 for Windows and SYSmark93 for DOS, benchmark software that provides objective performance measurement based on the world's most popular PC applications and operating systems. A third program, SYSmark93 for Servers, is due for release by BAPCo in the third quarter of 1993.

SYSmark93 provides benchmarks that can be used to objectively measure performance of IBM PC-compatible hardware for the tasks users perform on a regular basis. The benchmarks are comparative tools for those who make purchasing decisions for anywhere from 10 to a thousand or more PCs. SYSmark93 has been endorsed by the BAPCo membership, which includes the world's leading PC hardware and software vendors, chip manufacturers, and industry publications.

SYSmark93 benchmarks represent the workloads of popular programs in such applications as word processing, spreadsheets, database, desktop graphics and software development. Benchmarking can be conducted on the user's own system or at a vendor's site using the standards set by BAPCo to ensure consistency of the results.

SYSmark93 for Windows is for those interested in evaluating systems-level performance of PCs running Microsoft Windows applications. The program features a new Windows-based workload manager, scripts for 10 Windows applications, automation tools, and a disclosure report generator. SYSmark93 for DOS, an upgrade to SYSmark92, is aimed at those interested in evaluating systems-level performance of PCs running DOS applications only.

Both programs can generate performance metrics in three ways: as a composite of all the different applications; for a specific category of applications, such as word processing or spreadsheets; or for individual software programs.

Workloads based on the following applications are included in SYSmark93 for Windows and SYSmark93 for DOS:

SYSmark93 for Windows
 WORD PROCESSING: Word for Windows 2.0b, WordPerfect for Windows 5.2, AmiPro 3.0
 SPREADSHEETS: Excel 4.0, Lotus 1-2-3 for Windows 4.0
 DATABASE: Paradox for Windows 1.0
 DESKTOP GRAPHICS: CorelDraw 3.0
 DESKTOP PRESENTATION: Freelance Graphics for Windows 2.0, PowerPoint 3.0
 DESKTOP PUBLISHING: PageMaker 5.0

SYSmark93 for DOS
 WORD PROCESSING: WordPerfect 5.1
 SPREADSHEETS: Lotus 1-2-3 3.4, QuattroPro 4.0
 DATABASE: Paradox 4.0, dBASE IV 1.5
 DESKTOP GRAPHICS: Harvard Graphics 3.0
 SOFTWARE DEVELOPMENT: Borland C++ 3.1, Microsoft C 6.00

SYSmark93 for Windows and SYSmark93 for DOS will be available in August from BAPCo for $390 each. Licensed SYSmark92 users will be able to upgrade to either SYSmark93 for Windows or SYSmark93 for DOS for $99 each.

A non-profit corporation, BAPCo's charter is to develop and distribute a set of objective performance benchmarks based on popular computer applications and industry standard operating systems. Current BAPCo members include Adaptec, Advanced Micro Devices, AER Energy Resources, Apricot Computers, Chips and Technologies, Compaq, Cyrix, Dell, Digital Equipment Corp., Gateway2000, Epson, Hewlett-Packard, IBM, Infoworld, Intel, Lotus, Microsoft, NCR, Unisys and Ziff-Davis Labs.

FOR MORE INFORMATION:

    John Peterson, BAPCo
    Phone: 408-988-7654
    email: John_E_Peterson@ccm.hf.intel.com

    Bob Cramblitt, Cramblitt & Company
    Phone: 919-481-4599
    Fax: 919-481-4639
2.22. Stanford

- 1988
- Stanford University (J. Hennessy, P. Nye)
- C
- comparison of RISC and CISC
- contains the 2 modules Stanford Integer and Stanford Floating Point

Stanford Integer: 8 small applications (integer matrix multiply, sorting algorithms (quick, bubble, tree), permutation, hanoi, 8 queens, puzzle)
Stanford Floating Point: 2 small applications (FFT, matrix multiply)

The characteristics of the programs vary, but most of them have array accesses. There seems to be no official publication (only a printing in a performance report), and there is no defined weighting of the results (Sun and MIPS compute the geometric mean).

See:
 Survey of Benchmarks, E. R. Brocklehurst.
 A Benchmark Tutorial, W. Price.
 "A detailed look at some popular benchmarks", R. P. Weicker.

2.23. Bonnie

This is a file system benchmark that attempts to study bottlenecks - it is named 'Bonnie' for semi-obvious reasons. Specifically, these are the types of filesystem activity that have been observed to be bottlenecks in I/O-intensive applications, in particular the text database work done in connection with the New Oxford English Dictionary Project at the University of Waterloo.

It performs a series of tests on a file of known size. By default, that size is 100 Mb (but that's not enough - see below). For each test, Bonnie reports the bytes processed per elapsed second, per CPU second, and the percent CPU usage (user and system). In each case, an attempt is made to keep optimizers from noticing it's all bogus. The idea is to make sure that these are real transfers to/from user space to the physical disk.

Written by: Tim Bray

2.24. IOBENCH

IOBENCHP is a multi-stream benchmark that uses a controlling process (iobench) to start, coordinate, and measure a number of "user" processes (iouser); the Makefile parameters used for the SPEC version of IOBENCHP cause ioserver to be built as a "do nothing" process.

Written by:
    Barry Wolman (barry@s66.prime.com) [probably doesn't work]
    Prime Computer
    500 Old Connecticut Path
    Framingham, MA 01701
    508/620-2800, ext. 1100 (voice)
    508/879-8674 (FAX)

2.25. IOZONE

This test writes an X megabyte sequential file in Y byte chunks, then rewinds it and reads it back. [The size of the file should be big enough to factor out the effect of any disk cache.] Finally, IOZONE deletes the temporary file.

The file is written (filling any cache buffers), and then read. If the cache is >= X MB, then most if not all the reads will be satisfied from the cache. However, if it is less than or equal to .5X MB, then NONE of the reads will be satisfied from the cache. This is because after the file is written, a .5X MB cache will contain the upper .5X MB of the test file, but we will start reading from the beginning of the file (data which is no longer in the cache). In order for this to be a fair test, the length of the test file must be AT LEAST 2X the amount of disk cache memory for your system. If not, you are really testing the speed at which your CPU can read blocks out of the cache (not a fair test).

IOZONE does not normally test the raw I/O speed of your disk or system. It tests the speed of sequential I/O to actual files. Therefore, this measurement factors in the efficiency of your machine's file system, operating system, C compiler, and C runtime library. It produces a measurement which is the number of bytes per second that your system can read or write to a file.

Written by: bill@tandem.com (Bill Norcott)
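The write-then-read pattern described above can be sketched in portable C along the following lines (a simplified illustration only, not IOZONE itself, which lets you choose the file and chunk sizes and reports bytes per second; the whole-second time() used here is far coarser than a real I/O benchmark would use):

    #include <stdio.h>
    #include <time.h>

    #define MEGS  16                     /* should be at least 2x your disk cache */
    #define CHUNK 4096                   /* transfer size in bytes */

    int main(void)
    {
        static char buf[CHUNK];          /* contents don't matter for the test */
        const long chunks = (long)MEGS * 1024 * 1024 / CHUNK;
        long i;
        time_t t0, t1, t2;
        FILE *f = fopen("iozone.tmp", "w+b");

        if (f == NULL) { perror("iozone.tmp"); return 1; }

        t0 = time(NULL);
        for (i = 0; i < chunks; i++)     /* sequential write */
            fwrite(buf, 1, CHUNK, f);
        fflush(f);
        t1 = time(NULL);

        rewind(f);
        for (i = 0; i < chunks; i++)     /* sequential read-back */
            fread(buf, 1, CHUNK, f);
        t2 = time(NULL);

        fclose(f);
        remove("iozone.tmp");            /* delete the temporary file */

        printf("wrote and read %d MB: %ld sec writing, %ld sec reading\n",
               MEGS, (long)(t1 - t0), (long)(t2 - t1));
        return 0;
    }

Note that fflush() only pushes data out of the C library's buffers; the operating system's buffer cache is exactly what the "at least 2X the cache size" rule above is meant to defeat.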
2.26. Byte

This is a benchmark suite similar in spirit to SPEC, except that it's smaller and contains mostly things like "sieve" and "dhrystone". If you are comparing different UN*X machines for performance, this gives fairly good numbers. Note that the numbers aren't useful for anything except (perhaps, as in "maybe") for comparison against the same benchmark suite run on some other system.

2.27. Netperf

Netperf - a networking performance benchmark/tool. The current version includes throughput (bandwidth) and request/response (latency) tests for TCP and UDP using the BSD sockets API, DLPI, Unix Domain Sockets, the Fore ATM API, and HP HiPPI Link Level Access. Future versions may support additional tests for XTI/TLI-TCP/UDP and WINSOCK, in no particular order, depending on the whim of the author and public opinion.

Included with the source code is a .ps manual, two manpages, and a number of example scripts. More information about netperf, and a database of netperf results, can be found with a forms-capable WWW browser at the Netperf Page. Various versions of netperf are also available via anonymous FTP from many locations, including, but not limited to:

    ftp://ftp.cup.hp.com/dist/networking/benchmarks
    ftp://col.hp.com/dist/networking/benchmarks
    ftp://ftp.sgi.com
    ftp://hpux.csc.liv.ac.uk (and mirrors)

Questions regarding netperf can be directed via e-mail to Netperf Request or Rick Jones.

2.28. Nettest

A network performance analysis tool developed at Cray.

2.29. ttcp

TTCP is a benchmarking tool for determining TCP and UDP performance between 2 systems. Ttcp times the transmission and reception of data between two systems using the UDP or TCP protocols. It differs from common ``blast'' tests, which tend to measure the remote inetd as much as the network performance, and which usually do not allow measurements at the remote end of a UDP transmission.

Written by: This program was created at the US Army Ballistics Research Lab (BRL)

2.30. CPU2

The CPU2 benchmark was invented by Digital Review (now Digital News and Review). To quote DEC, describing DN&R's benchmark, CPU2 ...is a floating-point intensive series of FORTRAN programs and consists of thirty-four separate tests. The benchmark is most relevant in predicting the performance of engineering and scientific applications. Performance is expressed as a multiple of MicroVAX II Units of Performance.

The CPU2 benchmark is available via anonymous ftp from swedishchef.lerc.nasa.gov in the drlabs/cpu directory. Get cpu2.unix.tar.Z for unix systems or cpu2.vms.tar.Z for VMS systems.

2.31. Hartstone

Hartstone is a benchmark for measuring various aspects of hard real-time systems, from the Software Engineering Institute at Carnegie Mellon. You can get this by anonymous ftp from ftp.sei.cmu.edu [128.237.2.179], in the pub/hartstone directory.

2.32. EuroBen

The main contact for EuroBen is Aad van der Steen.

    Name: Aad van der Steen
    email: actstea@cc.ruu.nl
    address: Academisch Computercentrum Utrecht
             Budapestlaan 6
             3584 CD Utrecht
             The Netherlands
    phone: +31-30531444
    fax: +31-30-531633

2.33. PC Bench/WinBench/NetBench

See http://www.ziff.com/~zdbop

PC Bench 9.0, WinBench 95 Version 1.0, Winstone 95 Version 1.0, MacBench 2.0, NetBench 3.01, and ServerBench 2.0 are the current names and versions of the benchmarks available from the Ziff-Davis Benchmark Operation (ZDBOp).

2.34. Sim

An integer program that compares DNA segments for similarity. The following files are available from ftp.nosc.mil/pub/aburto:
 Source: sim.shar
 Result: sim.tbl
--Alfred Aburto

2.35. Fhourstones

Description: Small integer-only program that solves positions in the game of connect-4 using exhaustive search with a very large transposition table. Written in C.
Originator: John.Tromp@cwi.nl
Versions: 1.0
Availability of Source: ftp.nosc.mil:pub/aburto/c4.shar
Availability of Results: ftp.nosc.mil:pub/aburto/c4.tbl
Entry Last Updated: Mon Oct 11 10:00:00 1993
--John Tromp (tromp@cwi.nl)

2.36. Heapsort

An integer program that uses the "heap sort" method of sorting a random array of long integers up to 2 megabytes in size. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: heapsort.c
 Result: heapsort.tbl
--Alfred Aburto

2.37. Hanoi

An integer program that solves the Towers of Hanoi puzzle using recursive function calls. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: hanoi.c
 Result: hanoi.tbl
--Alfred Aburto
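The recursive method behind a Hanoi benchmark is short enough to show here (an illustrative sketch of the method, not the hanoi.c listed above). Its appeal as a tiny benchmark is that the work is almost entirely function-call overhead.

    #include <stdio.h>

    static long moves;

    /* Move n discs from peg 'from' to peg 'to' using 'via' as the spare. */
    static void hanoi(int n, int from, int to, int via)
    {
        if (n == 0)
            return;
        hanoi(n - 1, from, via, to);
        moves++;                        /* "move disc n from 'from' to 'to'" */
        hanoi(n - 1, via, to, from);
    }

    int main(void)
    {
        hanoi(20, 1, 3, 2);
        printf("%ld moves\n", moves);   /* 2^20 - 1 = 1048575 */
        return 0;
    }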
2.38. Flops

Estimates MFLOPS ratings for specific FADD, FSUB, FMUL, and FDIV instruction mixes. Four distinct MFLOPS ratings are provided based on FDIV weightings from 25% to 0% and using register-to-register operations. Works with both scalar and vector machines. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: flops20.c
 Result: flops_1.tbl, flops_2.tbl, flops_3.tbl, and flops_4.tbl
--Alfred Aburto

2.39. C LINPACK

The LINPACK floating point program converted to C. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: clinpack.c
 Result: clinpack.dpr, clinpack.dpu, clinpack.spr, and clinpack.spu
--Alfred Aburto

2.40. TFFTDP

This program performs FFTs using the Duhamel-Hollman method for FFTs from 32 to 262,144 points in size. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: tfftdp.c
 Result: tfftdp.tbl
--Alfred Aburto

2.41. Matrix Multiply (MM)

This program (mm.c) contains 9 different algorithms for doing matrix multiplication (500 x 500 standard size). Results illustrate the enormous effects of cache thrashing versus algorithm, machine, compiler, and compiler options. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: mm.c
 Result: mm_1.tbl, and mm_2.tbl
--Alfred Aburto
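Cache thrashing in a matrix multiply comes mostly from the order in which the loops walk memory. The sketch below (an illustration in the spirit of mm.c, not an excerpt from it) compares two classic variants: with C's row-major storage, the "ijk" form strides through the second matrix column-wise and tends to miss in the cache, while the "ikj" form keeps the inner loop walking memory with stride 1.

    #include <stdio.h>
    #include <time.h>

    #define N 500
    static double a[N][N], b[N][N], c[N][N];

    /* z = x * y, "ijk" order: the inner loop reads y[k][j] with stride N,
       which thrashes the cache once N*N doubles exceed it. */
    static void mult_ijk(double x[N][N], double y[N][N], double z[N][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                double sum = 0.0;
                for (k = 0; k < N; k++)
                    sum += x[i][k] * y[k][j];
                z[i][j] = sum;
            }
    }

    /* Same arithmetic, "ikj" order: the inner loop now touches y[k][j]
       and z[i][j] with stride 1, which is much kinder to the cache. */
    static void mult_ikj(double x[N][N], double y[N][N], double z[N][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                z[i][j] = 0.0;
        for (i = 0; i < N; i++)
            for (k = 0; k < N; k++) {
                double xik = x[i][k];
                for (j = 0; j < N; j++)
                    z[i][j] += xik * y[k][j];
            }
    }

    int main(void)
    {
        int i, j;
        clock_t t0, t1, t2;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }

        t0 = clock();
        mult_ijk(a, b, c);
        t1 = clock();
        mult_ikj(a, b, c);
        t2 = clock();

        printf("check %.0f, ijk %.2f s, ikj %.2f s\n", c[N-1][N-1],
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }

Both routines perform exactly the same 2*N*N*N floating-point operations; any difference you measure is the memory hierarchy talking, which is precisely the effect the mm.c result tables document.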
2.42. Digital Review

[need info]

2.43. Nullstone

The NULLSTONE Automated Compiler Performance Analysis Tool uses a QA approach of test coverage and isolation to measure an optimizer. The performance test suite is comprised of 6,500+ tests covering a wide range of compiler optimizations. The tool includes a report generator that generates performance reports, failure reports, regression reports, and competitive analysis reports. NULLSTONE runs on UNIX, Win3.1, Win95, WinNT, DOS, and MacOS.

Additional information:

    Nullstone Corporation
    48531 Warm Springs Boulevard, Suite 404
    Fremont, CA 94555-7793
    Phone: (800) 995-2841 (international (510) 490-6222)
    FAX: (510) 490-9333
    email: info@nullstone.com
    www: http://www.nullstone.com

--Christopher Glaeser

2.44. Rendermark

[need info]

2.45. Bench++

Bench++ is a standard set of C++ benchmarks. More information is available from http://paul.rutgers.edu/~orost/bench_plus_plus.html. Source is available from:

    http://paul.rutgers.edu/~orost/bench_plus_plus.tar.Z
    ftp://paul.rutgers.edu/pub/bench++.tar.Z

2.46. Stream

STREAM is a synthetic benchmark which measures sustainable memory bandwidth with and without simple arithmetic, based on the timing of long vector operations. STREAM is available in Fortran and C versions, and the results are used by all major vendors in high performance computing. A discussion of the benchmark and results on some 200 system configurations are available at:

    http://perelandra.cms.udel.edu/hpc/stream/
    ftp://perelandra.cms.udel.edu/bench/stream/

Contact: John D. McCalpin, mccalpin@udel.edu, http://perelandra.cms.udel.edu/~mccalpin
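To give a feel for what "long vector operations" means here, the fragment below times one STREAM-style kernel (a "triad", a(i) = b(i) + q*c(i)) and converts the result to a bandwidth figure. This is only a sketch in the spirit of the benchmark, not the official STREAM code, which uses several kernels, wall-clock timers, and strict rules about array sizes relative to cache; get the real thing from the URLs above before quoting numbers.

    #include <stdio.h>
    #include <time.h>

    #define N    1000000L       /* must be much larger than the caches */
    #define REPS 100L

    static double a[N], b[N], c[N];

    int main(void)
    {
        long i, r;
        double q = 3.0, secs;
        clock_t t0, t1;

        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        t0 = clock();           /* note: CPU time; real STREAM uses wall-clock */
        for (r = 0; r < REPS; r++)
            for (i = 0; i < N; i++)
                a[i] = b[i] + q * c[i];
        t1 = clock();

        secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        /* Each element involves reading b[i] and c[i] and writing a[i]:
           3 * 8 bytes of nominal traffic per iteration. */
        printf("check %f\n", a[N - 1]);
        if (secs > 0.0)
            printf("triad: %.1f MB/s\n", 3.0 * 8.0 * N * REPS / secs / 1e6);
        return 0;
    }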
SECTION 3 - Terminology

3.1. Configuration

When interpreting benchmark results, knowing the details of the configuration of the system that produced the results is critically important. Such details include, but aren't limited to:

    System vendor and model number
    CPU vendor, model, revision, and quantity
    CPU clock speed
    bus architecture, size, and speed
    RAM size
    primary and secondary cache sizes
    disk vendor, model, size, interface
    operating system vendor, version, and revision
    compiler vendor, version, revision, and options used

3.2. MFLOPS

Millions of Floating Point Operations Per Second. Supposedly the rate at which the system can execute floating point instructions. Varies widely between different benchmarks and different configurations of the same benchmarks. Popular with marketing types because it sounds like a "hard" value like miles per hour, and represents a simple concept.

3.3. MIPS

Millions of Instructions Per Second. Supposedly the rate at which the system can execute a typical mix of floating point and integer arithmetic and logical instructions. Unlike MFLOPS, true MIPS rates are very difficult to measure. Like MFLOPS, MIPS ratings are popular with marketing people because they sound like "hard" values and represent a simple, intuitive concept.

Most MIPS ratings quoted these days are based on the assumption that a Digital Equipment Corporation VAX 11-780 was exactly a 1 MIPS system. MIPS ratings for other systems were derived by dividing the Dhrystone rating by the 11-780's rating: 1758. Even if the 11-780 executed an average of one million instructions per second, a MIPS rating derived by this method would be in terms of VAX instructions, not the native instruction set of the rated system. Since the VAX is a Complex Instruction Set Computer (CISC), today's Reduced Instruction Set Computers (RISCs) need to execute more instructions than the VAX to do the same amount of work.
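In other words, the popular "Dhrystone MIPS" (or "VAX MIPS") figure is just an arithmetic rescaling of a Dhrystone result, using the 1758 Dhrystones/second figure quoted above for the 11-780:

    /* A worked example of the convention described above. */
    #include <stdio.h>

    #define VAX_11_780_DHRYSTONES_PER_SEC 1758.0   /* figure quoted above */

    int main(void)
    {
        double measured = 87900.0;   /* hypothetical Dhrystones/second result */

        /* 87900 / 1758 = 50, so this machine would be marketed as "50 MIPS". */
        printf("%.1f VAX MIPS\n", measured / VAX_11_780_DHRYSTONES_PER_SEC);
        return 0;
    }

All of the caveats in the Dhrystone entry (section 2.3) apply to any MIPS number produced this way.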
3.4. Representative

A benchmark is said to be representative of a user's workload if the benchmark accurately predicts performance of the user's workload on a range of configurations. A benchmark that measures general compute performance won't be very representative for a user whose workload is intensively floating point, graphical, memory bandwidth, or I/O oriented. When looking for a predictive benchmark, look for one with a workload as similar as possible to yours, and see how well it correlates with the relative performance you've observed with your workload on various configurations.

3.5. Single Figure of Merit

A single figure of merit is a rating like MIPS or MFLOPS that purports to rank the performance of systems, e.g., "a 25 MIPS system is faster than a 30 MIPS system". In reality, though, single figures of merit are often misleading. A "25 MIPS" system may well execute a given workload faster than a "30 MIPS" system. Single figures of merit are generally not very representative of actual workloads--unless your workload consists of nothing but Dhrystone runs.

3.6. KAP

Q: Can someone tell me what "KAP" is?

A: Kuck and Associates, Inc. wrote, amongst other things, a compiler preprocessor called KAP which attempts to do source code optimization. It is used by all the vendors on the old SPEC89 benchmark because it considerably improved the performance on the Matrix300 code. It is also used to do source code parallelization, etc. KAP is not an acronym for Kuck and Associates Preprocessor.

A widespread misconception about KAP is that it is a single preprocessor that is used on various platforms. It is not. KAP is tailored to specific processor-compiler pairs, as KAI builds in knowledge of exactly what the appropriate compiler will do with particular language constructs and what the resultant machine code would be. For example, it would need to know how many temporary variables it could add and have the compiler allocate those to registers, in addition to knowing how many registers the processor had. It would need to know about compiler inlining capabilities, processor cache behavior, BLAS library availability, vector processor capability, spelling of directives, etc. As you can see, it's not just a case of "porting" a fixed product to different platforms - each vendor contracts with KAI to develop a defined set of optimizations in KAP for the desired platform-compiler pairs. This requires a lot of work on the part of both KAI and the vendor.

To get the most out of KAP, you need to be intimately aware of what your application is doing and what types of "aggressive" optimizations your application can tolerate - some may yield wrong answers, which is why most compilers don't provide them automatically. Also, KAP can be incredibly slow itself, though this may be of little import if "KAPing" is done infrequently.

It is not a "magic bullet". It certainly has its uses, but nothing can really compensate for a little care and thought put into algorithm design and coding plus profiling, at least for "real applications". Yes, KAP makes MATRIX300 look really good. If your application is MATRIX300, fine. KAP is a good product, and one which, as far as I know, has no peers in the industry, but it should not be used automatically or blindly.

For more information about KAP, send email to sales@kai.com, call 217-356-2288, or visit http://www.kai.com.
SECTION 4 - Other Sources of Information

4.1. WORLD WIDE WEB

4.1.1 System Optimization Information

There is a new WWW page called System Optimization Information that contains the latest benchmark comparison lists and information on where to find the actual benchmarking programs. Each list also includes information on how to submit your own results to the list author. The site also contains overclocking (marginal speed testing) information, computer-related ftp sites, and more. Check it out. The address is:

    http://www.dfw.net/~sdw
    sdw@dfw.net (Scott Wainner)

4.2. FTP

Source for various benchmarks is available from sony.com in the directory 'pub/benchmarks'.

Various benchmarks and results can be obtained via anonymous ftp from ftp.nosc.mil (128.49.192.51) in directory 'pub/aburto'.

Source and results for router benchmarks are available via anonymous FTP from hsdndev.harvard.edu (128.103.202.40) in directory 'pub/ndtl'.

The sequential C versions of the NAS CFD (computational fluid dynamics) benchmarks - appbt and appsp - are available from the anonymous ftp site ftp.cs.wisc.edu:/wwt/Misc/NAS.

4.3. COMMERCIAL BENCHMARKS

AIM Technology can be contacted by phone at 1-800-848-UNIX or 1-408-748-8649, or by e-mail to benchinfo@aim.com.

There are a number of commercial performance evaluation tools available, e.g., from Performix (703 448-6606, Fax 703 893-1939) and Performance Awareness Corporation (919 870-880), that are aimed at system-level X Windows/application benchmarks.

4.4. PUBLICATIONS

The Benchmark Handbook: for database and transaction processing systems. Edited by Jim Gray. M. Kaufmann Publishers, San Mateo, Calif., c1991. 334 p., ill. The Morgan Kaufmann series in data management systems. ISSN 1046-1698.

Transaction Processing Performance Council (TPC): TPC-A Standard Specification, Revision 1.1. Shanley Public Relations, San Jose, California, March 1992.

Transaction Processing Performance Council (TPC): TPC-C Standard Specification, Revision 1.0. Shanley Public Relations, San Jose, California, August 1992.

Complete TPC Results, Performance Evaluation Review, Vol. 19 #2, Aug. 91 (14-23). Shanley Public Relations.

Complete TPC Results, Performance Evaluation Review, Vol. 19 #3, Feb. 93 (32-35). Shanley Public Relations.

Survey of Benchmarks, E. R. Brocklehurst. NPL Report DTITC 192/91, Nov. 91 (27 pages).

A Benchmark Tutorial, W. Price. IEEE Micro, Oct. 89 (28-43).

"An Overview of Common Benchmarks", Reinhold P. Weicker, IEEE Computer, vol. 23, no. 12 (Dec. 1990), 65-75. Republished in slightly modified form in "A detailed look at some popular benchmarks", Reinhold P. Weicker, Parallel Computing, No. 17 (1991), 1153-1172.

"Cache Performance of the SPEC92 Benchmark Suite", Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikatos, Alan Jay Smith, p 17-27, IEEE Micro, August 1993.

"Performance Measurements of the X Window System Communication Protocol", R. Droms and W. R. Dyksen, Software Practice and Experience, ISSN 0038-0644, pp S2/119 [this paper covers xlog].

"An Execution Profiler for Window-oriented Applications", Aloke Gupta and Wen-Mei W. Hwu, to appear in Software Practice & Experience. hwu@crhc.uiuc.edu [this paper covers xmeasure and xprof].

"Trace Analysis of the X Window System Protocol", Stuart W. Marks & Laurence P. G. Cable, The X Resource, ISSN 1058-5591, Issue Five, pp 149 [this covers xtrace and friends].

"X Applications Performance on Client/Server Systems", Ken Oliver, Xhibition '93 Conference Proceedings, pp 176.

High Performance Computing: RISC Architectures, Optimization, and Benchmarks, Kevin Dowd. O'Reilly & Associates, 1993. ISBN 1-56592-032-5.

4.5. OTHER NETWORK SERVICES

4.5.1 PDS

Xnetlib includes PDS: A Performance Database Server.

The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly-available central depository of performance data for all ranges of machines from personal computers to supercomputers. This Xnetlib release contains an Internet-accessible performance database server (PDS) which can be used to extract current benchmark data and literature.

The current PDS provides an on-line catalog of the following public-domain computer benchmarks: Linpack Benchmark, Parallel Linpack Benchmark, Bonnie Benchmark, FLOPS Benchmark, Peak Performance (part of Linpack Benchmark), Fhourstones and Dhrystones, Hanoi Benchmark, Heapsort Benchmark, Nsieve Benchmark, Math Benchmark, Perfect Benchmarks, and Genesis Benchmarks. Rank-ordered lists of machines per benchmark are available, as well as relevant papers and bibliographies. A browse facility allows the user to extract a variety of machine/benchmark combinations, and a search feature permits specific queries into the performance database.

PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. PDS is invoked by selecting the "Performance" button in the Xnetlib Menu Options. Questions and comments for PDS should be mailed to utpds@cs.utk.edu.

How to get it:

 - By WWW from http://netlib.cs.utk.edu/performance/html/PDStop.html
 - By anonymous ftp from netlib2.cs.utk.edu in xnetlib/xnetlib3.4.shar.Z
 - By email, send a message to netlib@ornl.gov containing the line:
       send xnetlib3.4.shar from xnetlib

Precompiled executables for various platforms are also available. For information, get the index file for the xnetlib library:

 - via anonymous ftp from netlib2.cs.utk.edu, get xnetlib/index
 - via email, send a message to netlib@ornl.gov containing the line:
       send index from xnetlib

If you have any questions, please send mail to xnetlib@cs.utk.edu.