comp.benchmarks Frequently Asked Questions, With Answers
Version 1.0, Sat Mar 16 12:12:48 1996
Copyright 1993-96 Dave Sill. Not-for-profit redistribution permitted provided this notice is included.

NOTE: Many of the answers to these questions were derived from articles posted to comp.benchmarks throughout the years. I have generally made no effort to attribute them to their sources because I didn't have time to get everyone's approval--or even the time to include the attributions :-). If you recognize something you wrote, and you'd like it attributed, let me know.

CONTENTS

SECTION 1 - General Q/A
 1.1. What is comp.benchmarks?
 1.2. What is a benchmark?
 1.3. How are benchmarks used?
 1.4. What kinds of performance do benchmarks measure?
 1.5. What are all these strange messages from Eugene Miya?
 1.6. What are the most common benchmarks?
 1.7. Where can I get benchmark results and source code?
 1.8. How can one safely interpret benchmark results?
 1.9. What are the pitfalls involved with running benchmarks?

SECTION 2 - Common Benchmarks
 2.1. 007 (ODBMS)
 2.2. AIM
 2.3. Dhrystone
 2.4. Khornerstone
 2.5. LFK (Livermore Loops)
 2.6. LINPACK
 2.7. MUSBUS
 2.8. NAS Kernels
 2.9. Nhfsstone
 2.10. PERFECT
 2.11. RhosettaStone
 2.12. SLALOM
 2.13. SPEC
 2.14. SSBA
 2.15. Sieve of Eratosthenes
 2.16. TPC
 2.17. WPI Benchmark Suite
 2.18. Whetstone
 2.19. Xstone
 2.20. bc
 2.21. SYSmark
 2.22. Stanford
 2.23. Bonnie
 2.24. IOBENCH
 2.25. IOZONE
 2.26. Byte
 2.27. Netperf
 2.28. Nettest
 2.29. ttcp
 2.30. CPU2
 2.31. Hartstone
 2.32. EuroBen
 2.33. PC Bench/WinBench/NetBench
 2.34. Sim
 2.35. Fhourstones
 2.36. Heapsort
 2.37. Hanoi
 2.38. Flops
 2.39. C LINPACK
 2.40. TFFTDP
 2.41. Matrix Multiply (MM)
 2.42. Digital Review
 2.43. Nullstone
 2.44. Rendermark
 2.45. Bench++
 2.46. Stream

SECTION 3 - Terminology
 3.1. Configuration
 3.2. MFLOPS
 3.3. MIPS
 3.4. Representative
 3.5. Single Figure of Merit
 3.6. KAP

SECTION 4 - Other Sources of Information
 4.1. WORLD WIDE WEB
 4.2. FTP
 4.3. COMMERCIAL BENCHMARKS
 4.4. PUBLICATIONS
 4.5. OTHER NETWORK SERVICES

SECTION 1 - General Q/A

1.1. What is comp.benchmarks?

Comp.benchmarks is a USENET newsgroup for discussing computer benchmarks and publishing benchmark results and source code. If it's about benchmarks, this is the place to post or crosspost it.

1.2. What is a benchmark?

A benchmark is a test that measures the performance of a system or subsystem on a well-defined task or set of tasks.

1.3. How are benchmarks used?

Benchmarks are commonly used to predict the performance of an unknown system on a known, or at least well-defined, task or workload. Benchmarks can also be used as monitoring and diagnostic tools. By running a benchmark and comparing the results against a known configuration, one can potentially pinpoint the cause of poor performance. Similarly, a developer can run a benchmark after making a change that might impact performance to determine the extent of the impact.

Benchmarks are frequently used to ensure a minimum level of performance in a procurement specification. Rarely is performance the most important factor in a purchase, though. One must never forget that it's more important to be able to do the job correctly than it is to get the wrong answer in half the time.

1.4. What kinds of performance do benchmarks measure?

Benchmarks are often used to measure general kinds of performance--graphics, I/O, compute (integer and floating point), and so on--but most measure more specific tasks like rendering polygons, reading and writing files, or performing operations on matrices. Any aspect of computer performance that matters to the user can be benchmarked.
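To make that concrete, here is a minimal (and deliberately naive) benchmark in C. It is not one of the benchmarks described in Section 2; it just illustrates the basic pattern every benchmark follows: perform a well-defined task, time it, and report a rate. The array size and repetition count are arbitrary choices for this sketch.

    #include <stdio.h>
    #include <time.h>

    #define N     100000L
    #define REPS  1000L

    int main(void)
    {
        static unsigned long data[N];
        unsigned long sum = 0;
        long i, r;
        clock_t start, stop;

        for (i = 0; i < N; i++)
            data[i] = (unsigned long)i;

        start = clock();                 /* CPU time, not wall-clock time */
        for (r = 0; r < REPS; r++)
            for (i = 0; i < N; i++)
                sum += data[i];
        stop = clock();

        /* Printing the result keeps the compiler from discarding the loop;
           the value itself is unimportant. */
        printf("sum = %lu\n", sum);
        {
            double secs = (double)(stop - start) / CLOCKS_PER_SEC;
            if (secs > 0.0)
                printf("%.2f million additions/sec\n",
                       (double)N * REPS / secs / 1e6);
        }
        return 0;
    }

Even this toy raises the questions discussed below: which compiler options were used, whether the data fits in cache, and whether "additions per second" predicts anything about a real workload.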
1.5. What are all these strange messages from Eugene Miya?

They are automatically-posted sections of a multi-part Frequently Asked Questions (FAQ) list for comp.benchmarks. The subject headers look like:

    Subject: [l/m 3/17/92] WPI Benchmark Suite (1.1) (19/28) c.be.FAQ

which means: this is part 19 of the 28-part comp.benchmarks FAQ, the topic of this section is "WPI Benchmark Suite (1.1)", it was last modified on March 17, 1992, and it is automatically posted on the 17th day of each month.

The body of these articles starts with an index of the sections (panels) of the multipart FAQ, with the current section rotated to the top. For example:

    19 WPI Benchmark
    20 Equivalence
    21 TPC
    22
    23
    24
    25 Ridiculously short benchmarks
    26 Other miscellaneous benchmarks
    27
    28 References
     1 Introduction to the FAQ chain and netiquette
     2 Benchmarking concepts
     3 PERFECT Club/Suite
     4
     5 Performance Metrics
     6
     7 Music to benchmark by
     8 Benchmark types
     9 Linpack
    10
    11 NIST source and .orgs
    12 Benchmark Environments
    13 SLALOM
    14
    15 12 Ways to Fool the Masses with Benchmarks
    16 SPEC
    17 Benchmark invalidation methods
    18

Notice that there are some unused sections (4, 6, 10, 14, 18, 22-24, 27). Some people find these FAQs confusing and configure their newsreader to automatically "kill" them (mark them as read).

1.6. What are the most common benchmarks?

See Section 2 - Common Benchmarks.

1.7. Where can I get benchmark results and source code?

See Section 2 - Common Benchmarks.

1.8. How can one safely interpret benchmark results?

There are many dangers involved in correctly understanding and interpreting benchmark results, whether they come from your own locally generated test or are supplied by the vendor of a commercial system. Here are some things to take into account:

1) Which benchmark was run? You'll need the name, version, and details of any changes made, whether for portability or to improve performance.

2) Exactly what configuration was the benchmark run on?
   a) processor model, speed, cache, number of CPUs
   b) memory
   c) software versions (operating system, compilers, relevant applications, etc.)
   d) compiler/loader options and flags used to build the executables
   e) state of the system (single user, multiuser, active, inactive, etc.)
   f) peripherals, e.g., hard disk drives

3) How does performance on the benchmark relate to my workload? This is really the key question. Without knowing what a benchmark measures, one can't begin to determine whether systems that perform better on it will perform better on their own workload.

1.9. What are the pitfalls involved with running benchmarks?

They pretty much mirror the difficulties in interpreting results. First, you need to document what you're running. If it's a well-known program, record the name, version number, and any changes you've made. It's a good idea to use some kind of version control system (e.g., RCS or SCCS on UNIX) to keep track of changes and make it possible to backtrack to previous versions. If it's a locally-written benchmark, it's especially important to use a version control system.

Second, record as much information about the system configuration as you can. Sometimes the most seemingly insignificant thing can have a profound effect on the performance of a system, and it's very hard to repeat results without being able to recreate the original configuration.
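One way to make the configuration record harder to forget is to have the test harness print it automatically alongside the results. The fragment below is only an illustration of that idea (it is not part of any benchmark in this FAQ); it uses the POSIX uname() call plus a couple of compiler-provided macros, so the exact output will vary by platform and compiler.

    #include <stdio.h>
    #include <sys/utsname.h>    /* POSIX uname() */

    int main(void)
    {
        struct utsname u;

        if (uname(&u) == 0) {
            printf("os:       %s %s\n", u.sysname, u.release);
            printf("machine:  %s\n", u.machine);
            printf("hostname: %s\n", u.nodename);
        }
    #ifdef __VERSION__
        printf("compiler: %s\n", __VERSION__);  /* compiler version string, where defined */
    #endif
        printf("built:    %s %s\n", __DATE__, __TIME__);
        return 0;
    }

Compiler flags, cache sizes, and the load on the machine still have to be recorded by hand (or by your run scripts), but even this much makes stray results far easier to reproduce later.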
SECTION 2 - Common Benchmarks

2.1. 007 (ODBMS)

Description: Designed to simulate a CAD/CAM environment.
Tests:
 - pointer traversals over cached data, disk-resident data, sparse traversals, and dense traversals
 - updates: indexed and unindexed object fields, repeated updates, sparse updates, updates of cached data, and creation and deletion of objects
 - queries: exact-match lookup, ranges, collection scan, path-join, ad-hoc join, and single-level make
Originator: University of Wisconsin
Versions: unknown
Availability of Source: free from ftp.cs.wisc.edu:/007
Availability of Results: free from ftp.cs.wisc.edu:/007
Entry Last Updated: Thu Apr 15 15:08:07 1993

2.2. AIM

- 1989
- AIM Technology, Palo Alto
- C
- 2 suites (Suite III and Suite V)

Suite III: simulation of applications (task- or device-specific)
 - task-specific routines (word processing, database management, accounting)
 - device-specific routines (memory, disk, MFlop/s, IO/s)
 - all measurements represent a percentage of VAX 11/780 performance (100%)
 - the second figure is user support (maximum concurrent users); VAX 11/780 == 12 users
In general, AIM Suite III gives an overall performance indication.

Suite V: measures throughput in a multitasking workstation environment
 - incremental system loading
 - testing multiple aspects of system performance
The graphically displayed results plot the workload level versus time. Several different models characterize various user environments (financial, publishing, software engineering). The published reports are copyrighted.

See: Walter J. Price, A Benchmark Tutorial, IEEE Micro, Oct. 1989 (28-43)

2.3. Dhrystone

Description: Short synthetic benchmark program intended to be representative of system (integer) programming. Based on published statistics on use of programming language features; see the original publication in CACM 27,10 (Oct. 1984), 1013-1030. Originally published in Ada, now mostly used in C. Version 2 (in C) was published in SIGPLAN Notices 23,8 (Aug. 1988), 49-62, together with measurement rules. Version 1 is no longer recommended since state-of-the-art compilers can eliminate too much "dead code" from the benchmark (however, quoted MIPS numbers are often based on version 1).
Problems: Due to its small size (100 HLL statements, 1-1.5 KB code), the memory system outside the cache is not tested; compilers can too easily optimize for Dhrystone; string operations are somewhat overrepresented.
Recommendation: Use it for controlled experiments only; don't blindly trust single Dhrystone MIPS numbers quoted somewhere (don't do this for any benchmark).
Originator: Reinhold Weicker, Siemens Nixdorf (weicker.muc@sni.de)
Versions in C: 1.0, 1.1, 2.0, 2.1 (final version, minor corrections compared with 2.0)
See also: R.P. Weicker, A Detailed Look ... (see Publications, 4.4)
Availability of Source: netlib@ornl.gov, ftp.nosc.mil:pub/aburto
Availability of Results (no guarantee of correctness): same as above
Entry Last Updated: Dec. 30, 1993, Reinhold Weicker
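The "dead code" problem mentioned in the Dhrystone entry is easy to demonstrate with a made-up fragment like the one below (this is not code from Dhrystone itself). If a computed value is never used anywhere the compiler can see, a good optimizer is entitled to delete the computation, and a small benchmark can then appear to run almost infinitely fast.

    #include <stdio.h>

    static int kernel(long n)
    {
        long i;
        int a = 0, b = 1;

        for (i = 0; i < n; i++) {
            a += (int)i;        /* used below, so it must be computed       */
            b = b * 2 + 7;      /* if b were never read afterwards, an
                                   optimizer could drop this line entirely  */
        }
        return a + b;           /* returning both keeps them "live"         */
    }

    int main(void)
    {
        /* Printing the result makes the work observable, so it cannot be
           discarded; Dhrystone 2.x's measurement rules serve the same end. */
        printf("%d\n", kernel(1000000L));
        return 0;
    }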
2.4. Khornerstone

Description: Multipurpose benchmark used in various periodicals.
Originator: Workstation Labs
Versions: unknown
Availability of Source: not free
Availability of Results: UNIX Review
Entry Last Updated: Thu Apr 15 15:22:10 1993

2.5. LFK (Livermore Loops)

netlib.att.com:/netlib/benchmark/livermore*

2.6. LINPACK

Description: Kernel benchmark developed from the "LINPACK" package of linear algebra routines. Originally written and commonly used in Fortran; a C version also exists. Almost all of the benchmark's time is spent in a subroutine ("saxpy" in the single-precision version, "daxpy" in the double-precision version) doing the inner loop for frequent matrix operations:

    y(i) = y(i) + a * x(i)

The standard version operates on 100x100 matrices; there are also versions for sizes 300x300 and 1000x1000, with different optimization rules.
Problems: The code is representative only of this type of computation. LINPACK is easily vectorizable on most systems.
Originator: Jack Dongarra, Comp. Sci. Dept., Univ. of Tennessee, dongarra@cs.utk.edu
See also: R.P. Weicker, A Detailed Look ... (see Publications, 4.4)
Entry Last Updated: Dec. 30, 1993, Reinhold Weicker
Availability of Source and Results:
 netlib@ornl.gov: source, results
 netlib.att.com:/netlib/benchmark/linpack*: source
See also: C LINPACK
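In C, the daxpy-style inner loop quoted above is just the following (a sketch of the kernel only; the actual benchmark factors and solves a linear system, checks the result, and derives its MFLOPS figure from a count of the floating-point operations performed):

    /* y := y + a*x for vectors of length n.  Each iteration performs one
       multiply and one add, i.e. two floating-point operations, which is
       how a MFLOPS rate can be computed from the measured time. */
    static void daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = y[i] + a * x[i];
    }

Because the loop is so regular, vectorizing compilers and hand-tuned libraries can speed it up dramatically, which is exactly why LINPACK results say little about less regular workloads.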
2.7. MUSBUS

monu1.cc.monash.edu.au:/pub/musbus.sh

2.8. NAS Kernels

The sequential C versions of the NAS CFD (computational fluid dynamics) benchmarks - appbt and appsp - are now available from the anonymous ftp site ftp.cs.wisc.edu:/wwt/Misc/NAS. This distribution contains the C version of the Fortran CFD codes written at NASA Ames Research Center by Sisira Weeratunga. These codes were converted to C by a team of students for their class project for CS 838-3/ChE 562, offered in Spring 1993 by Mark D. Hill, Sangtae Kim, and Mary Vernon at the University of Wisconsin at Madison. CS 838-3/ChE 562 was an experimental course that brought computer scientists and computational scientists together to promote interdisciplinary research.

You should have a NAS license for the original Fortran code. NASA has given us permission to distribute our ``significant'' changes freely. You can obtain the sequential Fortran codes and a license by writing to:

    ATTN: NAS Parallel Benchmark Codes
    NAS Systems Division
    Mail Stop 2588
    NASA Ames Research Center
    Moffett Field CA 94035

THIS SOFTWARE IS PROVIDED "AS IS". WE MAKE NO WARRANTIES ABOUT ITS CORRECTNESS OR PERFORMANCE.

    Douglas C. Burger (dburger@cs.wisc.edu)
    Shubhendu S. Mukherjee (shubu@cs.wisc.edu)
    Computer Sciences Department
    University of Wisconsin at Madison

2.9. Nhfsstone

Benchmark intended to measure the performance of file servers that follow the NFS protocol. The work in this area continued within the LADDIS group and finally within SPEC. The SPEC benchmark 097.LADDIS (SFS benchmark suite, see the separate FAQ file on SPEC) is intended to replace Nhfsstone; it is superior to Nhfsstone in several respects (multi-client capability, less client sensitivity). --Reinhold Weicker

2.10. PERFECT

See Miya panel #3

2.11. RhosettaStone

See Miya panel #26
eos.arc.nasa.gov

2.12. SLALOM

See Miya panel #13
tantalus.al.iastate.edu:/pub/Slalom/ [129.186.200.15]

2.13. SPEC

SPEC stands for "Standard Performance Evaluation Corporation", a non-profit organization with the goal to "establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers" (from SPEC's bylaws).

The SPEC benchmarks and more information can be obtained from:

    SPEC [Standard Performance Evaluation Corporation]
    10754 Ambassador Drive, Suite 201
    Manassas, VA 22110, USA
    Phone: +1-703-331-0180
    Fax:   +1-703-331-0181
    E-Mail: spec-ncga@cup.portal.com

The current SPEC benchmark suites are:

    CINT95  CPU intensive integer benchmarks        ) together:
    CFP95   CPU intensive floating point benchmarks )  SPEC95
    SDM     UNIX Software Development Workloads
    SFS     System level file server (NFS) workload

The old CPU benchmark suites

    CINT92  CPU intensive integer benchmarks        ) together:
    CFP92   CPU intensive floating point benchmarks )  SPEC92

will cease to be supported by SPEC in 1996. See the separate FAQ file on SPEC benchmarks. --Reinhold Weicker

The following data files are available from ftp.nosc.mil/pub/aburto: specin89.tbl, specfp89.tbl, speccorr.tbl, specin92.tbl, specft92.tbl. --Alfred Aburto

2.14. SSBA

The SSBA is the result of the studies of the AFUU (French Association of Unix Users) Benchmark Working Group. This group, consisting of some 30 active members of varied origins (universities, public and private research, manufacturers, end users), has set itself the goal of studying the problem of assessing the performance of data processing systems: collecting as many of the tests available throughout the world as possible, dissecting the codes and results, discussing their utility, fixing versions, and supplying them in the form of a magnetic tape with various comments and procedures. This tape is therefore a simple and coherent tool both for end users and for specialists, providing a clear and pertinent first approximation of performance, and could also become a "standard" in the Unix (R) world. In this way the SSBA (Synthetic Suite of Benchmarks from the AFUU) originated, and here you find release 1.21E.

    athene.uni-paderborn.de:/doc/magazin/ix/tools/ssba1.22.tar
    ftp.germany.eu.net:/pub/sysadmin/benchmark/ssba/ssba.shar.Z
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-1.22English.tar.gz
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-1.22French.tar.gz
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-2.0F.tar.gz
    grasp1.univ-lyon1.fr:/pub/nfs-mounted/ftp.univ-lyon1.fr/mirrors/unix/ssba/ssba-synthesis.tar.gz
    ftp.inria.fr:/system/benchmark/SSBA/ssba1.22E.tar.Z
    ftp.inria.fr:/system/benchmark/SSBA/ssba1.22F.tar.Z
    ftp.inria.fr:/system/benchmark/SSBA/ssba-syntheses.tar.Z

2.15. Sieve of Eratosthenes

An integer program that generates prime numbers using a method known as the Sieve of Eratosthenes.

    otis.stanford.edu:/pub/benchmarks/c/small/sieve.c
    ftp.nosc.mil:pub/aburto/nsieve.c
    ftp.nosc.mil:pub/aburto/nsieve.tbl
    sunic.sunet.se:/SRC/sec8/bench-dry/sieve.c
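The algorithm itself fits in a few lines of C (this is an illustration of the method only; the benchmark sources listed above differ in array sizes, repetition counts, and timing code):

    #include <stdio.h>
    #include <string.h>

    #define LIMIT 8192          /* sieve size chosen arbitrarily for this sketch */

    int main(void)
    {
        static char composite[LIMIT + 1];
        int i, j, count = 0;

        memset(composite, 0, sizeof composite);

        /* Cross off the multiples of each prime; whatever remains unmarked
           is prime.  The work is almost entirely loops, array indexing, and
           byte stores, which is why sieves are popular as tiny integer tests. */
        for (i = 2; i <= LIMIT; i++) {
            if (!composite[i]) {
                count++;
                for (j = 2 * i; j <= LIMIT; j += i)
                    composite[j] = 1;
            }
        }
        printf("%d primes up to %d\n", count, LIMIT);
        return 0;
    }

Like the other "ridiculously short" benchmarks, a sieve exercises only a sliver of a machine and fits entirely in cache, so treat its numbers accordingly.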
2.16. TPC

TPC-A is a standardization of the Debit/Credit benchmark which was first published in DATAMATION in 1985. It is based on a single, simple, update-intensive transaction which performs three updates and one insert across four tables. Transactions originate from terminals, with a requirement of 100 bytes in and 200 bytes out. There is a fixed scaling between tps rate, terminals, and database size. TPC-A requires an external RTE (remote terminal emulator) to drive the SUT (system under test).

TPC-B uses the same transaction profile and database schema as TPC-A, but eliminates the terminals and reduces the amount of disk capacity which must be priced with the system. TPC-B is significantly easier to run because an RTE is not required.

TPC-C is completely unrelated to either TPC-A or B. TPC-C tries to model a moderate to complex OLTP system. The benchmark is conceptually based on an order entry system. The database consists of nine tables which contain information on customers, warehouses, districts, orders, items, and stock. The system performs five kinds of transactions: entering a new order, delivering orders, posting customer payments, retrieving a customer's most recent order, and monitoring the inventory level of recently ordered items. Transactions are submitted from terminals providing a full-screen user interface. (The spec defines the exact layout for each transaction.)

TPC-C was specifically designed to address many of the shortcomings of TPC-A. I believe it does this in many areas. It exercises a much broader cross-section of database functionality than TPC-A. Also, the implementation rules are much stricter in critical areas such as database transparency and transaction isolation. Overall, TPC-C results will be a much better indicator of RDBMS and OLTP system performance than previous TPC benchmarks.

- 1988
- Transaction Processing Performance Council, San Jose
- non-profit corporation of 44 software and hardware companies formed to define transaction processing and database benchmarks
- Cobol
- contents: basic system OLTP, business application services, databases, complex data and real time applications
- 4 benchmark components (data, database, system, and query model)
- 4 suites: TPC-A, ..., TPC-D

Suite TPC-A: on-line transaction processing in a database environment
 - measures performance in an update-intensive database environment (OLTP)
 - result in transactions/sec
 - 2 metrics (local and wide area networks)
Suite TPC-B: database benchmark
 - database throughput (transactions/sec)
 - no OLTP
Suite TPC-C: order-entry benchmark (OLTP environment)
 - business application services (order entry, inventory control, customer support, accounting)
Suite TPC-D: decision-support benchmark (OLTP environment)
 - database stress test, simulation of a large database, complex queries

See: Miya panel #21
Shanley Public Relations, Complete TPC Results, Performance Evaluation Review, Vol. 19 #2, Aug. 91 (14-23)
Shanley Public Relations, Complete TPC Results, Performance Evaluation Review, Vol. 19 #3, Feb. 93 (32-35)
DG.COM

2.17. WPI Benchmark Suite

See Miya panel #19
wpi.wpi.edu

2.18. Whetstone

Description: The first major synthetic benchmark program, intended to be representative of numerical (floating-point intensive) programming. Based on statistics gathered at the National Physical Laboratory in England, using an Algol 60 compiler which translated Algol into instructions for the imaginary Whetstone machine. The compilation system was named after the small town of Whetstone outside the City of Leicester, England, where it was designed.
Problems: Due to the small size of its modules, the memory system outside the cache is not tested; compilers can too easily optimize for Whetstone; mathematical library functions are overrepresented.
Originator: Brian Wichmann, NPL (baw@seg.npl.co.uk) [E-mail address as of fall 1990, not re-verified. Reinhold]
Original publication: H.J. Curnow and B.A. Wichmann: A Synthetic Benchmark. The Computer Journal 19,1 (1976), 43-49
See also: R.P. Weicker, A Detailed Look ... (see Publications, 4.4) --Reinhold Weicker

    cnam.cnam.fr:/pub/Ada/Repository/benchmarks/benwhet.com.Z
    draci.cs.uow.edu.au:/netlib/benchmark/whetstone*
    ftp.germany.eu.net:/pub/sysadmin/benchmark/whetston/whetstone.tar.Z
    lth.se:/pub/benchmark/whetstone.tar.gz
    netlib.att.com:/netlib/benchmark/whetstone*
2.19. Xstone

    netcom.com:/pub/micromed/uploads/xstones.summary.z
    pith.uoregon.edu:/pub/src/X11/xbench/scripts/xstones.awk
    alf.uib.no:/pub/Linux/BETA/X_S3/801.xstones

2.20. bc

2.21. SYSmark

July 28, 1993, Santa Clara, Calif.--The Business Applications Performance Corp. (BAPCo) announces SYSmark93 for Windows and SYSmark93 for DOS, benchmark software that provides objective performance measurement based on the world's most popular PC applications and operating systems. A third program, SYSmark93 for Servers, is due for release by BAPCo in the third quarter of 1993.

SYSmark93 provides benchmarks that can be used to objectively measure performance of IBM PC-compatible hardware for the tasks users perform on a regular basis. The benchmarks are comparative tools for those who make purchasing decisions for anywhere from 10 to a thousand or more PCs. SYSmark93 has been endorsed by the BAPCo membership, which includes the world's leading PC hardware and software vendors, chip manufacturers, and industry publications.

SYSmark93 benchmarks represent the workloads of popular programs in such applications as word processing, spreadsheets, database, desktop graphics and software development. Benchmarking can be conducted on the user's own system or at a vendor's site using the standards set by BAPCo to ensure consistency of the results.

SYSmark93 for Windows is for those interested in evaluating systems-level performance of PCs running Microsoft Windows applications. The program features a new Windows-based workload manager, scripts for 10 Windows applications, automation tools, and a disclosure report generator. SYSmark93 for DOS, an upgrade to SYSmark92, is aimed at those interested in evaluating systems-level performance of PCs running DOS applications only.

Both programs can generate performance metrics in three ways: as a composite of all the different applications; for a specific category of applications, such as word processing or spreadsheets; or for individual software programs.

Workloads based on the following applications are included in SYSmark93 for Windows and SYSmark93 for DOS:

SYSmark93 for Windows
 WORD PROCESSING: Word for Windows 2.0b, WordPerfect for Windows 5.2, AmiPro 3.0
 SPREADSHEETS: Excel 4.0, Lotus 1-2-3 for Windows 4.0
 DATABASE: Paradox for Windows 1.0
 DESKTOP GRAPHICS: CorelDraw 3.0
 DESKTOP PRESENTATION: Freelance Graphics for Windows 2.0, PowerPoint 3.0
 DESKTOP PUBLISHING: PageMaker 5.0

SYSmark93 for DOS
 WORD PROCESSING: WordPerfect 5.1
 SPREADSHEETS: Lotus 1-2-3 3.4, QuattroPro 4.0
 DATABASE: Paradox 4.0, dBASE IV 1.5
 DESKTOP GRAPHICS: Harvard Graphics 3.0
 SOFTWARE DEVELOPMENT: Borland C++ 3.1, Microsoft C 6.00

SYSmark93 for Windows and SYSmark93 for DOS will be available in August from BAPCo for $390 each. Licensed SYSmark92 users will be able to upgrade to either SYSmark93 for Windows or SYSmark93 for DOS for $99 each.

A non-profit corporation, BAPCo's charter is to develop and distribute a set of objective performance benchmarks based on popular computer applications and industry standard operating systems. Current BAPCo members include Adaptec, Advanced Micro Devices, AER Energy Resources, Apricot Computers, Chips and Technologies, Compaq, Cyrix, Dell, Digital Equipment Corp., Gateway2000, Epson, Hewlett-Packard, IBM, Infoworld, Intel, Lotus, Microsoft, NCR, Unisys and Ziff-Davis Labs.

FOR MORE INFORMATION:

    John Peterson, BAPCo
    Phone: 408-988-7654
    email: John_E_Peterson@ccm.hf.intel.com

    Bob Cramblitt, Cramblitt & Company
    Phone: 919-481-4599
    Fax: 919-481-4639
2.22. Stanford

- 1988
- Stanford University (J. Hennessy, P. Nye)
- C
- comparison of RISC and CISC
- contains the 2 modules Stanford Integer and Stanford Floating Point

Stanford Integer: 8 small applications (integer matrix multiply, sorting algorithms (quick, bubble, tree), permutation, hanoi, 8 queens, puzzle)
Stanford Floating Point: 2 small applications (FFT, matrix multiply)

The characteristics of the programs vary, but most of them have array accesses. There seems to be no official publication (only a printing in a performance report), and there is no defined weighting of the results (Sun and MIPS compute the geometric mean).

See:
 Survey of Benchmarks, E. R. Brocklehurst.
 A Benchmark Tutorial, W. Price.
 "A detailed look at some popular benchmarks", R. P. Weicker.

2.23. Bonnie

This is a file system benchmark that attempts to study bottlenecks - it is named 'Bonnie' for semi-obvious reasons. Specifically, these are the types of filesystem activity that have been observed to be bottlenecks in I/O-intensive applications, in particular the text database work done in connection with the New Oxford English Dictionary Project at the University of Waterloo.

It performs a series of tests on a file of known size. By default, that size is 100 Mb (but that's not enough - see below). For each test, Bonnie reports the bytes processed per elapsed second, per CPU second, and the percent CPU usage (user and system). In each case, an attempt is made to keep optimizers from noticing it's all bogus. The idea is to make sure that these are real transfers to/from user space to the physical disk.

Written by: Tim Bray

2.24. IOBENCH

IOBENCHP is a multi-stream benchmark that uses a controlling process (iobench) to start, coordinate, and measure a number of "user" processes (iouser); the Makefile parameters used for the SPEC version of IOBENCHP cause ioserver to be built as a "do nothing" process.

Written by:
    Barry Wolman (barry@s66.prime.com) [probably doesn't work]
    Prime Computer
    500 Old Connecticut Path
    Framingham, MA 01701
    508/620-2800, ext. 1100 (voice)
    508/879-8674 (FAX)

2.25. IOZONE

This test writes an X megabyte sequential file in Y byte chunks, then rewinds it and reads it back. [The size of the file should be big enough to factor out the effect of any disk cache.] Finally, IOZONE deletes the temporary file.

The file is written (filling any cache buffers), and then read. If the cache is >= X MB, then most if not all the reads will be satisfied from the cache. However, if it is less than or equal to .5X MB, then NONE of the reads will be satisfied from the cache. This is because after the file is written, a .5X MB cache will contain the upper .5X MB of the test file, but we will start reading from the beginning of the file (data which is no longer in the cache). In order for this to be a fair test, the length of the test file must be AT LEAST 2X the amount of disk cache memory for your system. If not, you are really testing the speed at which your CPU can read blocks out of the cache (not a fair test).

IOZONE does not normally test the raw I/O speed of your disk or system. It tests the speed of sequential I/O to actual files. Therefore, this measurement factors in the efficiency of your machine's file system, operating system, C compiler, and C runtime library. It produces a measurement which is the number of bytes per second that your system can read or write to a file.

Written by: bill@tandem.com (Bill Norcott)
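The write-then-read pattern described above can be sketched in portable C along the following lines (a simplified illustration only, not IOZONE itself, which lets you choose the file and chunk sizes and reports bytes per second; the whole-second time() used here is far coarser than a real I/O benchmark would use):

    #include <stdio.h>
    #include <time.h>

    #define MEGS  16                     /* should be at least 2x your disk cache */
    #define CHUNK 4096                   /* transfer size in bytes */

    int main(void)
    {
        static char buf[CHUNK];          /* contents don't matter for the test */
        const long chunks = (long)MEGS * 1024 * 1024 / CHUNK;
        long i;
        time_t t0, t1, t2;
        FILE *f = fopen("iozone.tmp", "w+b");

        if (f == NULL) { perror("iozone.tmp"); return 1; }

        t0 = time(NULL);
        for (i = 0; i < chunks; i++)     /* sequential write */
            fwrite(buf, 1, CHUNK, f);
        fflush(f);
        t1 = time(NULL);

        rewind(f);
        for (i = 0; i < chunks; i++)     /* sequential read-back */
            fread(buf, 1, CHUNK, f);
        t2 = time(NULL);

        fclose(f);
        remove("iozone.tmp");            /* delete the temporary file */

        printf("wrote and read %d MB: %ld sec writing, %ld sec reading\n",
               MEGS, (long)(t1 - t0), (long)(t2 - t1));
        return 0;
    }

Note that fflush() only pushes data out of the C library's buffers; the operating system's buffer cache is exactly what the "at least 2X the cache size" rule above is meant to defeat.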
2.26. Byte

This is a benchmark suite similar in spirit to SPEC, except that it's smaller and contains mostly things like "sieve" and "dhrystone". If you are comparing different UN*X machines for performance, this gives fairly good numbers. Note that the numbers aren't useful for anything except (perhaps, as in "maybe") for comparison against the same benchmark suite run on some other system.

2.27. Netperf

Netperf - a networking performance benchmark/tool. The current version includes throughput (bandwidth) and request/response (latency) tests for TCP and UDP using the BSD sockets API, DLPI, Unix Domain Sockets, the Fore ATM API, and HP HiPPI Link Level Access. Future versions may support additional tests for XTI/TLI-TCP/UDP and WINSOCK, in no particular order, depending on the whim of the author and public opinion.

Included with the source code is a .ps manual, two manpages, and a number of example scripts. More information about netperf, and a database of netperf results, can be found with a forms-capable WWW browser at the Netperf Page. Various versions of netperf are also available via anonymous FTP from many locations, including, but not limited to:

    ftp://ftp.cup.hp.com/dist/networking/benchmarks
    ftp://col.hp.com/dist/networking/benchmarks
    ftp://ftp.sgi.com
    ftp://hpux.csc.liv.ac.uk (and mirrors)

Questions regarding netperf can be directed via e-mail to Netperf Request or Rick Jones.

2.28. Nettest

A network performance analysis tool developed at Cray.

2.29. ttcp

TTCP is a benchmarking tool for determining TCP and UDP performance between 2 systems. Ttcp times the transmission and reception of data between two systems using the UDP or TCP protocols. It differs from common ``blast'' tests, which tend to measure the remote inetd as much as the network performance, and which usually do not allow measurements at the remote end of a UDP transmission.

Written by: This program was created at the US Army Ballistics Research Lab (BRL)

2.30. CPU2

The CPU2 benchmark was invented by Digital Review (now Digital News and Review). To quote DEC, describing DN&R's benchmark, CPU2 ...is a floating-point intensive series of FORTRAN programs and consists of thirty-four separate tests. The benchmark is most relevant in predicting the performance of engineering and scientific applications. Performance is expressed as a multiple of MicroVAX II Units of Performance.

The CPU2 benchmark is available via anonymous ftp from swedishchef.lerc.nasa.gov in the drlabs/cpu directory. Get cpu2.unix.tar.Z for unix systems or cpu2.vms.tar.Z for VMS systems.

2.31. Hartstone

Hartstone is a benchmark for measuring various aspects of hard real-time systems, from the Software Engineering Institute at Carnegie Mellon. You can get this by anonymous ftp from ftp.sei.cmu.edu [128.237.2.179], in the pub/hartstone directory.

2.32. EuroBen

The main contact for EuroBen is Aad van der Steen.

    Name: Aad van der Steen
    email: actstea@cc.ruu.nl
    address: Academisch Computercentrum Utrecht
             Budapestlaan 6
             3584 CD Utrecht
             The Netherlands
    phone: +31-30531444
    fax: +31-30-531633

2.33. PC Bench/WinBench/NetBench

See http://www.ziff.com/~zdbop

PC Bench 9.0, WinBench 95 Version 1.0, Winstone 95 Version 1.0, MacBench 2.0, NetBench 3.01, and ServerBench 2.0 are the current names and versions of the benchmarks available from the Ziff-Davis Benchmark Operation (ZDBOp).

2.34. Sim

An integer program that compares DNA segments for similarity. The following files are available from ftp.nosc.mil/pub/aburto:
 Source: sim.shar
 Result: sim.tbl
--Alfred Aburto

2.35. Fhourstones

Description: Small integer-only program that solves positions in the game of connect-4 using exhaustive search with a very large transposition table. Written in C.
Originator: John.Tromp@cwi.nl
Versions: 1.0
Availability of Source: ftp.nosc.mil:pub/aburto/c4.shar
Availability of Results: ftp.nosc.mil:pub/aburto/c4.tbl
Entry Last Updated: Mon Oct 11 10:00:00 1993
--John Tromp (tromp@cwi.nl)

2.36. Heapsort

An integer program that uses the "heap sort" method of sorting a random array of long integers up to 2 megabytes in size. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: heapsort.c
 Result: heapsort.tbl
--Alfred Aburto

2.37. Hanoi

An integer program that solves the Towers of Hanoi puzzle using recursive function calls. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: hanoi.c
 Result: hanoi.tbl
--Alfred Aburto
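The recursive method behind a Hanoi benchmark is short enough to show here (an illustrative sketch of the method, not the hanoi.c listed above). Its appeal as a tiny benchmark is that the work is almost entirely function-call overhead.

    #include <stdio.h>

    static long moves;

    /* Move n discs from peg 'from' to peg 'to' using 'via' as the spare. */
    static void hanoi(int n, int from, int to, int via)
    {
        if (n == 0)
            return;
        hanoi(n - 1, from, via, to);
        moves++;                        /* "move disc n from 'from' to 'to'" */
        hanoi(n - 1, via, to, from);
    }

    int main(void)
    {
        hanoi(20, 1, 3, 2);
        printf("%ld moves\n", moves);   /* 2^20 - 1 = 1048575 */
        return 0;
    }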
2.38. Flops

Estimates MFLOPS ratings for specific FADD, FSUB, FMUL, and FDIV instruction mixes. Four distinct MFLOPS ratings are provided based on FDIV weightings from 25% to 0% and using register-to-register operations. Works with both scalar and vector machines. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: flops20.c
 Result: flops_1.tbl, flops_2.tbl, flops_3.tbl, and flops_4.tbl
--Alfred Aburto

2.39. C LINPACK

The LINPACK floating point program converted to C. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: clinpack.c
 Result: clinpack.dpr, clinpack.dpu, clinpack.spr, and clinpack.spu
--Alfred Aburto

2.40. TFFTDP

This program performs FFTs using the Duhamel-Hollman method for FFTs from 32 to 262,144 points in size. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: tfftdp.c
 Result: tfftdp.tbl
--Alfred Aburto

2.41. Matrix Multiply (MM)

This program (mm.c) contains 9 different algorithms for doing matrix multiplication (500 x 500 standard size). Results illustrate the enormous effects of cache thrashing versus algorithm, machine, compiler, and compiler options. The following data files are available from ftp.nosc.mil/pub/aburto:
 Source: mm.c
 Result: mm_1.tbl, and mm_2.tbl
--Alfred Aburto
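Cache thrashing in a matrix multiply comes mostly from the order in which the loops walk memory. The sketch below (an illustration in the spirit of mm.c, not an excerpt from it) compares two classic variants: with C's row-major storage, the "ijk" form strides through the second matrix column-wise and tends to miss in the cache, while the "ikj" form keeps the inner loop walking memory with stride 1.

    #include <stdio.h>
    #include <time.h>

    #define N 500
    static double a[N][N], b[N][N], c[N][N];

    /* z = x * y, "ijk" order: the inner loop reads y[k][j] with stride N,
       which thrashes the cache once N*N doubles exceed it. */
    static void mult_ijk(double x[N][N], double y[N][N], double z[N][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                double sum = 0.0;
                for (k = 0; k < N; k++)
                    sum += x[i][k] * y[k][j];
                z[i][j] = sum;
            }
    }

    /* Same arithmetic, "ikj" order: the inner loop now touches y[k][j]
       and z[i][j] with stride 1, which is much kinder to the cache. */
    static void mult_ikj(double x[N][N], double y[N][N], double z[N][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                z[i][j] = 0.0;
        for (i = 0; i < N; i++)
            for (k = 0; k < N; k++) {
                double xik = x[i][k];
                for (j = 0; j < N; j++)
                    z[i][j] += xik * y[k][j];
            }
    }

    int main(void)
    {
        int i, j;
        clock_t t0, t1, t2;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }

        t0 = clock();
        mult_ijk(a, b, c);
        t1 = clock();
        mult_ikj(a, b, c);
        t2 = clock();

        printf("check %.0f, ijk %.2f s, ikj %.2f s\n", c[N-1][N-1],
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }

Both routines perform exactly the same 2*N*N*N floating-point operations; any difference you measure is the memory hierarchy talking, which is precisely the effect the mm.c result tables document.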
2.42. Digital Review

[need info]

2.43. Nullstone

The NULLSTONE Automated Compiler Performance Analysis Tool uses a QA approach of test coverage and isolation to measure an optimizer. The performance test suite is comprised of 6,500+ tests covering a wide range of compiler optimizations. The tool includes a report generator that generates performance reports, failure reports, regression reports, and competitive analysis reports. NULLSTONE runs on UNIX, Win3.1, Win95, WinNT, DOS, and MacOS.

Additional information:

    Nullstone Corporation
    48531 Warm Springs Boulevard, Suite 404
    Fremont, CA 94555-7793
    Phone: (800) 995-2841 (international (510) 490-6222)
    FAX: (510) 490-9333
    email: info@nullstone.com
    www: http://www.nullstone.com

--Christopher Glaeser

2.44. Rendermark

[need info]

2.45. Bench++

Bench++ is a standard set of C++ benchmarks. More information is available from http://paul.rutgers.edu/~orost/bench_plus_plus.html. Source is available from:

    http://paul.rutgers.edu/~orost/bench_plus_plus.tar.Z
    ftp://paul.rutgers.edu/pub/bench++.tar.Z

2.46. Stream

STREAM is a synthetic benchmark which measures sustainable memory bandwidth with and without simple arithmetic, based on the timing of long vector operations. STREAM is available in Fortran and C versions, and the results are used by all major vendors in high performance computing. A discussion of the benchmark and results on some 200 system configurations are available at:

    http://perelandra.cms.udel.edu/hpc/stream/
    ftp://perelandra.cms.udel.edu/bench/stream/

Contact: John D. McCalpin, mccalpin@udel.edu, http://perelandra.cms.udel.edu/~mccalpin
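To give a feel for what "long vector operations" means here, the fragment below times one STREAM-style kernel (a "triad", a(i) = b(i) + q*c(i)) and converts the result to a bandwidth figure. This is only a sketch in the spirit of the benchmark, not the official STREAM code, which uses several kernels, wall-clock timers, and strict rules about array sizes relative to cache; get the real thing from the URLs above before quoting numbers.

    #include <stdio.h>
    #include <time.h>

    #define N    1000000L       /* must be much larger than the caches */
    #define REPS 100L

    static double a[N], b[N], c[N];

    int main(void)
    {
        long i, r;
        double q = 3.0, secs;
        clock_t t0, t1;

        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        t0 = clock();           /* note: CPU time; real STREAM uses wall-clock */
        for (r = 0; r < REPS; r++)
            for (i = 0; i < N; i++)
                a[i] = b[i] + q * c[i];
        t1 = clock();

        secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        /* Each element involves reading b[i] and c[i] and writing a[i]:
           3 * 8 bytes of nominal traffic per iteration. */
        printf("check %f\n", a[N - 1]);
        if (secs > 0.0)
            printf("triad: %.1f MB/s\n", 3.0 * 8.0 * N * REPS / secs / 1e6);
        return 0;
    }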
SECTION 3 - Terminology

3.1. Configuration

When interpreting benchmark results, knowing the details of the configuration of the system that produced the results is critically important. Such details include, but aren't limited to:

    System vendor and model number
    CPU vendor, model, revision, and quantity
    CPU clock speed
    bus architecture, size, and speed
    RAM size
    primary and secondary cache sizes
    disk vendor, model, size, interface
    operating system vendor, version, and revision
    compiler vendor, version, revision, and options used

3.2. MFLOPS

Millions of Floating Point Operations Per Second. Supposedly the rate at which the system can execute floating point instructions. Varies widely between different benchmarks and different configurations of the same benchmarks. Popular with marketing types because it sounds like a "hard" value like miles per hour, and represents a simple concept.

3.3. MIPS

Millions of Instructions Per Second. Supposedly the rate at which the system can execute a typical mix of floating point and integer arithmetic and logical instructions. Unlike MFLOPS, true MIPS rates are very difficult to measure. Like MFLOPS, MIPS ratings are popular with marketing people because they sound like "hard" values and represent a simple, intuitive concept.

Most MIPS ratings quoted these days are based on the assumption that a Digital Equipment Corporation VAX 11-780 was exactly a 1 MIPS system. MIPS ratings for other systems were derived by dividing the Dhrystone rating by the 11-780's rating: 1758. Even if the 11-780 executed an average of one million instructions per second, a MIPS rating derived by this method would be in terms of VAX instructions, not the native instruction set of the rated system. Since the VAX is a Complex Instruction Set Computer (CISC), today's Reduced Instruction Set Computers (RISCs) need to execute more instructions than the VAX to do the same amount of work.
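In other words, the popular "Dhrystone MIPS" (or "VAX MIPS") figure is just an arithmetic rescaling of a Dhrystone result, using the 1758 Dhrystones/second figure quoted above for the 11-780:

    /* A worked example of the convention described above. */
    #include <stdio.h>

    #define VAX_11_780_DHRYSTONES_PER_SEC 1758.0   /* figure quoted above */

    int main(void)
    {
        double measured = 87900.0;   /* hypothetical Dhrystones/second result */

        /* 87900 / 1758 = 50, so this machine would be marketed as "50 MIPS". */
        printf("%.1f VAX MIPS\n", measured / VAX_11_780_DHRYSTONES_PER_SEC);
        return 0;
    }

All of the caveats in the Dhrystone entry (section 2.3) apply to any MIPS number produced this way.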
3.4. Representative

A benchmark is said to be representative of a user's workload if the benchmark accurately predicts performance of the user's workload on a range of configurations. A benchmark that measures general compute performance won't be very representative for a user whose workload is intensively floating point, graphical, memory bandwidth, or I/O oriented. When looking for a predictive benchmark, look for one with a workload as similar as possible to yours, and see how well it correlates with the relative performance you've observed with your workload on various configurations.

3.5. Single Figure of Merit

A single figure of merit is a rating like MIPS or MFLOPS that purports to rank the performance of systems, e.g., "a 25 MIPS system is faster than a 30 MIPS system". In reality, though, single figures of merit are often misleading. A "25 MIPS" system may well execute a given workload faster than a "30 MIPS" system. Single figures of merit are generally not very representative of actual workloads--unless your workload consists of nothing but Dhrystone runs.

3.6. KAP

Q: Can someone tell me what "KAP" is?

A: Kuck and Associates, Inc. wrote, amongst other things, a compiler preprocessor called KAP which attempts to do source code optimization. It is used by all the vendors on the old SPEC89 benchmark because it considerably improved the performance on the Matrix300 code. It is also used to do source code parallelization, etc. KAP is not an acronym for Kuck and Associates Preprocessor.

A widespread misconception about KAP is that it is a single preprocessor that is used on various platforms. It is not. KAP is tailored to specific processor-compiler pairs, as KAI builds in knowledge of exactly what the appropriate compiler will do with particular language constructs and what the resultant machine code would be. For example, it would need to know how many temporary variables it could add and have the compiler allocate those to registers, in addition to knowing how many registers the processor had. It would need to know about compiler inlining capabilities, processor cache behavior, BLAS library availability, vector processor capability, spelling of directives, etc. As you can see, it's not just a case of "porting" a fixed product to different platforms - each vendor contracts with KAI to develop a defined set of optimizations in KAP for the desired platform-compiler pairs. This requires a lot of work on the part of both KAI and the vendor.

To get the most out of KAP, you need to be intimately aware of what your application is doing and what types of "aggressive" optimizations your application can tolerate - some may yield wrong answers, which is why most compilers don't provide them automatically. Also, KAP can be incredibly slow itself, though this may be of little import if "KAPing" is done infrequently.

It is not a "magic bullet". It certainly has its uses, but nothing can really compensate for a little care and thought put into algorithm design and coding plus profiling, at least for "real applications". Yes, KAP makes MATRIX300 look really good. If your application is MATRIX300, fine. KAP is a good product, and one which, as far as I know, has no peers in the industry, but it should not be used automatically or blindly.

For more information about KAP, send email to sales@kai.com, call 217-356-2288, or visit http://www.kai.com.
SECTION 4 - Other Sources of Information

4.1. WORLD WIDE WEB

4.1.1 System Optimization Information

There is a new WWW page called System Optimization Information that contains the latest benchmark comparison lists and information on where to find the actual benchmarking programs. Each list also includes information on how to submit your own results to the list author. The site also contains overclocking (marginal speed testing) information, computer-related ftp sites, and more. Check it out. The address is:

    http://www.dfw.net/~sdw
    sdw@dfw.net (Scott Wainner)

4.2. FTP

Source for various benchmarks is available from sony.com in the directory 'pub/benchmarks'.

Various benchmarks and results can be obtained via anonymous ftp from ftp.nosc.mil (128.49.192.51) in directory 'pub/aburto'.

Source and results for router benchmarks are available via anonymous FTP from hsdndev.harvard.edu (128.103.202.40) in directory 'pub/ndtl'.

The sequential C versions of the NAS CFD (computational fluid dynamics) benchmarks - appbt and appsp - are available from the anonymous ftp site ftp.cs.wisc.edu:/wwt/Misc/NAS.

4.3. COMMERCIAL BENCHMARKS

AIM Technology can be contacted by phone at 1-800-848-UNIX or 1-408-748-8649, or by e-mail to benchinfo@aim.com.

There are a number of commercial performance evaluation tools available, e.g., from Performix (703 448-6606, Fax 703 893-1939) and Performance Awareness Corporation (919 870-880), that are aimed at system-level X Windows/application benchmarks.

4.4. PUBLICATIONS

The Benchmark Handbook: for database and transaction processing systems. Edited by Jim Gray. M. Kaufmann Publishers, San Mateo, Calif., c1991. 334 p., ill. The Morgan Kaufmann series in data management systems. ISSN 1046-1698.

Transaction Processing Performance Council (TPC): TPC-A Standard Specification, Revision 1.1. Shanley Public Relations, San Jose, California, March 1992.

Transaction Processing Performance Council (TPC): TPC-C Standard Specification, Revision 1.0. Shanley Public Relations, San Jose, California, August 1992.

Complete TPC Results, Performance Evaluation Review, Vol. 19 #2, Aug. 91 (14-23). Shanley Public Relations.

Complete TPC Results, Performance Evaluation Review, Vol. 19 #3, Feb. 93 (32-35). Shanley Public Relations.

Survey of Benchmarks, E. R. Brocklehurst. NPL Report DTITC 192/91, Nov. 91 (27 pages).

A Benchmark Tutorial, W. Price. IEEE Micro, Oct. 89 (28-43).

"An Overview of Common Benchmarks", Reinhold P. Weicker, IEEE Computer, vol. 23, no. 12 (Dec. 1990), 65-75. Republished in slightly modified form in "A detailed look at some popular benchmarks", Reinhold P. Weicker, Parallel Computing, No. 17 (1991), 1153-1172.

"Cache Performance of the SPEC92 Benchmark Suite", Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikatos, Alan Jay Smith, p 17-27, IEEE Micro, August 1993.

"Performance Measurements of the X Window System Communication Protocol", R. Droms and W. R. Dyksen, Software Practice and Experience, ISSN 0038-0644, pp S2/119 [this paper covers xlog].

"An Execution Profiler for Window-oriented Applications", Aloke Gupta and Wen-Mei W. Hwu, to appear in Software Practice & Experience. hwu@crhc.uiuc.edu [this paper covers xmeasure and xprof].

"Trace Analysis of the X Window System Protocol", Stuart W. Marks & Laurence P. G. Cable, The X Resource, ISSN 1058-5591, Issue Five, pp 149 [this covers xtrace and friends].

"X Applications Performance on Client/Server Systems", Ken Oliver, Xhibition '93 Conference Proceedings, pp 176.

High Performance Computing: RISC Architectures, Optimization, and Benchmarks, Kevin Dowd. O'Reilly & Associates, 1993. ISBN 1-56592-032-5.

4.5. OTHER NETWORK SERVICES

4.5.1 PDS

Xnetlib includes PDS: A Performance Database Server.

The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly-available central depository of performance data for all ranges of machines from personal computers to supercomputers. This Xnetlib release contains an Internet-accessible performance database server (PDS) which can be used to extract current benchmark data and literature.

The current PDS provides an on-line catalog of the following public-domain computer benchmarks: Linpack Benchmark, Parallel Linpack Benchmark, Bonnie Benchmark, FLOPS Benchmark, Peak Performance (part of Linpack Benchmark), Fhourstones and Dhrystones, Hanoi Benchmark, Heapsort Benchmark, Nsieve Benchmark, Math Benchmark, Perfect Benchmarks, and Genesis Benchmarks. Rank-ordered lists of machines per benchmark are available, as well as relevant papers and bibliographies. A browse facility allows the user to extract a variety of machine/benchmark combinations, and a search feature permits specific queries into the performance database.

PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. PDS is invoked by selecting the "Performance" button in the Xnetlib Menu Options. Questions and comments for PDS should be mailed to utpds@cs.utk.edu.

How to get it:

 - By WWW from http://netlib.cs.utk.edu/performance/html/PDStop.html
 - By anonymous ftp from netlib2.cs.utk.edu in xnetlib/xnetlib3.4.shar.Z
 - By email, send a message to netlib@ornl.gov containing the line:
       send xnetlib3.4.shar from xnetlib

Precompiled executables for various platforms are also available. For information, get the index file for the xnetlib library:

 - via anonymous ftp from netlib2.cs.utk.edu, get xnetlib/index
 - via email, send a message to netlib@ornl.gov containing the line:
       send index from xnetlib

If you have any questions, please send mail to xnetlib@cs.utk.edu.