WARTS
Wisconsin Architectural Research Tool Set

Glenn Ammons, Tom Ball, Mark Hill, Babak Falsafi, Steve Huss-Lederman, James Larus, Alvin Lebeck, Mike Litzkow, Shubhendu Mukherjee, Steven Reinhardt, Madhusudhan Talluri, and David Wood

Computer Sciences Department
University of Wisconsin
1210 West Dayton St.
Madison, WI 53706
warts@cs.wisc.edu

The Wisconsin Architectural Research Tool Set (WARTS) is a collection of tools for profiling and tracing programs, analyzing program traces, and simulating computer architectures. WARTS currently contains:

PP, Program profiling tool.
QPT and QPT2, Program profiling and tracing system.
CPROF, Cache performance profiler.
Tycho and dineroIII, Cache simulators.
EEL, Library for editing executable files.
Fast Cache, Framework for memory system simulators.
WWT2, A fast and portable parallel architecture simulator.

WARTS is distributed with full source code and a small amount of documentation. The tools in WARTS are copyrighted and distributed under license. A copy of the license is available on ftp.cs.wisc.edu as ~ftp/pub/warts/license.ps, or it can be obtained by writing to the address above. WARTS is available free of charge to university researchers and to other researchers for a modest research donation. Contact warts@cs.wisc.edu for more details.

We also maintain a list of changes and improvements to WARTS programs.

CPROF:

CPROF, written by Alvin R. Lebeck and David A. Wood, is a cache performance profiler that annotates source listings to identify the source lines and data structures that cause frequent cache misses. The CPROF system consists of two programs: Cprof, a uniprocessor cache simulator, and Xcprof, an X Window System user interface. Cprof processes program traces generated by QPT (see above) and annotates source lines and data structures with the corresponding cache miss statistics. Xcprof provides a general X Window System interface for easy viewing of annotated source files.

The performance of current RISC processors is very sensitive to cache miss ratios. Programmers, compiler writers, and language designers must explicitly consider cache behavior to fully exploit a program's performance potential. CPROF provides detailed information about a program's cache behavior through full cache simulation. By annotating lines of source code and data structures with the corresponding number of cache misses, CPROF helps the user focus on problematic data structures and algorithms. CPROF also classifies cache misses as compulsory, capacity, or conflict misses, which helps the programmer identify the kinds of transformations that can improve a program's cache behavior.
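CPROF's own classifier is not described here. As a rough illustration of the three miss categories, the sketch below applies the usual definitions to a direct-mapped cache: a compulsory miss is the first reference to a block, a capacity miss would also miss in a fully associative LRU cache of the same size, and a conflict miss is any remaining miss. The function name, the trace representation (block numbers, i.e. addresses already divided by the block size), and the tiny cache size are assumptions chosen for illustration.

```python
from collections import OrderedDict


def classify_misses(trace, num_blocks):
    """Classify references to a direct-mapped cache of num_blocks blocks.

    trace holds block numbers. Returns counts of hits and of compulsory,
    capacity, and conflict misses (hypothetical helper, not CPROF code).
    """
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    direct_mapped = [None] * num_blocks  # one block per set
    fully_assoc = OrderedDict()          # LRU order: least recently used first
    seen = set()                         # blocks referenced at least once

    for block in trace:
        # Simulate a fully associative LRU cache of the same capacity.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_blocks:
                fully_assoc.popitem(last=False)  # evict the LRU block
            fully_assoc[block] = True

        first_reference = block not in seen
        seen.add(block)

        # Look up the direct-mapped cache and classify any miss.
        index = block % num_blocks
        if direct_mapped[index] == block:
            counts["hit"] += 1
            continue
        direct_mapped[index] = block
        if first_reference:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
    return counts


print(classify_misses([0, 1, 2, 0, 2, 0], num_blocks=2))
# {'hit': 0, 'compulsory': 3, 'capacity': 1, 'conflict': 2}
```

In this trace, blocks 0 and 2 map to the same set of the two-block direct-mapped cache, so their repeated misses are counted as conflicts, which is the kind of miss that padding or relocating data structures can remove.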

Note: This product contains software developed by the University of California, Berkeley and its contributors.

Our experience using CPROF to tune a subset of the SPEC benchmarks is detailed in:

[1] Alvin R. Lebeck and David A. Wood, "Cache Profiling and the SPEC Benchmarks: A Case Study," IEEE Computer, vol. 27, no. 10, Oct. 1994, pp. 15-26.

Tycho and DineroIII:

Tycho and dineroIII are uniprocessor cache simulators written by Mark Hill. The simulators report the behavior of one or more alternative cache designs in response to an input trace provided by the user (e.g., with QPT). A trace is a list of the memory references that a program or workload makes while it is executing. Both simulators are written in C, use the same ASCII trace format, and have been distributed to dozens of companies and universities.
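The shared ASCII trace format is not specified on this page. The sketch below assumes one reference per line, consisting of a numeric access-type label (0 = data read, 1 = data write, 2 = instruction fetch) followed by a hexadecimal address, and uses such a trace to drive a single set-associative LRU cache. Treat the layout as an assumption rather than the authoritative format, and the function names and parameters as hypothetical.

```python
def parse_trace(lines):
    """Yield (label, address) pairs from an ASCII trace.

    Assumed layout: '<label> <hex address>' on each line.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        yield int(fields[0]), int(fields[1], 16)


def miss_ratio(trace, cache_bytes, block_bytes, assoc):
    """Simulate one set-associative LRU cache and return its miss ratio."""
    num_sets = cache_bytes // (block_bytes * assoc)
    sets = [[] for _ in range(num_sets)]  # each list is in LRU order, MRU last
    refs = misses = 0
    for _label, addr in trace:
        block = addr // block_bytes
        ways = sets[block % num_sets]
        refs += 1
        if block in ways:
            ways.remove(block)  # hit: re-appended below as MRU
        else:
            misses += 1
            if len(ways) == assoc:
                ways.pop(0)  # evict the LRU block
        ways.append(block)
    return misses / refs if refs else 0.0


sample = ["2 1000", "0 2000", "2 1004", "1 2008", "0 3000", "0 2000"]
print(miss_ratio(parse_trace(sample), cache_bytes=64, block_bytes=16, assoc=1))  # 1.0
print(miss_ratio(parse_trace(sample), cache_bytes=64, block_bytes=16, assoc=2))  # 0.5
```

All six references fall in the same set, so the direct-mapped cache misses every time, while the two-way cache of the same size keeps both frequently used blocks.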

The first simulator, tycho, simultaneously evaluates many alternative uniprocessor caches, but severely restricts the design options that may be varied [1]. Specifically, with one pass through an address trace, tycho produces a table of miss ratios for caches of many sizes and associativities, provided that all caches have the same block (line) size, do no prefetching, and use LRU replacement. Tycho has been used, for example, to examine numerous caches for the SPEC benchmark suite [2]. Tycho is most useful for narrowing down a large cache design space. A second version, tychoII, offers higher performance through optional binary trace input and several other optimizations by Madhusudhan Talluri. TychoII, however, is more complex than tycho and has not been widely used.
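Tycho's exact algorithm is not given here. The sketch below shows LRU stack simulation, a well-known technique that explains why one pass can cover many cache configurations under exactly these restrictions. For a fixed number of sets, an LRU cache of associativity a hits exactly when the referenced block is among the a most recently used blocks in its set, so recording each reference's stack depth answers the question for every associativity at once. The function name, the block-number trace, and the chosen set counts are assumptions for illustration.

```python
def stack_miss_counts(blocks, set_counts, max_assoc):
    """One pass over a trace of block numbers; returns misses[(sets, assoc)].

    Each (sets, assoc) pair describes an LRU cache of sets * assoc blocks
    with a common block size (hypothetical helper, not tycho's code).
    """
    stacks = {s: [[] for _ in range(s)] for s in set_counts}  # MRU first
    misses = {(s, a): 0 for s in set_counts for a in range(1, max_assoc + 1)}
    for block in blocks:
        for s in set_counts:
            stack = stacks[s][block % s]
            depth = stack.index(block) if block in stack else None
            # A cache with associativity a hits only if depth < a.
            for a in range(1, max_assoc + 1):
                if depth is None or depth >= a:
                    misses[(s, a)] += 1
            if depth is not None:
                del stack[depth]
            stack.insert(0, block)  # referenced block becomes MRU
            del stack[max_assoc:]   # deeper entries can never hit
    return misses


print(stack_miss_counts([0, 1, 0, 1], set_counts=[1, 2], max_assoc=2))
# {(1, 1): 4, (1, 2): 2, (2, 1): 2, (2, 2): 2}
```

The depth trick relies on LRU inclusion (a smaller cache's contents are always a subset of a larger one's), which breaks down with prefetching, random replacement, or differing block sizes.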

The second simulator, dineroIII, evaluates only one uniprocessor cache at a time, but produces more performance metrics (e.g., traffic to and from memory) and allows more cache design options to be varied (e.g., write-back vs. write-through, LRU vs. random replacement, demand fetching vs. prefetching). DineroIII is distributed for instructional use with a popular graduate computer architecture textbook [3]. DineroIII is most useful for carefully studying a few alternative cache designs.
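DineroIII's internals are not described here. As an illustration of why simulating one cache in detail yields extra metrics such as memory traffic, the sketch below counts bytes moved under write-back and write-through policies. It assumes a direct-mapped cache, write-allocate in both modes, 4-byte words, and no final flush of dirty blocks; the function name and parameters are hypothetical.

```python
def simulate(refs, num_blocks, block_bytes, write_back, word_bytes=4):
    """Direct-mapped cache; returns (misses, bytes_from_memory, bytes_to_memory).

    refs is a list of (is_write, address). Both policies allocate a block on
    a write miss. Dirty blocks still cached at the end are not written back.
    """
    tags = [None] * num_blocks
    dirty = [False] * num_blocks
    misses = from_mem = to_mem = 0
    for is_write, addr in refs:
        block = addr // block_bytes
        i = block % num_blocks
        if tags[i] != block:
            misses += 1
            if write_back and dirty[i]:
                to_mem += block_bytes  # write back the evicted dirty block
            tags[i], dirty[i] = block, False
            from_mem += block_bytes    # fetch the missing block
        if is_write:
            if write_back:
                dirty[i] = True
            else:
                to_mem += word_bytes   # write-through: every store goes to memory
    return misses, from_mem, to_mem


refs = [(True, 0), (True, 4), (True, 8), (True, 12), (True, 0), (False, 64)]
print(simulate(refs, num_blocks=4, block_bytes=16, write_back=True))   # (2, 32, 16)
print(simulate(refs, num_blocks=4, block_bytes=16, write_back=False))  # (2, 32, 20)
```

Both policies miss equally often here, but write-back sends less data to memory because five stores to one block cost a single 16-byte write-back instead of five 4-byte writes. Differences like this are invisible to a simulator that reports only miss ratios.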

[1] Mark D. Hill and Alan Jay Smith, "Evaluating Associativity in CPU Caches," IEEE Transactions on Computers, vol. C-38, no. 12, Dec. 1989, pp. 1612-1630.

[2] Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikatos, and Alan Jay Smith, "Cache Performance of the SPEC Benchmark Suite," IEEE Micro, to appear, August 1993, 3, 2.

[3] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, San Mateo, California, 1990.

Last modified: March 1, 1999 by James Larus

larus@cs.wisc.edu
