Computer Sciences Department
University of Wisconsin
1210 West Dayton St.
Madison, WI 53706
warts@cs.wisc.edu
The Wisconsin Architectural Research Tool Set (WARTS) is a collection of tools for profiling and tracing programs, analyzing program traces, and simulating computer architectures. WARTS currently contains:
WARTS is distributed with full source and a small amount of documentation. The tools in WARTS are copyrighted and distributed under license. A copy of the license is available on ftp.cs.wisc.edu in ~ftp/pub/warts/license.ps, or it can be obtained by writing to the address above. WARTS is available without charge to university researchers and to other researchers for a modest research donation. Contact warts@cs.wisc.edu for more details.
We also maintain a list of changes and improvements to WARTS programs.
The performance of current RISC processors is very sensitive to cache miss ratios. Programmers, compiler writers, and language designers must explicitly consider cache behavior to fully exploit a program's performance potential. CPROF provides detailed information about a program's cache behavior through full cache simulation. By annotating lines of source code and data structures with the corresponding number of cache misses, CPROF helps the user focus on problematic data structures and algorithms. CPROF aids the programmer in identifying types of transformations that can improve program cache behavior by classifying cache misses as: compulsory, capacity, or conflict.
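The three miss classes can be illustrated with a small sketch. This is not CPROF's code. It assumes the common definition of the classes: a compulsory miss is the first reference to a block, a conflict miss is a miss that a fully associative LRU cache of the same capacity would have avoided, and a capacity miss is any other miss. The function name, parameters, and default cache geometry are hypothetical.

```python
from collections import OrderedDict

def classify_misses(addresses, cache_bytes=64, block_bytes=16, ways=1):
    """Classify each miss of a set-associative LRU cache (illustrative only)."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # target cache, LRU order per set
    fully = OrderedDict()   # fully associative LRU cache of the same capacity
    seen = set()            # every block referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for addr in addresses:
        block = addr // block_bytes

        # Reference the fully associative cache first and remember the outcome.
        fa_hit = block in fully
        if fa_hit:
            fully.move_to_end(block)
        else:
            fully[block] = True
            if len(fully) > num_blocks:
                fully.popitem(last=False)  # evict the least recently used block

        # Reference the target cache.
        target_set = sets[block % num_sets]
        if block in target_set:
            target_set.move_to_end(block)
            counts["hit"] += 1
        else:
            target_set[block] = True
            if len(target_set) > ways:
                target_set.popitem(last=False)
            if block not in seen:
                counts["compulsory"] += 1
            elif fa_hit:
                counts["conflict"] += 1   # only the limited associativity caused it
            else:
                counts["capacity"] += 1   # the working set exceeds the cache
        seen.add(block)
    return counts

# Two blocks that map to the same set of a direct-mapped cache ping-pong.
print(classify_misses([0, 64, 0, 64]))
```

In this usage example the last two references are conflict misses, which suggests a layout change (for instance padding or array merging), whereas a dominance of capacity misses would suggest reducing the working set.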
Note: This product contains software developed by the University of California, Berkeley and its contributors.
Our experience using CPROF to tune a subset of the SPEC benchmarks is detailed in:
[1] Alvin R. Lebeck and David A. Wood, "Cache Profiling and the SPEC Benchmarks: A Case Study," IEEE Computer, vol. 27, no. 10, Oct. 1994, pp. 15-26.
The first simulator, tycho, simultaneously evaluates many alternative uniprocessor caches, but severely restricts the design options that may be varied [1]. Specifically, with one pass through an address trace, tycho produces a table of miss ratios for caches of many sizes and associativities, provided that all caches have the same block (line) size, do no prefetching, and use LRU replacement. Tycho has been used, for example, to examine numerous caches with the SPEC benchmark suite [2]. Tycho is most useful for reducing the size of a large cache design space. A second version of tycho, tychoII, provides higher performance through optional binary trace input and several other optimizations by Madhusudhan Talluri. TychoII, however, is more complex than tycho and has not been widely used.
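The idea of evaluating many caches in one pass can be sketched with classic LRU stack simulation. This is a simplified illustration, not tycho's algorithm: it covers only fully associative LRU caches with one block size, while tycho also handles multiple associativities [1]. The function names and the trace representation (a list of byte addresses) are assumptions made for the example.

```python
def lru_stack_distances(blocks):
    """Return (histogram of stack distances, number of first references)."""
    stack = []   # most recently used block first
    hist = {}    # stack distance -> number of references at that distance
    cold = 0     # first references miss in a cache of any size
    for b in blocks:
        if b in stack:
            distance = stack.index(b) + 1   # 1 means the most recently used block
            stack.remove(b)
            hist[distance] = hist.get(distance, 0) + 1
        else:
            cold += 1
        stack.insert(0, b)
    return hist, cold

def miss_ratios(addresses, block_bytes, sizes_in_blocks):
    """Miss ratio of a fully associative LRU cache for each size, from one pass."""
    blocks = [a // block_bytes for a in addresses]
    total = len(blocks)
    if total == 0:
        return {}
    hist, _cold = lru_stack_distances(blocks)
    ratios = {}
    for size in sizes_in_blocks:
        # A reference hits in a cache of `size` blocks iff its distance <= size.
        hits = sum(n for d, n in hist.items() if d <= size)
        ratios[size] = (total - hits) / total
    return ratios

# One pass over a cyclic trace yields miss ratios for several cache sizes.
print(miss_ratios([0, 1, 2, 0, 1, 2], block_bytes=1, sizes_in_blocks=[1, 2, 3, 4]))
```

The single pass works because LRU obeys the inclusion property: a cache of n blocks always holds a subset of what a cache of n+1 blocks holds, so one stack describes every size at once. This is also why the technique requires a fixed block size, no prefetching, and LRU replacement.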
The second simulator, dineroIII, evaluates only one uniprocessor cache at a time, but produces more performance metrics (e.g., traffic to and from memory) and allows more cache design options to be varied (e.g., write-back vs. write-through, LRU vs. random replacement, demand fetching vs. prefetching). DineroIII is distributed for instructional use with a popular graduate computer architecture textbook [3]. DineroIII is most useful for carefully studying a few alternative cache designs.
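The contrast with a single-configuration simulator can be sketched as follows. This is a minimal illustration of simulating one cache at a time while varying write policy and replacement policy; it is not dineroIII's code, interface, or trace format. The `(op, address)` trace tuples, the parameter names, the statistic names, and the write-allocate assumption are all hypothetical.

```python
import random
from collections import OrderedDict

def simulate(trace, num_sets=4, ways=2, block_bytes=16,
             write_back=True, replacement="lru", seed=0):
    """Simulate one cache configuration and report misses and memory traffic."""
    rng = random.Random(seed)
    sets = [OrderedDict() for _ in range(num_sets)]  # block -> dirty flag, LRU order
    stats = {"accesses": 0, "misses": 0, "blocks_fetched": 0,
             "blocks_written_back": 0, "words_written_through": 0}

    for op, addr in trace:              # op is "r" or "w" (assumed format)
        block = addr // block_bytes
        cache_set = sets[block % num_sets]
        stats["accesses"] += 1

        if block in cache_set:
            cache_set.move_to_end(block)    # mark as most recently used
        else:
            stats["misses"] += 1
            stats["blocks_fetched"] += 1    # write-allocate is assumed for writes
            if len(cache_set) >= ways:
                if replacement == "lru":
                    victim = next(iter(cache_set))       # least recently used
                else:
                    victim = rng.choice(list(cache_set))  # random replacement
                if cache_set.pop(victim):                 # dirty victim
                    stats["blocks_written_back"] += 1
            cache_set[block] = False

        if op == "w":
            if write_back:
                cache_set[block] = True               # defer the memory write
            else:
                stats["words_written_through"] += 1   # write memory immediately
    return stats

trace = [("w", 0), ("r", 64), ("r", 128)]
print(simulate(trace, num_sets=1, ways=2, write_back=True))
print(simulate(trace, num_sets=1, ways=2, write_back=False))
```

Running the same trace under both write policies shows where the traffic to memory moves: write-back pays on eviction of dirty blocks, write-through pays on every write.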
[1] Mark D. Hill and Alan Jay Smith, "Evaluating Associativity in CPU Caches," IEEE Trans. on Computers, vol. C-38, no. 12, Dec. 1989, pp. 1612-1630.
[2] Jeffrey D. Gee, Mark D. Hill, Dionisios N. Pnevmatikatos, Alan Jay Smith, "Cache Performance of the SPEC Benchmark Suite," to appear, IEEE Micro, August 1993, 3, 2.
[3] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, San Mateo, California, 1990.