(Last Updated February 2005)
The main research goal of the Wisconsin Multifacet Project is to improve the performance (as well as the designability, programmability, and reliability) of the multiprocessor servers that form the computational infrastructure for the Internet. Within this goal, my research has focused on cache compression for chip multiprocessors, performance evaluation of multi-threaded workloads, and tuning commercial workloads for computer architecture evaluation.
- Adaptive Cache Compression:
Due to the increasing gap between processor and memory speeds, designing an effective on-chip cache has long been a focus for computer architects. Cache compression has the potential to increase the effective cache size for a fixed chip area. A larger effective cache improves the performance of many applications by avoiding some off-chip cache misses. Unfortunately, the overhead of decompression increases cache access latency, potentially degrading performance for other applications. An important question is whether compression's benefit outweighs its cost. We developed an adaptive policy that uses prediction to dynamically adapt to the costs and benefits of cache compression. This prediction mechanism requires only a few extra bits per cache set and a single global saturating hardware counter. By dynamically monitoring application behavior, the adaptive policy achieves almost all the benefits of compression, while not degrading performance for applications that would be hurt by it.
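The core of such an adaptive policy can be sketched in a few lines. The sketch below is illustrative, not the actual hardware design: the class name, the penalty values, and the counter width are all assumptions. The idea it demonstrates is the one described above: a single global saturating counter is credited when compression would have avoided a miss and charged when a hit pays the decompression latency, and its sign decides whether new lines are stored compressed.

```python
# Hedged sketch of an adaptive cache-compression policy.
# All names and cycle counts below are illustrative assumptions.

class AdaptivePolicy:
    def __init__(self, max_count=2**15, miss_penalty=400, decomp_penalty=5):
        self.counter = 0                      # single global saturating counter
        self.max_count = max_count
        self.miss_penalty = miss_penalty      # cycles saved per avoided off-chip miss
        self.decomp_penalty = decomp_penalty  # cycles lost per decompressed hit

    def _saturate(self, v):
        # Keep the counter within its hardware range.
        return max(-self.max_count, min(self.max_count, v))

    def on_avoidable_miss(self):
        # Compression would have turned this miss into a hit: credit the benefit.
        self.counter = self._saturate(self.counter + self.miss_penalty)

    def on_penalized_hit(self):
        # A hit to a compressed line paid the decompression latency: charge the cost.
        self.counter = self._saturate(self.counter - self.decomp_penalty)

    def should_compress(self):
        # Store newly allocated lines compressed only while benefits outweigh costs.
        return self.counter > 0
```

Because the counter is global and saturating, the policy tracks whichever behavior currently dominates the workload, so an application that never benefits from compression quickly drives the counter negative and avoids the decompression penalty entirely.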
- Workload Variability: Computer architects use simulation as a primary tool to evaluate their designs based on the performance of important applications. For simulation experiments to be valid, simulations starting from the same initial conditions have to be repeatable (i.e., produce the same result every time). However, commercial, multi-threaded workloads running on a real system are typically not repeatable, and run times of a single workload can vary widely. This variability comes in two types: time variability and space variability. Time variability (also known as phase behavior) occurs when a workload's behavior changes over the course of a single run. Space variability occurs when small variations in timing (e.g., OS scheduling decisions) cause runs starting from the same initial conditions to follow widely different execution paths. Variability can be wider than the performance improvement margin in many architectural simulation experiments, rendering simulation results meaningless. We showed that variability, if not addressed, can lead to incorrect conclusions from simulation in many cases. We proposed a methodology based on timing perturbations, multiple simulations, and standard statistical techniques to compensate for variability. This methodology greatly reduces the probability of reaching incorrect conclusions while enabling simulations to finish within reasonable time limits.
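The statistical side of this methodology can be illustrated with a minimal sketch. The run counts, the Gaussian perturbation model, and the `simulate` stand-in below are assumptions for illustration only; the point is the procedure itself: run each configuration several times with small random perturbations, summarize each sample with a mean and confidence interval, and claim an improvement only when the intervals separate.

```python
# Hedged sketch of comparing two simulated designs under variability.
# The simulate() function is a stand-in for a full-system simulation run
# whose initial timing has been randomly perturbed.

import random
import statistics

def simulate(base_runtime, variability, rng):
    # One perturbed run: runtime drawn around the design's true mean.
    return base_runtime * (1.0 + rng.gauss(0.0, variability))

def mean_with_ci(samples, z=1.96):
    # Mean and (approximate) 95% confidence half-width for a sample.
    m = statistics.mean(samples)
    half = z * statistics.stdev(samples) / len(samples) ** 0.5
    return m, half

rng = random.Random(42)
baseline = [simulate(100.0, 0.05, rng) for _ in range(20)]  # 20 perturbed runs each
enhanced = [simulate(97.0, 0.05, rng) for _ in range(20)]

(mb, hb) = mean_with_ci(baseline)
(me, he) = mean_with_ci(enhanced)
# Only claim the enhancement is faster if the confidence intervals do not overlap.
significant = (mb - hb) > (me + he)
```

A single run from each configuration could easily show the slower design "winning" by chance; averaging over multiple perturbed runs is what keeps the conclusion honest while still bounding total simulation time.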
- Tuning Commercial Workloads: As many current and future computer systems are designed specifically to run commercial applications, computer architects need to simulate such applications to evaluate their proposed designs. However, commercial applications typically run on much larger systems than those available for simulation. Computer architects therefore need a representative, scaled-down approximation of commercial applications that can be simulated in a reasonable amount of time. I was responsible for scaling down and tuning an on-line transaction processing benchmark derived from the TPC-C benchmark. This process involved setting the benchmark up on a real machine, scaling it down, tuning its parameters, and importing it into a full-system simulation environment.