David A. Wood
My main research goals lie in developing cost-effective computer
architectures that take advantage of rapidly changing technologies. My
research program has two major thrusts:
- evaluating the performance, feasibility, and correctness of new
architectures, and
- developing new tools and techniques to facilitate this evaluation.
My current work is mostly part of the Wisconsin
Multifacet Project, which I co-lead with Mark Hill.
Multifacet aims to improve the performance of
the multiprocessor servers that form the computational infrastructure
for Internet web servers, databases, and other demanding applications.
Recent work includes:
Token Coherence
Commercial workload and technology trends are pushing existing shared-memory
multiprocessor coherence protocols in divergent directions. Token Coherence
provides a framework for new coherence protocols that can reconcile these
opposing trends by separating performance from correctness. A performance
protocol can optimize for the common case (i.e., absence of races) and rely on the
underlying correctness substrate to resolve races, provide safety, and prevent
starvation. We call the combination Token Coherence, since it explicitly
exchanges and counts tokens to control coherence permissions.
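The token-counting idea can be illustrated with a small sketch. It is not the Multifacet implementation: it assumes the usual counting rule (each memory block has a fixed number of tokens, a node may read the block while holding at least one token, and may write it only while holding all of them), and the names Node, TOTAL_TOKENS, transfer, and tokens_conserved are hypothetical. Tokens in flight in messages, data validity, and the starvation-avoidance mechanism are left out.

```python
from dataclasses import dataclass

# Hypothetical: number of tokens associated with one memory block.
TOTAL_TOKENS = 4


@dataclass
class Node:
    """A processor cache or the memory controller, for a single block."""
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # Assumed rule: at least one token grants read permission.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # Assumed rule: holding every token grants write permission,
        # which implies no other node can currently read the block.
        return self.tokens == TOTAL_TOKENS


def transfer(src: Node, dst: Node, count: int) -> None:
    """Move tokens from src to dst; tokens are never created or destroyed."""
    if count < 0 or count > src.tokens:
        raise ValueError(f"{src.name} cannot give {count} token(s)")
    src.tokens -= count
    dst.tokens += count


def tokens_conserved(nodes) -> bool:
    """Safety check for this sketch: the token count for the block is constant."""
    return sum(n.tokens for n in nodes) == TOTAL_TOKENS


if __name__ == "__main__":
    memory = Node("memory", TOTAL_TOKENS)  # memory starts with every token
    p0, p1 = Node("P0"), Node("P1")
    nodes = [memory, p0, p1]

    transfer(memory, p0, 1)   # P0 becomes a reader
    transfer(memory, p1, 1)   # P1 becomes a reader
    print(p0.can_read(), p1.can_read(), p0.can_write())

    transfer(memory, p0, 2)   # P0 gathers the remaining tokens...
    transfer(p1, p0, 1)       # ...including P1's, so P1 loses read permission
    print(p0.can_write(), p1.can_read(), tokens_conserved(nodes))
```

In this simplified model, safety follows from counting alone: because tokens are conserved, a writer holding all of them cannot coexist with any reader, regardless of the order in which requests and responses arrive.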
We have developed TokenB, a specific Token Coherence performance protocol
that allows a glueless multiprocessor to both exploit a low-latency unordered
interconnect (like directory protocols) and avoid indirection (like snooping
protocols). Simulations using commercial workloads show that our new protocol
can significantly outperform traditional snooping and directory protocols.
Transmission Line Caches
On-chip interconnect performance presents an increasing barrier to future
high-performance systems. The ITRS Roadmap projects that by the end of the
decade, conventional global signals may require tens of cycles to communicate
across a chip. This challenge has inspired wire-centric designs that
use parallelism, locality, and on-chip wiring bandwidth to compensate
for long wire latency.
Transmission Line Caches (TLCs) take a different approach, using
on-chip transmission line technology to reduce on-chip interconnect
delay and greatly reduce the level-2 cache access latency.
Compared to conventional RC wires, transmission lines can reduce delay
by up to a factor of 30 for global wires, while eliminating the need
for repeaters. However, this latency reduction comes at the cost of
a comparable reduction in bandwidth.
Our family of TLC designs represents different points in the
latency/bandwidth spectrum. Compared to other proposals, TLCs can reduce
area, improve performance, and substantially reduce logical complexity,
at the cost of somewhat greater circuit and manufacturing complexity.
Prior to Multifacet, I worked primarily on the Wisconsin Wind Tunnel Project,
which focused on trade-offs in designing cost-effective parallel machines
that support shared memory.