Computer Sciences Dept.

Dan Gibson


Note: I have graduated and moved on. This page is no longer up-to-date.
Dan Gibson on Google Sites.

Updated 7 June 2010.


Forwardflow: A Scalable Core for Power-Constrained CMPs

ISCA 2010
PPT    Chip Multiprocessors (CMPs) are now commodity hardware, but commoditization of parallel software remains elusive. In the near term, the current trend of increasing core-per-socket counts will continue, despite a lack of parallel software to exercise the hardware. Future CMPs must deliver thread-level parallelism when software provides threads to run, but must also continue to deliver performance gains for single threads by exploiting instruction-level parallelism and memory-level parallelism. However, power limitations will prevent conventional cores from exploiting both simultaneously.
This work presents the Forwardflow Architecture, which can scale its execution logic up to run single threads, or down to run multiple threads in a CMP. Forwardflow dynamically builds an explicit internal dataflow representation from a conventional instruction set architecture, using forward dependence pointers to guide instruction wakeup, selection, and issue. Forwardflow's backend is organized into discrete units that can be individually (de-)activated, allowing each core's performance to be scaled by system software at the architectural level.
On single threads, Forwardflow core scaling yields a mean runtime reduction of 21% for a 37% increase in power consumption. For multithreaded workloads, a Forwardflow-based CMP allows system software to select the performance point that best matches available power.

Diamonds are a Memory Controller's Best Friend

Adapted from a talk at
ISCA 2009
PPT    In the near term, Moore's law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth prevents the integration of a large number of memory controllers on-chip. With many cores and few memory controllers, where to locate the memory controllers in the on-chip interconnection fabric becomes an important and as yet unexplored question. In this paper we show how the location of the memory controllers can reduce contention (hot spots) in the on-chip fabric and lower the variance in reference latency. This in turn provides predictable performance for memory-intensive applications regardless of the processing core on which a thread is scheduled. We explore the design space of on-chip fabrics to find optimal memory controller placement relative to different topologies (i.e., mesh and torus), routing algorithms, and workloads.
Note: Dennis Abts gave the ISCA 2009 talk. This talk is the version I give, when asked.

Introduction to OpenMP

Adapted from a talk for
CS 838: Pervasive Parallelism
PPT    A brief and gentle introduction to the syntax and programming style of OpenMP. Originally intended to introduce students to the framework, adapted to address the curiosity of a more general audience.

