Note: I have graduated and moved on. This page is no longer up-to-date.
Dan Gibson on Google Sites.
Updated 7 June 2010.
|Forwardflow: A Scalable Core for Power-Constrained CMPs
Chip Multiprocessors (CMPs) are now commodity hardware,
but commoditization of parallel software remains
elusive. In the near term, the current trend of increased
core-per-socket count will continue, despite a lack of parallel
software to exercise the hardware. Future CMPs must
deliver thread-level parallelism when software provides
threads to run, but must also continue to deliver performance
gains for single threads by exploiting instruction-level
parallelism and memory-level parallelism. However,
power limitations will prevent conventional cores from
exploiting both simultaneously.
This work presents the Forwardflow Architecture, which
can scale its execution logic up to run single threads, or
down to run multiple threads in a CMP. Forwardflow
dynamically builds an explicit internal dataflow representation
from a conventional instruction set architecture, using
forward dependence pointers to guide instruction wakeup,
selection, and issue. Forwardflow's backend is organized
into discrete units that can be individually (de-)activated,
allowing each core's performance to be scaled by system
software at the architectural level.
On single threads, Forwardflow core scaling yields a mean
runtime reduction of 21% for a 37% increase in power consumption.
For multithreaded workloads, a Forwardflow-based
CMP allows system software to select the performance
point that best matches available power.
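The forward-pointer wakeup described in the abstract can be sketched in C. This is an illustrative toy model, not the Forwardflow hardware: the struct layouts, the two-operand limit, and the `wakeup`/`demo` names are my assumptions for the sketch. The idea shown is that each result carries a singly linked chain of consumer operand slots, so a completing producer walks the chain and sets ready bits instead of broadcasting its tag.

```c
#include <assert.h>
#include <string.h>

#define NONE -1
#define NSRC 2

/* One source-operand slot: a ready bit plus a forward pointer to the
 * next consumer of the same value (instruction index, slot index). */
struct slot { int ready; int next_instr, next_slot; };

/* A queue entry: two source slots plus the head of the consumer chain
 * for this instruction's result. */
struct instr {
    struct slot src[NSRC];
    int fwd_instr, fwd_slot;
};

/* Forwardflow-style wakeup: when producer p completes, walk its
 * forward-pointer chain, setting each consuming operand's ready bit. */
static void wakeup(struct instr *q, int p) {
    int i = q[p].fwd_instr, s = q[p].fwd_slot;
    while (i != NONE) {
        struct slot *op = &q[i].src[s];
        op->ready = 1;
        i = op->next_instr;
        s = op->next_slot;
    }
}

static int all_ready(const struct instr *q, int i) {
    return q[i].src[0].ready && q[i].src[1].ready;
}

/* Build a 3-instruction example: instr 0 produces a value read by
 * instr 1 (slot 0) and instr 2 (slot 1); all other slots start ready.
 * Returns 1 if both consumers become issuable after instr 0 finishes. */
int demo(void) {
    struct instr q[3];
    memset(q, 0, sizeof q);
    for (int i = 0; i < 3; i++) {
        q[i].fwd_instr = NONE;
        for (int s = 0; s < NSRC; s++) {
            q[i].src[s].ready = 1;
            q[i].src[s].next_instr = NONE;
        }
    }
    q[0].fwd_instr = 1; q[0].fwd_slot = 0;            /* chain head  */
    q[1].src[0].ready = 0;
    q[1].src[0].next_instr = 2; q[1].src[0].next_slot = 1;
    q[2].src[1].ready = 0;                            /* chain tail  */

    wakeup(q, 0);
    return all_ready(q, 1) && all_ready(q, 2);
}
```

Note the design point the sketch captures: wakeup cost scales with the number of consumers of a value, not with the size of the window, which is what lets the backend be split into discrete, individually activated units.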
|Diamonds are a Memory Controller's Best Friend
Adapted from a talk at ISCA 2009.
In the near term, Moore's law will continue to provide an
increasing number of transistors and therefore an increasing
number of on-chip cores. Limited pin bandwidth prevents
the integration of a large number of memory controllers on-chip.
With many cores, and few memory controllers, where
to locate the memory controllers in the on-chip interconnection
fabric becomes an important and as yet unexplored
question. In this paper we show how the location of the
memory controllers can reduce contention (hot spots) in the
on-chip fabric and lower the variance in reference latency.
This in turn provides predictable performance for memory-intensive
applications regardless of the processing core on
which a thread is scheduled. We explore the design space of
on-chip fabrics to find optimal memory controller placement
relative to different topologies (i.e. mesh and torus), routing
algorithms, and workloads.
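The distance side of the placement question can be illustrated with a small C sketch. It is a toy proxy, not the paper's methodology: it assumes a 4x4 mesh, min-hop (Manhattan) distance, and controller coordinates I made up, and it ignores routing and traffic. It only shows that a staggered, diamond-like placement lowers the variance of each tile's distance to its nearest controller compared with putting every controller on one edge.

```c
#include <stdlib.h>

/* Min-hop (Manhattan) distance between two tiles of a mesh. */
static int hops(int r0, int c0, int r1, int c1) {
    return abs(r0 - r1) + abs(c0 - c1);
}

/* Variance, over all tiles of an n-by-n mesh, of the hop count to the
 * nearest memory controller -- a rough proxy for the variance in
 * reference latency discussed above. */
double nearest_ctrl_variance(const int ctrl[][2], int nctrl, int n) {
    double sum = 0.0, sumsq = 0.0;
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++) {
            int best = n + n;   /* exceeds any mesh distance */
            for (int k = 0; k < nctrl; k++) {
                int d = hops(r, c, ctrl[k][0], ctrl[k][1]);
                if (d < best) best = d;
            }
            sum += best;
            sumsq += (double)best * best;
        }
    double mean = sum / (n * n);
    return sumsq / (n * n) - mean * mean;
}
```

For example, placing all four controllers on the top row, `{{0,0},{0,1},{0,2},{0,3}}`, gives variance 1.25 on a 4x4 mesh, while a staggered placement such as `{{0,1},{1,3},{2,0},{3,2}}` gives 0.1875.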
Note: Dennis Abts gave the ISCA 2009 talk. This talk is the version I give when asked.
|Introduction to OpenMP
Adapted from a talk for
CS 838: Pervasive Parallelism
A brief and gentle introduction to the syntax and programming style of OpenMP. Originally intended
to introduce students to the framework, adapted to address the curiosity of a more general audience.
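As a taste of the style the talk covers, here is a minimal OpenMP worksharing loop in C (the `parallel_sum` name is mine; compile with `-fopenmp` on GCC or Clang):

```c
#include <stdio.h>

/* Sum an array with an OpenMP worksharing loop. The reduction clause
 * gives each thread a private partial sum, combined after the loop.
 * Compiled without -fopenmp, the pragma is simply ignored and the
 * loop runs serially with the same result. */
long parallel_sum(const long *a, int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

This incremental style is much of OpenMP's appeal: a correct serial loop becomes parallel with one pragma, with no explicit thread creation or locking.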
Best viewed in Google Chrome.