Assistant Professor, Computer Science
We have developed CUDA implementations of four PARSEC benchmarks for GPUs:
blackscholes, fluidanimate, streamcluster, and swaptions.
Note that these files are provided AS IS and can be improved in many
respects. While we performed some performance optimization, there is more to
be done. We do not claim that these are optimal implementations. The code is
presented only as a representative CUDA implementation of these workloads. It
is NOT meant to be interpreted as a definitive answer to how well these
applications can perform on GPUs or in CUDA. If you are interested in
improving the performance of these benchmarks, please let us know.
Note also that this implementation was based on CUDA SDK 2.3. Later versions
of CUDA support more C++ features, which may simplify this code or allow
other optimizations (our paper notes some of these places).
The benchmarks have been available since July 13th, 2011. To request a download of
this implementation, email the following addresses: email@example.com,
Details can also be found here.
Link to paper.
Please cite this paper if you use our work -- Bibtex.
- (1/24/12) There have been some emails on the PARSEC mailing list about patches
for some of the benchmarks. At least one of these affects the
sequential/pthreads versions of the benchmarks included in our release. I have
not yet had time to update the non-CUDA versions of these programs in the
tarball we provide, so I am putting the information here so that you can make the
change(s) in your copy: