Subject to change, especially after the first week or two.
Background
John L. Hennessy and David A. Patterson,
Computer Architecture: A Quantitative Approach,
Morgan Kaufmann Publishers, Third Edition, 2002.
From Chapter 6 Multiprocessors and Thread-Level Parallelism, pp. 528-664:
- Introduction (read now)
- Characteristics of Application Domains (read now)
- Symmetric Shared-Memory Architectures (read now)
- Performance of Symmetric Shared-Memory Multiprocessors (skim now; read later)
- Distributed Shared-Memory Architectures (read now)
- Performance of Distributed Shared-Memory Architectures (skim now; read later)
- Synchronization (read now)
- Models of Memory Consistency: An Introduction (optional)
- Multithreading: Exploiting Thread-Level Parallelism within a Processor (read now)
- Crosscutting Issues (optional)
- Putting It All Together: Sun's Wildfire Prototype (optional)
- Another View: Multithreading in a Commercial Server (optional)
- Another View: Embedded Multiprocessors (optional)
- Fallacies and Pitfalls (read now)
- Concluding Remarks (read now)
- Historical Perspective and References (skim now; read later)
Introduction
Herb Sutter,
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,
Dr. Dobb's Journal, 30(3), March 2005.
Html.
Herb Sutter and James Larus,
Software and the Concurrency Problem,
ACM Queue,
September 2005,
Online PDF for University of Wisconsin only.
Thomas H. Cormen,
Charles E. Leiserson,
Ronald L. Rivest,
Clifford Stein,
Introduction to Algorithms, 3rd Edition;
Chapter 27: Multithreaded Algorithms,
http://www.multicoreinfo.com/research/papers/whitepapers/MT-Algorithms-Chapter.pdf
Review
Maurice Herlihy and Nir Shavit,
The Art of Multiprocessor Programming,
Morgan Kaufmann, 2008.
Reference.
Introduction to Parallel Computing
LLNL Web Site
(html).
Reference.
Multicore processors
Poonacha Kongetira, Kathirgamar Aingaran,
Kunle Olukotun,
Niagara: A 32-Way Multithreaded Sparc Processor,
IEEE Micro,
March-April 2005, pp. 21-29.
Online PDF for University of Wisconsin only.
Review.
Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu,
Mike Chen, and Kunle Olukotun,
The Stanford Hydra CMP,
IEEE Micro,
March-April 2000, pp. 71-84.
Online PDF for University of Wisconsin only.
Reference.
Luiz Andre Barroso, et al.,
Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing,
Proc. International Symposium on Computer Architecture,
June 2000, pp. 282-293.
Online PDF for University of Wisconsin only.
Reference.
David A. Wood and Mark D. Hill,
Cost-Effective Parallel Computing,
IEEE Computer,
February 1995.
IEEE Xplore PDF.
Pthreads
POSIX Threads Programming,
Web Site
(html).
Reference.
Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh,
and Anoop Gupta,
The SPLASH-2 Programs: Characterization and Methodological Considerations,
Proc. International Symposium on Computer Architecture,
June 1995.
Online PDF for University of Wisconsin only.
Review.
Edward A. Lee, "The Problem with Threads," Computer, vol. 39, no. 5, pp. 33-42, May 2006, doi:10.1109/MC.2006.180
IEEE Xplore PDF
Read.
Atul Adya, Jon Howell, Marvin Theimer, Bill Bolosky, John Douceur,
Cooperative Task Management without Manual Stack Management, or
Event-driven Programming is not the Opposite of Threaded Programming,
Proc. USENIX,
June 2002.
Online PDF for University of Wisconsin only.
Reference.
Locking and Threads
McKenney, P. E. 1996. Selecting locking primitives for parallel programming. Commun. ACM 39, 10 (Oct. 1996), 75-82. DOI= http://doi.acm.org/10.1145/236156.236174
PDF.
Review.
Paul E. McKenney,
Selecting Locking Designs for Parallel Programs,
In Pattern Languages of Program Design, Vol. 2 (J. Vlissides, J. Coplien, and N. Kerth, eds.), 1996.
PDF.
Reference
Beng-Hong Lim and Anant Agarwal,
Reactive synchronization algorithms for multiprocessors,
In ASPLOS 1994. PDF
Reference
Bayer, R. and M. Schkolnick, Concurrency of Operations on B-Trees,
Readings in Database Systems, pp. 127-139.
Review
Gray, J., et al., "Granularity of Locks and Degrees of Consistency in a Shared Database,"
Readings in Database Systems, pp. 175-193.
Review
OpenMP
Leonardo Dagum and Ramesh Menon,
OpenMP: An Industry Standard API for Shared Memory Programming,
IEEE Computational Science and Engineering, Jan-Mar, 1998.
Online PDF for University of Wisconsin only.
LLNL OpenMP Tutorial,
Web Site
(html).
OpenMP: Simple, Portable, Scalable SMP Programming,
Web Site
(html). Reference.
Cilk/TBB
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Multithreaded Algorithms. Introduction to Algorithms. Third Edition. MIT Press, 2010.
Online PDF for University of Wisconsin only.
Review.
Don Dailey and Charles E. Leiserson,
Using Cilk to Write Multiprocessor Chess Programs,
The Journal of the International Computer Chess Association, 2002.
PDF. Reference
Matteo Frigo, Charles E. Leiserson, Keith H. Randall,
The Implementation of the Cilk-5 Multithreaded Language,
Proceedings of the 1998 conference on Programming Language Design and Implementation (PLDI), pp. 212-223.
PDF
Review
TBB Tutorial
Intel Web Document.
Online PDF for University of Wisconsin only.
Serialization Sets
Allen, M. D., Sridharan, S., and Sohi, G. S. 2010. Serialization sets: a dynamic dependence-based parallel execution model. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2010.
PDF Review
MapReduce
Luiz Andre Barroso, Jeffrey Dean, Urs Holzle,
Web Search For a Planet: The Google Cluster Architecture,
IEEE Micro,
23(2):22-28, March-April 2003.
Online PDF for University of Wisconsin only. Reference.
Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis,
Evaluating MapReduce for Multi-core and Multiprocessor Systems,
Proceedings of the 13th Intl. Symposium on High-Performance Computer Architecture (HPCA), February 2007. (PDF) Review
GPGPUs
Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym,
NVIDIA TESLA: A Unified Graphics and Computing Architecture,
IEEE Micro, 28(2):39-55, March-April 2008.
(PDF) Read
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt, Analyzing CUDA Workloads Using a Detailed GPU Simulator, In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, Boston, MA, April 26-28, 2009.
(PDF) Review
Transactional Memory
Unlocking Concurrency
Ali-Reza Adl-Tabatabai, Christos Kozyrakis, Bratin Saha
December 2006
ACM Queue, Volume 4 Issue 10
Review
Subtleties of Transactional Memory Atomicity Semantics.
Colin Blundell, E Christopher Lewis, and Milo Martin.
Computer Architecture Letters (CAL '06), Volume 5, Number 2, November 2006.
Review
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory
Mehrara, M., Hao, J., Hsu, P., and Mahlke, S. 2009. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland, June 15 - 21, 2009). PLDI '09.
Review
McRT-STM: a high performance software transactional memory system for a multi-core runtime.
Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, Ben Hertzberg,
PPOPP 2006: 187-197
Transactional Memory Coherence and Consistency
Lance Hammond, Vicky Wong, Mike Chen, Ben Hertzberg, Brian D. Carlstrom, John D. Davis, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun
Proceedings of the 31st Annual International Symposium on Computer Architecture, München, Germany, June 19-23, 2004.
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan,
Mark D. Hill, and David A. Wood,
LogTM: Log-based Transactional Memory,
Proc. Symposium on High-Performance Computer Architecture,
February 2006.
Online PDF.
Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum,
Hybrid transactional memory,
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS),
2006.
Online PDF.
Transactional Memory Online,
Web Site
(html). Reference.
Maurice Herlihy and J. Eliot B. Moss,
Transactional Memory: Architectural Support for Lock-Free Data Structures,
Proc. International Symposium on Computer Architecture,
May 1993.
Online PDF for University of Wisconsin only. Reference Only
Albert Chang and Mark F. Mergen,
801 Storage: Architecture and Programming,
ACM Trans. on Computer Systems,
February 1988.
Online PDF for University of Wisconsin only.
Concentrate on transactional issues (e.g., Section 3.3).
Reference Only
Ravi Rajwar and James R. Goodman,
Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution,
Proc. 34th Intl. Symposium on Microarchitecture,
December 2001.
Online PDF for University of Wisconsin only.
Reference Only
Ravi Rajwar and James R. Goodman,
Transactional Lock-Free Execution of Lock-Based Programs,
Proc. Architectural Support for Programming Languages and Operating
Systems,
October 2002.
Online PDF for University of Wisconsin only.
Reference Only
Virendra J. Marathe and Michael L. Scott,
A Qualitative Survey of Modern Software Transactional Memory Systems,
University of Rochester Computer Science Department TR 839,
June 2004.
Online PDF for University of Wisconsin only.
Reference Only
C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson,
and Sean Lie,
Unbounded Transactional Memory,
Proc. Symposium on High-Performance Computer Architecture,
February 2005.
Online PDF for University of Wisconsin only.
Reference Only
Ravi Rajwar, Maurice Herlihy, and Konrad Lai,
Virtualizing Transactional Memory,
Proc. International Symposium on Computer Architecture,
June 2005.
Online PDF for University of Wisconsin only.
Reference Only
Maurice Herlihy, Victor Luchangco, and Mark Moir,
Obstruction-Free Synchronization: Double-Ended Queues as an Example,
Proc. International Conference on Distributed Computing Systems,
Online PDF for University of Wisconsin only.
Reference Only
Tim Harris,
Design Choices for Language-based Transactions,
Microsoft Technical Report UCAM-CL-TR-572,
August 2003.
PDF.
Reference Only
Carlstrom et al.,
Transactional Execution of Java Programs,
OOPSLA Workshop on Synchronization and Concurrency in
Object-Oriented Languages (SCOOL),
October 2005.
PDF.
Reference Only
Students Present
As students present papers, these papers will be assigned for reading.
William Thies, Michal Karczmarek, and Saman Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the 2002 International Conference on Compiler Construction.
Online PDF for University of Wisconsin only.
Herlihy, M. A Methodology for Implementing Highly Concurrent Data Structures. PPOPP '90.
PDF
N. Hardavellas, I. Pandis, R. Johnson, N. G. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In 3rd Biennial Conference on Innovative Data Systems Research (CIDR), 2010.
Online PDF for University of Wisconsin only.
David Luebke and Greg Humphreys,
"How GPUs Work."
PDF
Adl-Tabatabai, A., et al., Compiler and Runtime Support for Efficient Software Transactional Memory, PLDI 2006, pp. 26-37.
PDF
Other interesting papers
Maurice Herlihy and Nir Shavit,
The Art of Multiprocessor Programming,
Morgan Kaufmann, 2008.
John M. Mellor-Crummey and Michael L. Scott,
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors,
ACM Trans. on Computer Systems,
February 1991, pp. 21-65.
Online PDF for University of Wisconsin only.
Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, Henry M. Levy,
Scheduler Activations:
Effective Kernel Support for the User-level Management of Parallelism,
Proc. Symposium on Operating System Principles,
October 1991.
Online PDF for University of Wisconsin only.