UW-Madison
Computer Sciences Dept.

CS 758 Advanced Topics in Computer Architecture

Programming Current and Future Multicore Processors

Fall 2010 Section 1
Instructor David A. Wood and T. A. Derek Hower
URL: http://www.cs.wisc.edu/~david/courses/cs758/Fall2010/

Reading List

Subject to change, especially after the first week or two.


Background

John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Third Edition, 2002. From Chapter 6 Multiprocessors and Thread-Level Parallelism, pp. 528-664:

  1. Introduction (read now)
  2. Characteristics of Application Domains (read now)
  3. Symmetric Shared-Memory Architectures (read now)
  4. Performance of Symmetric Shared-Memory Multiprocessors (skim now; read later)
  5. Distributed Shared-Memory Architectures (read now)
  6. Performance of Distributed Shared-Memory Architectures (skim now; read later)
  7. Synchronization (read now)
  8. Models of Memory Consistency: An Introduction (optional)
  9. Multithreading: Exploiting Thread-Level Parallelism within a Processor (read now)
  10. Crosscutting Issues (optional)
  11. Putting It All Together: Sun's Wildfire Prototype (optional)
  12. Another View: Multithreading in a Commercial Server (optional)
  13. Another View: Embedded Multiprocessors (optional)
  14. Fallacies and Pitfalls (read now)
  15. Concluding Remarks (read now)
  16. Historical Perspective and References (skim now; read later)

Introduction

Herb Sutter, The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, 30(3), March 2005. Html.

Herb Sutter and James Larus, Software and the Concurrency Problem, ACM Queue, September 2005, Online PDF for University of Wisconsin only.

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein,
Introduction to Algorithms, 3rd Edition; Chapter 8: Multithreaded Algorithms,
http://www.multicoreinfo.com/research/papers/whitepapers/MT-Algorithms-Chapter.pdf Review

Maurice Herlihy and Nir Shavit, The Art of Multiprocessor Programming, Morgan Kaufmann, 2008. Reference.

Introduction to Parallel Computing LLNL Web Site (html). Reference.


Multicore processors

Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun, Niagara: A 32-Way Multithreaded Sparc Processor, IEEE Micro, March-April 2005, pp. 21-29. Online PDF for University of Wisconsin only. Review.

Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Mike Chen, and Kunle Olukotun, The Stanford Hydra CMP, IEEE Micro, March-April 2000, pp. 71-84. Online PDF for University of Wisconsin only. Reference.

Luiz Andre Barroso, et al., Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing, Proc. International Symposium on Computer Architecture, June 2000, pp. 282-293. Online PDF for University of Wisconsin only. Reference.

David A. Wood and Mark D. Hill, Cost-Effective Parallel Computing, IEEE Computer, February 1995. IEEE Xplore PDF.


Pthreads

POSIX Threads Programming, Web Site (html). Reference.

Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta, The SPLASH-2 Programs: Characterization and Methodological Considerations, Proc. International Symposium on Computer Architecture, June 1995. Online PDF for University of Wisconsin only. Review.

Edward A. Lee, "The Problem with Threads," Computer, vol. 39, no. 5, pp. 33-42, May 2006, doi:10.1109/MC.2006.180. IEEE Xplore PDF. Read.

Atul Adya, Jon Howell, Marvin Theimer, Bill Bolosky, John Douceur, Cooperative Task Management without Manual Stack Management, or Event-driven Programming is not the Opposite of Threaded Programming, Proc. USENIX, June 2002. Online PDF for University of Wisconsin only. Reference.


Locking and Threads

McKenney, P. E. 1996. Selecting locking primitives for parallel programming. Commun. ACM 39, 10 (Oct. 1996), 75-82. DOI= http://doi.acm.org/10.1145/236156.236174 PDF. Review.

Paul E. McKenney, Selecting Locking Designs for Parallel Programs, in Pattern Languages of Program Design, Vol. 2 (J. Vlissides, J. Coplien, and N. Kerth, eds.), 1996. PDF. Reference.

Beng-Hong Lim and Anant Agarwal, Reactive synchronization algorithms for multiprocessors, In ASPLOS 1994. PDF Reference

Bayer, R. and M. Schkolnick, Concurrency of Operations on B-Trees, Readings in Database Systems, pp. 127-139. Review

Gray, J., et al., "Granularity of Locks and Degrees of Consistency in a Shared Database," Readings in Database Systems, pp. 175-193. Review


OpenMP

Leonardo Dagum and Ramesh Menon, OpenMP: An Industry Standard API for Shared Memory Programming, IEEE Computational Science and Engineering, Jan-Mar 1998. Online PDF for University of Wisconsin only.

LLNL OpenMP Tutorial, Web Site (html).

OpenMP: Simple, Portable, Scalable SMP Programming, Web Site (html). Reference.


Cilk/TBB

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Multithreaded Algorithms. Introduction to Algorithms. Third Edition. MIT Press, 2010. Online PDF for University of Wisconsin only. Review.

Don Dailey and Charles E. Leiserson,
Using Cilk to Write Multiprocessor Chess Programs, The Journal of the International Computer Chess Association, 2002. PDF. Reference

Matteo Frigo, Charles E. Leiserson, Keith H. Randall,
The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the 1998 conference on Programming Language Design and Implementation (PLDI), pp. 212-223. PDF Review

TBB Tutorial Intel Web Document. Online PDF for University of Wisconsin only.


Serialization Sets

Allen, M. D., Sridharan, S., and Sohi, G. S. 2010. Serialization sets: a dynamic dependence-based parallel execution model. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2010. PDF Review


MapReduce

Luiz Andre Barroso, Jeffrey Dean, Urs Holzle, Web Search For a Planet: The Google Cluster Architecture, IEEE Micro, 23(2):22-28, March-April 2003. Online PDF for University of Wisconsin only. Reference.

Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis, Evaluating MapReduce for Multi-core and Multiprocessor Systems, Proceedings of the 13th Intl. Symposium on High-Performance Computer Architecture (HPCA), February 2007. (PDF) Review


GPGPUs

Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym, “NVIDIA TESLA: A Unified Graphics and Computing Architecture”, IEEE Micro, Volume 28, Issue 2, March-April 2008, pp. 39-55. (PDF) Read

Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt, Analyzing CUDA Workloads Using a Detailed GPU Simulator, In proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, Boston, MA, April 26-28, 2009. (PDF) Review


Transactional Memory

Ali-Reza Adl-Tabatabai, Christos Kozyrakis, and Bratin Saha, Unlocking Concurrency, ACM Queue, Volume 4, Issue 10, December 2006. Review

Subtleties of Transactional Memory Atomicity Semantics. Colin Blundell, E Christopher Lewis, and Milo Martin. Computer Architecture Letters (CAL '06), Volume 5, Number 2, November 2006. Review

Mehrara, M., Hao, J., Hsu, P., and Mahlke, S. 2009. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland, June 15-21, 2009). PLDI '09. Review

McRT-STM: a high performance software transactional memory system for a multi-core runtime. Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, Ben Hertzberg, PPOPP 2006: 187-197

Lance Hammond, Vicky Wong, Mike Chen, Ben Hertzberg, Brian D. Carlstrom, John D. Davis, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun, Transactional Memory Coherence and Consistency, Proceedings of the 31st Annual International Symposium on Computer Architecture, München, Germany, June 19-23, 2004.

Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood, LogTM: Log-based Transactional Memory, Submitted to Proc. Symposium on High-Performance Computer Architecture, February 2006. Online PDF.

Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum, Hybrid transactional memory, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS), 2006. Online PDF.

Transactional Memory Online, Web Site (html). Reference.

Maurice Herlihy and J. Eliot B. Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures, Proc. International Symposium on Computer Architecture, May 1993. Online PDF for University of Wisconsin only. Reference Only

Albert Chang and Mark F. Mergen, 801 Storage: Architecture and Programming, ACM Trans. on Computer Systems, February 1988. Online PDF for University of Wisconsin only. Concentrate on transactional issues (e.g., Section 3.3). Reference Only

Ravi Rajwar and James R. Goodman, Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution, Proc. 34th Intl. Symposium on Microarchitecture, December 2001. Online PDF for University of Wisconsin only. Reference Only

Ravi Rajwar and James R. Goodman, Transactional Lock-Free Execution of Lock-Based Programs, Proc. Architectural Support for Programming Languages and Operating Systems, October 2002. Online PDF for University of Wisconsin only. Reference Only

Virendra J. Marathe and Michael L. Scott, A Qualitative Survey of Modern Software Transactional Memory Systems, University of Rochester Computer Science Department TR 839, June 2004. Online PDF for University of Wisconsin only. Reference Only

C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie, Unbounded Transactional Memory, Proc. Symposium on High-Performance Computer Architecture, February 2005. Online PDF for University of Wisconsin only. Reference Only

Ravi Rajwar, Maurice Herlihy, and Konrad Lai, Virtualizing Transactional Memory, Proc. International Symposium on Computer Architecture, June 2005. Online PDF for University of Wisconsin only. Reference Only

Maurice Herlihy, Victor Luchangco, and Mark Moir, Obstruction-Free Synchronization: Double-Ended Queues as an Example, Proc. International Conference on Distributed Computing Systems. Online PDF for University of Wisconsin only. Reference Only

Tim Harris, Design Choices for Language-based Transactions, Microsoft Technical Report UCAM-CL-TR-572, August 2003. PDF. Reference Only

Carlstrom et al., Transactional Execution of Java Programs, OOPSLA Workshop on Synchronization and Concurrency in Object-Oriented Languages (SCOOL), October 2005. PDF. Reference Only


Students Present

As students present papers, these papers will be assigned for reading.

"William Thies, Michal Karczmarek, and Saman Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the 2002 International Conference on Compiler Construction." Online PDF for University of Wisconsin only.

Herlihy, M. A Methodology for Implementing Highly Concurrent Data Structures. PPOPP '90. PDF

N. Hardavellas, I. Pandis, R. Johnson, N. G. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In 3rd Biennial Conference on Innovative Data Systems Research (CIDR), 2010. Online PDF for University of Wisconsin only.

David Luebke and Greg Humphreys, "How GPUs Work": PDF

Adl-Tabatabai, A., et al., Compiler and Runtime Support for Efficient Software Transactional Memory, PLDI 2006, pp. 26-37. PDF

Other interesting papers

Maurice Herlihy and Nir Shavit, The Art of Multiprocessor Programming, Morgan Kaufmann, 2008.

John M. Mellor-Crummey and Michael L. Scott, Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Trans. on Computer Systems, February 1991, pp. 21-65. Online PDF for University of Wisconsin only.

Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, Henry M. Levy, Scheduler Activations: Effective Kernel Support for the User-level Management of Parallelism, Proc. Symposium on Operating System Principles, October 1991. Online PDF for University of Wisconsin only.

 