UW-Madison
Computer Sciences Dept.

CS 757 Computer Architecture II Spring 2011 Section 1
Instructor David A. Wood
URL: http://www.cs.wisc.edu/~david/courses/cs757/Spring2011/

Supplemental Reading List

The assigned readings will be posted on the schedule page. This list includes readings from earlier offerings of CS757 and CS758, which you may find useful.


Background

John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Third Edition, 2002. From Chapter 6 Multiprocessors and Thread-Level Parallelism, pp. 528-664:

  1. Introduction (read now)
  2. Characteristics of Application Domains (read now)
  3. Symmetric Shared-Memory Architectures (read now)
  4. Performance of Symmetric Shared-Memory Multiprocessors (skim now; read later)
  5. Distributed Shared-Memory Architectures (read now)
  6. Performance of Distributed Shared-Memory Architectures (skim now; read later)
  7. Synchronization (read now)
  8. Models of Memory Consistency: An Introduction (optional)
  9. Multithreading: Exploiting Thread-Level Parallelism within a Processor (read now)
  10. Crosscutting Issues (optional)
  11. Putting It All Together: Sun's Wildfire Prototype (optional)
  12. Another View: Multithreading in a Commercial Server (optional)
  13. Another View: Embedded Multiprocessors (optional)
  14. Fallacies and Pitfalls (read now)
  15. Concluding Remarks (read now)
  16. Historical Perspective and References (skim now; read later)

Introduction

Herb Sutter, The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, 30(3), March 2005. Html.

Herb Sutter and James Larus, Software and the Concurrency Problem, ACM Queue, September 2005, Online PDF for University of Wisconsin only.

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein,
Introduction to Algorithms, 3rd Edition; Chapter 27: Multithreaded Algorithms,
http://www.multicoreinfo.com/research/papers/whitepapers/MT-Algorithms-Chapter.pdf

Maurice Herlihy and Nir Shavit, The Art of Multiprocessor Programming, Morgan Kaufmann, 2008.

Introduction to Parallel Computing, LLNL Web Site (html).


Multicore processors

Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun, Niagara: A 32-Way Multithreaded Sparc Processor, IEEE Micro, March-April 2005, pp. 21-29. Online PDF for University of Wisconsin only.

Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu, Mike Chen, and Kunle Olukotun, The Stanford Hydra CMP, IEEE Micro, March-April 2000, pp. 71-84. Online PDF for University of Wisconsin only.

Luiz Andre Barroso, et al., Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing, Proc. International Symposium on Computer Architecture, June 2000, pp. 282-293. Online PDF for University of Wisconsin only.

David A. Wood and Mark D. Hill, Cost-Effective Parallel Computing, IEEE Computer, February 1995. IEEE Xplore PDF.


Pthreads

POSIX Threads Programming, Web Site (html).

Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta, The SPLASH-2 Programs: Characterization and Methodological Considerations, Proc. International Symposium on Computer Architecture, June 1995. Online PDF for University of Wisconsin only.

Edward A. Lee, "The Problem with Threads," Computer, vol. 39, no. 5, pp. 33-42, May 2006, doi:10.1109/MC.2006.180. IEEE Xplore PDF. Read.

Atul Adya, Jon Howell, Marvin Theimer, Bill Bolosky, John Douceur, Cooperative Task Management without Manual Stack Management, or Event-driven Programming is not the Opposite of Threaded Programming, Proc. USENIX, June 2002. Online PDF for University of Wisconsin only.


Locking and Threads

McKenney, P. E. 1996. Selecting locking primitives for parallel programming. Commun. ACM 39, 10 (Oct. 1996), 75-82. DOI: http://doi.acm.org/10.1145/236156.236174. PDF.

Paul E. McKenney, Selecting Locking Designs for Parallel Programs, in Pattern Languages of Program Design, Vol. 2 (J. Vlissides, J. Coplien, and N. Kerth, eds.), 1996. PDF

Beng-Hong Lim and Anant Agarwal, Reactive Synchronization Algorithms for Multiprocessors, in ASPLOS 1994. PDF

Bayer, R. and M. Schkolnick, Concurrency of Operations on B-Trees, Readings in Database Systems, pp. 127-139.

Gray, J., et al., "Granularity of Locks and Degrees of Consistency in a Shared Database," Readings in Database Systems, pp. 175-193.


OpenMP

Leonardo Dagum and Ramesh Menon, OpenMP: An Industry Standard API for Shared Memory Programming, IEEE Computational Science and Engineering, Jan-Mar 1998. Online PDF for University of Wisconsin only.

LLNL OpenMP Tutorial, Web Site (html).

OpenMP: Simple, Portable, Scalable SMP Programming, Web Site (html).


Cilk/TBB

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Multithreaded Algorithms. Introduction to Algorithms. Third Edition. MIT Press, 2009. Online PDF for University of Wisconsin only.

Don Dailey and Charles E. Leiserson,
Using Cilk to Write Multiprocessor Chess Programs, The Journal of the International Computer Chess Association, 2002. PDF.

Matteo Frigo, Charles E. Leiserson, Keith H. Randall,
The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the 1998 conference on Programming Language Design and Implementation (PLDI), pp. 212-223. PDF

TBB Tutorial, Intel Web Document. Online PDF for University of Wisconsin only.


Serialization Sets

Allen, M. D., Sridharan, S., and Sohi, G. S. 2009. Serialization sets: a dynamic dependence-based parallel execution model. Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2009. PDF


MapReduce

Luiz Andre Barroso, Jeffrey Dean, Urs Holzle, Web Search For a Planet: The Google Cluster Architecture, IEEE Micro, 23(2):22-28, March-April 2003. Online PDF for University of Wisconsin only.

Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis, Evaluating MapReduce for Multi-core and Multiprocessor Systems, Proceedings of the 13th Intl. Symposium on High-Performance Computer Architecture (HPCA), February 2007. (PDF)


GPGPUs

Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym, NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, 28(2), March-April 2008, pp. 39-55. (PDF) Read

Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt, Analyzing CUDA Workloads Using a Detailed GPU Simulator, Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, Boston, MA, April 26-28, 2009. (PDF)


Transactional Memory

Ali-Reza Adl-Tabatabai, Christos Kozyrakis, and Bratin Saha, Unlocking Concurrency, ACM Queue, Volume 4, Issue 10, December 2006.

Colin Blundell, E Christopher Lewis, and Milo Martin, Subtleties of Transactional Memory Atomicity Semantics, Computer Architecture Letters (CAL '06), Volume 5, Number 2, November 2006.

Mehrara, M., Hao, J., Hsu, P., and Mahlke, S. 2009. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, Ireland, June 15 - 21, 2009). PLDI '09.

Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, and Ben Hertzberg, McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime, PPOPP 2006, pp. 187-197.

Lance Hammond, Vicky Wong, Mike Chen, Ben Hertzberg, Brian D. Carlstrom, John D. Davis, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun, Transactional Memory Coherence and Consistency, Proceedings of the 31st Annual International Symposium on Computer Architecture, München, Germany, June 19-23, 2004.

Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, and David A. Wood, LogTM: Log-based Transactional Memory, Proc. Symposium on High-Performance Computer Architecture, February 2006. Online PDF.

Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum, Hybrid transactional memory, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems (ASPLOS), 2006. Online PDF.

Transactional Memory Online, Web Site (html).

Maurice Herlihy and J. Eliot B. Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures, Proc. International Symposium on Computer Architecture, May 1993. Online PDF for University of Wisconsin only.

Albert Chang and Mark F. Mergen, 801 Storage: Architecture and Programming, ACM Trans. on Computer Systems, February 1988. Online PDF for University of Wisconsin only. Concentrate on transactional issues (e.g., Section 3.3).

Ravi Rajwar and James R. Goodman, Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution, Proc. 34th Intl. Symposium on Microarchitecture, December 2001. Online PDF for University of Wisconsin only.

Ravi Rajwar and James R. Goodman, Transactional Lock-Free Execution of Lock-Based Programs, Proc. Architectural Support for Programming Languages and Operating Systems, October 2002. Online PDF for University of Wisconsin only.

Virendra J. Marathe and Michael L. Scott, A Qualitative Survey of Modern Software Transactional Memory Systems, University of Rochester Computer Science Department TR 839, June 2004. Online PDF for University of Wisconsin only.

C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie, Unbounded Transactional Memory, Proc. Symposium on High-Performance Computer Architecture, February 2005. Online PDF for University of Wisconsin only.

Ravi Rajwar, Maurice Herlihy, and Konrad Lai, Virtualizing Transactional Memory, Proc. International Symposium on Computer Architecture, June 2005. Online PDF for University of Wisconsin only.

Maurice Herlihy, Victor Luchangco, and Mark Moir, Obstruction-Free Synchronization: Double-Ended Queues as an Example, Proc. International Conference on Distributed Computing Systems. Online PDF for University of Wisconsin only.

Tim Harris, Design Choices for Language-based Transactions, University of Cambridge Computer Laboratory Technical Report UCAM-CL-TR-572, August 2003. PDF.

Carlstrom et al., Transactional Execution of Java Programs, OOPSLA Workshop on Synchronization and Concurrency in Object-Oriented Languages (SCOOL), October 2005. PDF.


Misc

William Thies, Michal Karczmarek, and Saman Amarasinghe, StreamIt: A Language for Streaming Applications, Proceedings of the 2002 International Conference on Compiler Construction. Online PDF for University of Wisconsin only.

Herlihy, M. A Methodology for Implementing Highly Concurrent Data Structures. PPOPP '90. PDF

N. Hardavellas, I. Pandis, R. Johnson, N. G. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: Limitations and opportunities. In 3rd Biennial Conference on Innovative Data Systems Research (CIDR), 2007. Online PDF for University of Wisconsin only.

David Luebke and Greg Humphreys, "How GPUs Work": PDF

Adl-Tabatabai, A., et al., Compiler and Runtime Support for Efficient Software Transactional Memory, PLDI 2006, pp. 26-37. PDF

Other interesting papers

Maurice Herlihy and Nir Shavit, The Art of Multiprocessor Programming, Morgan Kaufmann, 2008.

John M. Mellor-Crummey and Michael L. Scott, Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Trans. on Computer Systems, February 1991, pp. 21-65. Online PDF for University of Wisconsin only.

Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, Henry M. Levy, Scheduler Activations: Effective Kernel Support for the User-level Management of Parallelism, Proc. Symposium on Operating System Principles, October 1991. Online PDF for University of Wisconsin only.

 
Computer Sciences | UW Home