Architecture Qualifier Reading List

Please report any broken web links, typos, or incorrect references to

1.  Textbooks

  1. David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann Publishers.
    3rd edition: Chapters 1-7; Appendices B and C
    4th edition: Chapters 1-4; Appendices C and D
  2. John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers. 4th Edition. Chapters 1, 2, 3, 4, 5; Appendices A, B, E, F

2.  Uniprocessor

2.1  Processor Core

  1. J. E. Smith and A. R. Pleszkun. Implementing Precise Interrupts in Pipelined Processors, IEEE Trans. on Computers, May 1988, pp. 562-573. ACM DL Link
  2. Joseph A. Fisher and B. Ramakrishna Rau. Instruction-Level Parallel Processing, Science, 13 September 1991, pp. 1233-1241. Science Link
  3. T-Y. Yeh and Y. Patt, Two-level Adaptive Training Branch Prediction, Proc. 24th Annual International Symposium on Microarchitecture, November 1991, pp. 51-61. ACM DL Link
  4. Gurindar S. Sohi, Scott E. Breach, and T.N. Vijaykumar, Multiscalar Processors, Proc. 22nd Annual International Symposium on Computer Architecture, June 1995, pp. 414-425. ACM DL Link
  5. Subbarao Palacharla, Norman P. Jouppi and J. E. Smith. Complexity-effective Superscalar Processors, Proc. 24th Annual International Symposium on Computer Architecture, June 1997, pp. 206-218.
  6. Andreas Moshovos, Scott E. Breach, T. N. Vijaykumar, Gurindar S. Sohi, Dynamic Speculation and Synchronization of Data Dependences. ISCA 1997: 181-193. ACM DL Link
  7. Srikanth T Srinivasan, Ravi Rajwar, Haitham Akkary, Amit Gandhi, Mike Upton, Continual flow pipelines. ASPLOS 2004. ACM DL Link

2.2  Memory

  1. Wen-Hann Wang, Jean-Loup Baer, and Henry M. Levy. Organization and Performance of a Two-Level Virtual-Real Cache Hierarchy, ISCA 1989. ACM DL Link
  2. Bruce Jacob and Trevor Mudge. Virtual Memory on Contemporary Processors, IEEE Micro, vol. 18, no. 4, 1998. IEEE Xplore link
  3. Vinod Cuppu, Bruce Jacob, Brian Davis, and Trevor Mudge, A performance comparison of contemporary DRAM architectures, ISCA 1999. IEEE Xplore link
  4. Changkyu Kim, Doug Burger, Stephen W. Keckler: An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS 2002. ACM DL Link
  5. Viji Srinivasan, Davidson, E.S., Tyson, G.S., "A prefetch taxonomy," Computers, IEEE Transactions on , vol.53, no.2, pp. 126-140, Feb 2004. IEEE Xplore link

2.3  ISA/Compilation

  1. J. S. Emer and D. W. Clark. A Characterization of Processor Performance in the VAX-11/780, ISCA 1984. ACM DL Link
  2. Arvind, Rishiyur S. Nikhil. Executing a Program on the MIT Tagged-Token Dataflow Architecture. IEEE Trans. Computers 39(3): 300-318 (1990). ACM DL Link
  3. W. W. Hwu, R. E. Hank, D. M. Gallagher, S. A. Mahlke, D. M. Lavery, G. E. Haab, J. C. Gyllenhaal, and D. I. August. Compiler Technology for Future Microprocessors. Proceedings of the IEEE 83(12), December 1995. IEEE Xplore link
  4. Jerry Huck, Dale Morris, Jonathan Ross, Allan Knies, Hans Mulder, Rumi Zahir. Introducing the IA-64 Architecture. IEEE Micro, vol. 20, no. 5, pp. 12-23, Sep./Oct. 2000. IEEE Xplore link
  5. John Goodacre and Andrew N. Sloss, Parallelism and the ARM instruction set architecture. Computer, July 2005. IEEE Xplore

2.4  Case Studies

  1. Richard M. Russell. The Cray-1 Computer System, Communications of the ACM, January 1978, pp. 63-72. ACM DL Link
  2. Kenneth C. Yeager. The MIPS R10000 Superscalar Microprocessor, IEEE Micro, April 1996, pp. 28-40. IEEE Xplore link
  3. Timothy J. Slegel, et al. IBM's S/390 G5 Microprocessor, IEEE Micro, Mar/Apr 1999, pp. 12-23. IEEE Xplore link

2.5  Recent Trends

  1. T. Mudge. Power: A first class design constraint. Computer, vol. 34, no. 4, April 2001, pp. 52-57. IEEE Xplore link
  2. Viji Srinivasan, David Brooks, Michael Gschwind, Pradip Bose, Victor V. Zyuban, Philip N. Strenski, Philip G. Emma: Optimizing pipelines for power and performance. MICRO 2002: 333-344. IEEE Xplore link
  3. Dan Ernst, et al., A Low-Power Pipeline Based on Circuit-Level Timing Speculation, MICRO 2003. IEEE Xplore link
  4. Shubhendu S. Mukherjee, Christopher Weaver, Joel Emer, Steven K. Reinhardt, and Todd Austin, "Measuring Architectural Vulnerability Factors," Top picks of 2003 in IEEE Micro, Nov/Dec 2003. IEEE Xplore link
  5. D. Burger et al. Scaling to the End of Silicon with EDGE Architectures. IEEE Computer 2004. Volume: 37, Issue: 7. ACM DL Link
  6. Shekhar Y. Borkar: Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation. IEEE Micro 25(6): 10-16 (2005). IEEE Xplore link
  7. Gabriel H. Loh, Yuan Xie, Bryan Black, Processor Design in Three-Dimensional Die-Stacking Technologies, In IEEE Micro, vol. 27(3), pp. 31-48, May-June, 2007. IEEE Xplore link
  8. Taeho Kgil, David Roberts, Trevor Mudge, "Improving NAND Flash Based Disk Caches," ISCA 2008. ACM DL link

3.  Multiprocessor

3.1  Programming Models & Methods

  1. W. Daniel Hillis and Guy L. Steele. Data Parallel Algorithms, Communications of the ACM, December 1986, pp. 1170-1183. ACM DL Link
  2. John M. Mellor-Crummey, Michael L. Scott: Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Trans. Comput. Syst. 9(1): 21-65 (1991). ACM DL link
  3. Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, Anoop Gupta: The SPLASH-2 Programs: Characterization and Methodological Considerations. ISCA 1995: 24-36. ACM DL Link
  4. Leonardo Dagum and Ramesh Menon, OpenMP: An Industry Standard API for Shared Memory Programming. IEEE Computational Science and Engineering, Jan-Mar, 1998. IEEE Xplore link, Local uw link
  5. A.R. Alameldeen and D.A. Wood, "Variability in Architectural Simulations of Multi-threaded Workloads," HPCA 2003. IEEE Xplore

3.2  Memory Coherence & Consistency

  1. L. Lamport, How to make a multiprocessor computer that correctly executes multiprocess programs, IEEE Transactions on Computers, vol. 28, no. 9, pp. 690-691, Sept. 1979. IEEE Xplore link
  2. Anoop Gupta, Wolf-Dietrich Weber: Cache Invalidation Patterns in Shared-Memory Multiprocessors. IEEE Trans. Computers 41(7): 794-810 (1992). IEEE Xplore link
  3. Sarita V. Adve and Kourosh Gharachorloo. Shared Memory Consistency Models: A Tutorial, IEEE Computer, December 1996, pp. 66-76. IEEE Xplore link
  4. Michael Zhang, Krste Asanovic: Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. ISCA 2005: 336-345. ACM DL Link
  5. Milo M. K. Martin, Mark D. Hill, and David A. Wood, "Token Coherence: Decoupling Performance and Correctness," International Symposium on Computer Architecture (ISCA), June 2003. IEEE Xplore Link

3.3  Case Studies

  1. Steven L. Scott, Synchronization and Communication in the T3E Multiprocessor, Proc. 7th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1996, pp. 26-36. ACM DL Link
  2. James Laudon, Daniel Lenoski. The SGI Origin: A ccNUMA Highly Scalable Server. ISCA 1997: 241-251. ACM DL Link
  3. Erik Hagersten and Michael Koster, WildFire: A Scalable Path for SMPs, Proc. 5th IEEE Symposium on High-Performance Computer Architecture, January 1999, 172-181. IEEE Xplore link
  4. Luiz Andre Barroso, Kourosh Gharachorloo, Robert McNamara, Andreas Nowatzyk, Shaz Qadeer, Barton Sano, Scott Smith, Robert Stets, and Ben Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing, Proc. International Symposium on Computer Architecture, June 2000, pp. 282-293. ACM DL link
  5. Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor," IEEE Micro, vol. 25, no. 2, pp. 21-29, Mar./Apr. 2005. IEEE Xplore
  6. Erik Lindholm, John Nickolls, Stuart Oberman, and John Montrym, "NVIDIA TESLA: A Unified Graphics and Computing Architecture", IEEE Micro Volume 28, Issue 2, Date: March-April 2008, Pages: 39-55. IEEE Xplore

3.4  Interconnection network

  1. Charles E. Leiserson, et al., The Network Architecture of the Connection Machine CM-5, Proc. ACM Symposium on Parallel Algorithms and Architectures, June 1992, pp. 272-285. ACM DL link
  2. William J. Dally and Brian Towles, "Route packets, not wires: on-chip interconnection networks," DAC 2001. IEEE Xplore link
  3. Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, David Webb, "The Alpha 21364 Network Architecture," IEEE Micro, vol. 22, no. 1, pp. 26-35, Jan./Feb. 2002. IEEE Xplore link
  4. John Kim, James Balfour, and William J Dally. Flattened butterfly topology for on-chip networks. In MICRO 40: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007. ACM DL Link

3.5  Recent Trends

  1. J. Gregory Steffan, Christopher B. Colohan, Antonia Zhai, and Todd C. Mowry. A Scalable Approach to Thread-Level Speculation. In Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000. ACM DL link
  2. Ravi Rajwar, James R. Goodman: Speculative lock elision: enabling highly concurrent multithreaded execution. MICRO 2001: 294-305. ACM DL link
  3. Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill and David A. Wood. LogTM: Log-based Transactional Memory. HPCA 2006. IEEE Xplore link
  4. Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, Andreas Moshovos: Mechanisms for store-wait-free multiprocessors. ISCA 2007: 266-277. ACM DL link

3.6  Miscellaneous

  1. Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, Proc. 23rd Annual International Symposium on Computer Architecture, May 1996, pp. 191-202. ACM DL link
  2. Mark D. Hill, Michael R. Marty, "Amdahl's Law in the Multicore Era," Computer, vol. 41, no. 7, pp. 33-38, July 2008. IEEE Xplore link

4.  Supplemental Reading

  1. Arthur W. Burks, Herman H. Goldstine, John von Neumann. Preliminary discussion of the logical design of an electronic computing instrument, Report to the U.S. Army Ordnance Department, 1946. Reprinted as Chapter 4 of Bell and Newell, Computer Structures: Readings and Examples, McGraw-Hill, 1971. URL: Chapter 4 of
  2. International Technology Roadmap for Semiconductors (ITRS) at, sponsored by the Semiconductor Industry Association (SIA) and other associations in Europe, Japan, and Korea. See the 2008 Edition or newer.

Page last modified on February 06, 2010