Likely to be updated.
H&P is
John L. Hennessy and David A. Patterson,
Computer Architecture: A Quantitative Approach,
Morgan Kaufmann Publishers, Third Edition, 2002.
HJ&S is
Mark D. Hill, Norman P. Jouppi, and Gurindar S. Sohi,
Readings in Computer Architecture,
Morgan Kaufmann Publishers, 2000.
Introduction & Some History
H&P Section 6.1 (Introduction)
HJ&S Introduction to Chapter 9, "Multiprocessors and Multicomputers," 2000.
Online PDF for University of Wisconsin only.
C. Gordon Bell, Multis: A New Class of Multiprocessor Computers,
Science, 26 April 1985, pp. 462-466.
Online PDF for University of Wisconsin only.
Charles L. Seitz, The Cosmic Cube,
Communications of the ACM,
January 1985, pp. 22-33.
Online PDF for University of Wisconsin only.
Reprinted in HJ&S pp. 611-622.
Lewis W. Tucker and George G. Robertson,
Architecture and Applications of the Connection Machine,
IEEE Computer, August 1988, pp. 26-39.
Online PDF for University of Wisconsin only.
Reference.
Parallel Programming & Methods
H&P Section 6.2 (Application Characteristics).
Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh,
and Anoop Gupta,
The SPLASH-2 Programs: Characterization and Methodological Considerations,
Proc. International Symposium on Computer Architecture,
June 1995.
Online PDF for University of Wisconsin only.
W. Daniel Hillis and Guy L. Steele,
Data Parallel Algorithms,
Communications of the ACM, December 1986, pp. 1170-1183.
Online PDF for University of Wisconsin only.
The Message Passing Interface (MPI) standard,
Web site (html).
Reference.
OpenMP: Simple, Portable, Scalable SMP Programming,
Web site (html).
Luiz Andre Barroso, Kourosh Gharachorloo, and Edouard Bugnion,
Memory System Characterization of Commercial Workloads,
Proc. International Symposium on Computer Architecture,
June 1998.
Online PDF for University of Wisconsin only.
Symmetric Multiprocessors
H&P Sections 6.3 & 6.4 (SMPs & SMP Performance).
Paul Sweazey and Alan Jay Smith,
A Class of Compatible Cache Consistency Protocols and their Support by the IEEE Futurebus,
Proc. Thirteenth International Symposium on Computer Architecture,
June 1986.
Online PDF for University of Wisconsin only.
Leslie Lamport,
How to Make a Multiprocessor Computer that Correctly
Executes Multiprocess Programs,
IEEE Trans. on Computers,
September 1979, pp. 690-691.
Online PDF for University of Wisconsin only.
Reprinted in HJ&S pp. 754-755.
H&P Section 6.7 (Synchronization).
John M. Mellor-Crummey and Michael L. Scott,
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors,
ACM Trans. on Computer Systems,
February 1991, pp. 21-65.
Online PDF for University of Wisconsin only.
Alan Charlesworth,
Starfire: Extending the SMP Envelope,
IEEE Micro,
January/February 1998, pp. 39-49.
Online PDF for University of Wisconsin only.
Lance Hammond, Ben Hubbert, Michael Siu, Manohar Prabhu,
Mike Chen, and Kunle Olukotun,
The Stanford Hydra CMP,
IEEE Micro,
March/April 2000, pp. 71-84.
Online PDF for University of Wisconsin only.
Ravi Rajwar and James R. Goodman,
Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution,
Proc. 34th Intl. Symposium on Microarchitecture,
December 2001.
Online PDF for University of Wisconsin only.