### UPC Ph.D. Course on Parallel Computer Architecture

### Symmetric Multiprocessors Part 2 (Chapter 6)

Copyright 2003 Mark D. Hill, University of Wisconsin-Madison

Slides are derived from work by Sarita Adve (Illinois), Babak Falsafi (CMU), Alvy Lebeck (Duke), Steve Reinhardt (Michigan), and J. P. Singh (Princeton). Thanks!

# Review: Symmetric Multiprocessors (SMP)

- Multiple (micro-)processors
- Each has cache (today a cache hierarchy)
- Connect with logical bus (totally-ordered broadcast)

UPC Parallel Computer Architecture

- Implement Snooping Cache Coherence Protocol
  - Broadcast all cache "misses" on bus
  - All caches "snoop" bus and may act
  - Memory responds otherwise
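The broadcast-and-snoop rule above can be sketched in a few lines of Python. This is a minimal illustration, not the protocol of any particular machine: the class names `Cache` and `Bus`, the `dirty` dictionary, and the addresses are all assumptions made for the example. Each miss is appended to one log (the total order of the logical bus), every other cache snoops it, and memory supplies the data only when no cache does.

```python
class Cache:
    """Hypothetical cache that only tracks blocks it holds in a dirty (owned) state."""

    def __init__(self, name):
        self.name = name
        self.dirty = {}  # addr -> value for blocks this cache has modified

    def snoop(self, addr):
        """Return the block's value if this cache owns a dirty copy, else None."""
        return self.dirty.get(addr)


class Bus:
    """Logical bus: every miss is broadcast and appended to one total order."""

    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory
        self.log = []  # total order of broadcast misses

    def broadcast_miss(self, requester, addr):
        self.log.append((requester.name, addr))
        # All other caches snoop the transaction and may act on it.
        for cache in self.caches:
            if cache is requester:
                continue
            value = cache.snoop(addr)
            if value is not None:
                return cache.name, value
        # Memory responds otherwise.
        return "memory", self.memory.get(addr, 0)


memory = {0x40: 1}
c0, c1 = Cache("P0"), Cache("P1")
bus = Bus([c0, c1], memory)
c1.dirty[0x80] = 7  # P1 holds a modified copy of block 0x80

from_memory = bus.broadcast_miss(c0, 0x40)  # no cache owns 0x40
from_cache = bus.broadcast_miss(c0, 0x80)   # P1 owns 0x80
print(from_memory, from_cache)
```

The single `log` list stands in for the totally-ordered broadcast: every cache observes the misses in the same order.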

(C) 2003 Mark D. Hill from Adve, Falsafi, Lebeck, Reinhardt, & Singh

# Review: Snoopy Design Choices

- Controller updates state of blocks in response to processor and snoop events and generates bus xactions
- Often have duplicate cache tags
- Snoopy protocol
  - set of states
  - state-transition diagram
  - actions
- Basic Choices
  - write-through vs. write-back
  - invalidate vs. update

[Figure: cache holding State, Tag, and Data for each block, with a snoop input (observed bus transaction)]

### **Base Cache Coherence Design**

- Single-level write-back cache
- Invalidation protocol
- One outstanding memory request per processor
- Atomic memory bus transactions
  - no interleaving of transactions
- Atomic operations within process
  - one finishes before next in program order
- Examine write serialization, completion, atomicity
- Then add more concurrency and re-examine
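To make the base design concrete, here is a minimal sketch of a write-back invalidation protocol on an atomic bus. The slide does not name a protocol, so the three-state MSI scheme, the transaction names `BusRd` and `BusRdX`, and the class `System` are assumptions chosen for illustration. Each call to `_bus` runs to completion before anything else happens, which models atomic bus transactions with one outstanding request per processor.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class System:
    """Hypothetical MSI model: single-level write-back caches on an atomic bus."""

    def __init__(self, n_procs):
        self.memory = {}
        self.state = [dict() for _ in range(n_procs)]  # per cache: addr -> State
        self.data = [dict() for _ in range(n_procs)]   # per cache: addr -> value
        self.bus_log = []                              # serialization order

    def _bus(self, kind, requester, addr):
        """One atomic bus transaction: every other cache snoops and reacts."""
        self.bus_log.append((kind, requester, addr))
        for p, st in enumerate(self.state):
            if p == requester:
                continue
            s = st.get(addr, State.INVALID)
            if s is State.MODIFIED:
                self.memory[addr] = self.data[p][addr]  # write back dirty block
            if kind == "BusRdX":
                if s is not State.INVALID:
                    st[addr] = State.INVALID            # invalidate other copies
            elif s is State.MODIFIED:
                st[addr] = State.SHARED                 # downgrade on a read

    def read(self, p, addr):
        if self.state[p].get(addr, State.INVALID) is State.INVALID:
            self._bus("BusRd", p, addr)
            self.data[p][addr] = self.memory.get(addr, 0)
            self.state[p][addr] = State.SHARED
        return self.data[p][addr]

    def write(self, p, addr, value):
        if self.state[p].get(addr, State.INVALID) is not State.MODIFIED:
            self._bus("BusRdX", p, addr)
            self.state[p][addr] = State.MODIFIED
        self.data[p][addr] = value


smp = System(2)
smp.write(0, 0x10, 5)
first = smp.read(1, 0x10)   # P0 flushes, both end up SHARED
smp.write(1, 0x10, 9)       # P0's copy is invalidated
second = smp.read(0, 0x10)  # P1 flushes the new value
print(first, second, len(smp.bus_log))
```

Because writes to a block reach the bus one at a time, the order of entries in `bus_log` is the order in which all processors see them, which is the write serialization the slide asks us to examine.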

# **Cache Controller and Tags**

- On a miss in uniprocessor:
  - Assert request for bus
  - Wait for bus grant
  - Drive address and command lines
  - Wait for command to be accepted by relevant device
  - Transfer data
- In snoop-based multiprocessor, cache controller must:
  - Monitor bus and processor
    - Can view as two controllers: bus-side, and processor-side
    - With single-level cache: dual tags (not data) or dual-ported tag RAM
    - synchronize on updates
  - Respond to bus transactions when necessary
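The dual-tag idea can be illustrated with a small sketch. Everything here is hypothetical (the class `DualTagCache`, its counters, and the addresses); it only shows why duplicating the tags helps: snoops that miss consult the bus-side copy alone, and the two copies are touched together only when a block's state changes.

```python
class DualTagCache:
    """Hypothetical cache with a processor-side and a bus-side copy of the tags."""

    def __init__(self):
        self.proc_tags = {}          # addr -> state, used for processor lookups
        self.snoop_tags = {}         # duplicate copy, used for bus snoops
        self.proc_tag_accesses = 0   # how often the processor-side tags are used
        self.snoop_tag_accesses = 0  # how often the bus-side tags are used

    def _set_state(self, addr, state):
        """Update both copies together ("synchronize on updates")."""
        if state is None:
            self.proc_tags.pop(addr, None)
            self.snoop_tags.pop(addr, None)
        else:
            self.proc_tags[addr] = state
            self.snoop_tags[addr] = state
        self.proc_tag_accesses += 1
        self.snoop_tag_accesses += 1

    def fill(self, addr, state="S"):
        self._set_state(addr, state)

    def processor_lookup(self, addr):
        self.proc_tag_accesses += 1
        return self.proc_tags.get(addr)

    def snoop(self, addr, invalidate=False):
        self.snoop_tag_accesses += 1
        state = self.snoop_tags.get(addr)
        if state is not None and invalidate:
            self._set_state(addr, None)  # both copies must change
        return state


cache = DualTagCache()
cache.fill(0x100)
for offset in range(10):
    cache.snoop(0x200 + offset)      # snoops for blocks we do not hold
accesses_after_misses = cache.proc_tag_accesses
hit_state = cache.snoop(0x100, invalidate=True)
after_invalidate = cache.processor_lookup(0x100)
print(accesses_after_misses, hit_state, after_invalidate)
```

In this run the ten snoop misses leave the processor-side access count at 1 (the initial fill), so the processor is not slowed by unrelated bus traffic.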


## Sun Enterprise 10000

- How far can you go with snooping coherence?
- Quadruple request/snoop bandwidth using four address busses
  - each handles 1/4 of physical address space
  - impose logical ordering for consistency: for writes on same cycle, those on bus 0 occur "before" bus 1, etc.
- Get rid of data bandwidth problem: use a network
  - E10000 uses 16x16 crossbar betw. CPU boards & memory boards
  - Each CPU board has up to 4 CPUs: max 64 CPUs total
- 10.7 GB/s max BW, 468 ns unloaded miss latency
- See "Starfire: Extending the SMP Envelope", IEEE Micro, Jan/Feb 1998
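The four-bus ordering rule can be sketched as follows. The slide says only that each bus handles 1/4 of the physical address space and that same-cycle writes are ordered by bus number; the 64-byte block size and the choice of low-order block-address bits to select a bus are assumptions made for this example, as are the function names.

```python
NUM_ADDRESS_BUSES = 4
BLOCK_BYTES = 64  # assumed block size, not stated on the slide


def address_bus(paddr):
    """Pick the address bus for a physical address (assumed block interleaving)."""
    return (paddr // BLOCK_BYTES) % NUM_ADDRESS_BUSES


def logical_order(transactions):
    """Order (cycle, paddr, label) tuples: earlier cycle first, then lower bus number."""
    return sorted(transactions, key=lambda t: (t[0], address_bus(t[1])))


# 16 boards on the crossbar, up to 4 CPUs per board.
MAX_CPUS = 16 * 4

txns = [
    (5, 0x0C0, "w3"),  # cycle 5, bus 3
    (5, 0x000, "w0"),  # cycle 5, bus 0
    (4, 0x080, "w2"),  # cycle 4, bus 2
    (5, 0x040, "w1"),  # cycle 5, bus 1
]
ordered_labels = [label for _, _, label in logical_order(txns)]
print(ordered_labels, MAX_CPUS)
```

The write from cycle 4 comes first regardless of its bus; the three cycle-5 writes are then ordered bus 0, bus 1, bus 3, giving a single total order even though four buses operate in parallel.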

# Outline

- Coherence Control Implementation
- Writebacks, Non-Atomicity, & Serialization/Order
- Hierarchical Cache
- Split Buses
- Deadlock, Livelock, & Starvation
- Case Studies
- TLB Coherence
- Virtual Cache Issues
