Michael Zhang, Krste Asanovic: Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. ISCA 2005: 336-345.
ISCA 05
Problem tackled : in a tiled CMP, what's the best way to organise the L2?
Private L2 => - replication (less effective capacity) | + fast local access | - needs a directory for coherence
Shared L2 => - network hops and congestion to reach remote slices. Each address has a fixed home tile.
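The "home tile" idea in the shared design can be sketched as below. This is only an illustration, assuming consecutive cache lines are interleaved across tiles using the low-order line-address bits. The line size (64 bytes), tile count (8), and the names `home_tile` and `is_local` are hypothetical and not taken from the paper.

```python
LINE_BYTES = 64   # assumed cache-line size
NUM_TILES = 8     # assumed number of tiles

def home_tile(addr: int) -> int:
    """Return the tile whose L2 slice is the home for this address."""
    line_addr = addr // LINE_BYTES   # drop the byte-offset bits
    return line_addr % NUM_TILES     # interleave consecutive lines across tiles

def is_local(addr: int, requesting_tile: int) -> bool:
    """True if the request hits the requester's own slice (no network hops)."""
    return home_tile(addr) == requesting_tile
```

Under this mapping only about 1 in NUM_TILES lines is local to any given tile, which is why most shared-L2 accesses pay for network hops.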
Proposed
Victim Replication L2VR :
Capture evictions from the local primary cache in the local L2 slice.
Retained victim is a local L2 replica of a line that already exists in the L2 of the remote home tile.
What's replaced, in priority order : (1) an invalid line (2) a global line with no sharers (3) an existing replica.
- Area overhead : L2 tags must be wide enough to identify lines cached outside their home tile (a replica can live in a non-home slice).
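The replacement choice above can be sketched as a small selection routine. This is a minimal sketch, not the paper's implementation: the types `Kind` and `L2Line` and the function `pick_way_for_replica` are hypothetical names. Two behaviours are assumptions on my part: the first matching way wins within a priority class, and when no candidate exists no replica is made.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Kind(Enum):
    INVALID = 0
    GLOBAL = 1    # line whose home is this tile
    REPLICA = 2   # local copy of a line whose home is a remote tile

@dataclass
class L2Line:
    kind: Kind
    sharers: int = 0   # sharer count tracked for a GLOBAL line

def pick_way_for_replica(l2_set: List[L2Line]) -> Optional[int]:
    """Return the way to overwrite with an L1 victim, or None if none qualifies."""
    priorities = (
        lambda ln: ln.kind is Kind.INVALID,                      # (1) invalid line
        lambda ln: ln.kind is Kind.GLOBAL and ln.sharers == 0,   # (2) global, no sharers
        lambda ln: ln.kind is Kind.REPLICA,                      # (3) existing replica
    )
    for matches in priorities:
        for way, line in enumerate(l2_set):
            if matches(line):
                return way   # assumption: first matching way wins
    return None              # assumption: no candidate, so no replica is kept
```

A global line that still has sharers is never chosen, so keeping a replica does not displace data that other tiles are actively using.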
Related Work
Piranha : Compaq's architecture : uses a shared, non-inclusive L2. Snoopy bus.
MIT Raw architecture : claimed non-coherent private L2, distributed evenly among 16 processors