(3.2.4) Victim Replication

Michael Zhang, Krste Asanovic: Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. ISCA 2005: 336-345.

ISCA 05

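Problem tackled : in a tiled CMP, what's the best way to organise the L2?
     Private L2 => - replication of shared lines wastes capacity | + fast local access | - needs a directory (coherence)
     Shared L2 => + no replication, so full capacity | - network congestion and hops to reach a remote slice. Each address has a home tile.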
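To make the "home tile" and "hops" cost of the shared organisation concrete, here is a minimal sketch. The 64-byte line size, the 4x2 mesh, and the choice of interleaving lines across tiles by low-order line-address bits are all assumptions made for illustration, not details taken from these notes; `home_tile` and `hops` are hypothetical helper names.

```python
# Illustrative only: parameters below are assumptions, not from the notes.
LINE_BYTES = 64   # assumed cache line size
MESH_COLS = 4     # assumed mesh width
MESH_ROWS = 2     # assumed mesh height
NUM_TILES = MESH_COLS * MESH_ROWS


def home_tile(addr: int) -> int:
    """Map a physical address to its home tile (assumed line interleaving)."""
    return (addr // LINE_BYTES) % NUM_TILES


def hops(src: int, dst: int) -> int:
    """Manhattan distance between two tiles in the assumed 2D mesh."""
    sx, sy = src % MESH_COLS, src // MESH_COLS
    dx, dy = dst % MESH_COLS, dst // MESH_COLS
    return abs(sx - dx) + abs(sy - dy)


if __name__ == "__main__":
    addr = 0x1234C0
    requester = 0
    home = home_tile(addr)
    print(f"address {addr:#x}: home tile {home}, "
          f"{hops(requester, home)} hop(s) from tile {requester}")
```

With a shared L2, every L1 miss pays `hops(requester, home_tile(addr))` in each direction; with a private L2 that distance is always zero, which is the latency gap the paper sets out to close.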

Proposed
     Victim Replication (L2VR) :
          Capture lines evicted from the local primary cache in the local L2 slice.
          The retained victim is a local L2 replica of a line that already exists in the L2 of its remote home tile.
          What's replaced, in priority order : (1) an invalid line (2) a global line with no sharers (3) an existing replica.
          - Area overhead : L2 tags must be wide enough, since lines can now reside in non-home tiles.
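The replacement priority listed above can be sketched as a small selection function over the ways of one L2 set. This is a rough illustration, not the paper's implementation: the `L2Line` fields and the `pick_replica_slot` name are hypothetical, ties within a category are broken by taking the first match for simplicity, and returning `None` (no replica is made) when every way holds a global line with sharers is an assumption about the fall-through case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class L2Line:
    """One way of a local L2 set (hypothetical field names)."""
    valid: bool = False
    is_replica: bool = False  # True: local copy of a line whose home is remote
    sharers: int = 0          # sharers recorded for a global (home) line


def pick_replica_slot(ways: List[L2Line]) -> Optional[int]:
    """Return the way to overwrite with an L1 victim, or None to skip replication."""
    # (1) An invalid line costs nothing to replace.
    for i, line in enumerate(ways):
        if not line.valid:
            return i
    # (2) A global line with no sharers.
    for i, line in enumerate(ways):
        if not line.is_replica and line.sharers == 0:
            return i
    # (3) An existing replica.
    for i, line in enumerate(ways):
        if line.is_replica:
            return i
    # Assumed fall-through: only global lines with sharers remain, so keep them.
    return None


if __name__ == "__main__":
    example_set = [
        L2Line(valid=True, is_replica=False, sharers=2),
        L2Line(valid=True, is_replica=True),
        L2Line(valid=True, is_replica=False, sharers=0),
    ]
    print(pick_replica_slot(example_set))
```

The ordering means a replica never displaces a global line that still has sharers, so the slice keeps serving as the home for actively shared lines.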

Related Work 
     Piranha : Compaq's architecture : uses a shared, non-inclusive L2. Snoopy bus.
     MIT Raw architecture : said to use a non-coherent private L2, distributed evenly among 16 processors.