(2.2.4) Non-Uniform Cache Structure

Changkyu Kim, Doug Burger, Stephen W. Keckler: An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS 2002.


Wire-delay-dominated cache > overall access latency is high; a uniform design pays the latency of the farthest bank on every access.

Design space for caches
     Mapping    > how many addressable banks there are and how lines are mapped to them
     Search      > how the set of possible locations for a line is searched
     Movement > does a line always stay in the same bank, or can it migrate?

Structures considered
UCA : Uniform Cache Architecture.
ML-UCA : multilevel UCA.
S-NUCA-1 : static mapping, the bank is predetermined by the block index.
S-NUCA-2 : 2-D switched network instead of private per-bank channels > allows a larger number of smaller, faster banks.
D-NUCA : each bank forms one way of a set; frequently accessed data migrates to the banks closest to the cache controller.
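
The static mapping used by the S-NUCA designs can be sketched as follows. This is only an illustration, not the paper's exact scheme: the function name `snuca_bank`, the 64-byte line size, the 32-bank count, and the choice of the low-order index bits are all assumptions made for the example.

```python
def snuca_bank(address: int, line_bytes: int = 64, num_banks: int = 32) -> int:
    """Return the bank that statically holds the line containing `address`.

    Hypothetical parameters: 64-byte lines and 32 banks are illustrative only.
    """
    block_index = address // line_bytes  # drop the byte offset within the line
    return block_index % num_banks       # low-order index bits select the bank


# Consecutive lines land in consecutive banks, and a line never moves.
print(snuca_bank(0x1000))  # 4096 // 64 = 64, 64 % 32 -> 0
print(snuca_bank(0x1040))  # next line -> 1
```

Because the bank depends only on the address, no search is needed, but a hot line mapped to a distant bank stays slow for its whole lifetime.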

Evaluation
UCA : contention > bank contention and channel contention.
S-NUCA-1 : private channels > each bank can be accessed independently at high speed.
S-NUCA-2 : switched channels > avoids the large number of per-bank wires; lightweight, wormhole-routed 2-D mesh.
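Dynamic NUCA : spread sets > each set is spread across multiple banks (one way per bank).
     mapping : simple (each column of banks is one bank set, ways ordered by distance from the controller), fair (banks assigned so that average access time is equal across bank sets), shared (the closest banks are shared among several bank sets).
     locating : incremental search : closest bank searched first, then the next closest, until a hit or a miss in the last bank
                    multicast search : some or all banks searched in parallel
                    limited multicast search : first M of N banks searched in parallel, the rest incrementally
                    partitioned multicast : bank set is broken down into subsets of banks; each subset is searched in parallel, subsets are searched one after another
     smart search : partial tags kept at the cache controller > ss-performance (early miss detection), ss-energy (probe only the banks whose partial tag matches).
     dynamic movement : generational promotion > on each hit, the line is swapped with the line in the next bank closer to the cache controller.
     insertion : which bank a new line fills and what happens to the victim > zero-copy (victim evicted from the cache) or one-copy (victim moved to a bank farther from the controller).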
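
The D-NUCA policies above can be put together in a toy model of a single bank set. This is a minimal sketch under stated assumptions, not the simulator from the paper: the class `BankSet` and its methods are hypothetical names, new lines are assumed to be inserted in the farthest bank, and the victim is assumed to simply leave the cache (taken here to be the zero-copy policy). Only incremental search and one-bank generational promotion are modeled.

```python
class BankSet:
    """Toy D-NUCA bank set: ways[0] is the bank closest to the controller."""

    def __init__(self, num_banks: int):
        self.ways = [None] * num_banks  # one way (a tag or None) per bank

    def incremental_search(self, tag):
        """Probe banks closest-first; return (bank_position, banks_probed)."""
        for pos, stored in enumerate(self.ways):
            if stored == tag:
                return pos, pos + 1
        return None, len(self.ways)  # miss: every bank was probed

    def access(self, tag) -> bool:
        """Return True on a hit, False on a miss; update placement either way."""
        pos, _ = self.incremental_search(tag)
        if pos is None:
            # Assumed insertion policy: fill the farthest bank and drop
            # whatever was there (zero-copy handling of the victim).
            self.ways[-1] = tag
            return False
        if pos > 0:
            # Generational promotion: swap with the next-closer bank.
            self.ways[pos - 1], self.ways[pos] = self.ways[pos], self.ways[pos - 1]
        return True


bank_set = BankSet(num_banks=4)
bank_set.access("A")           # miss: A enters the farthest bank
for _ in range(3):
    bank_set.access("A")       # each hit moves A one bank closer
print(bank_set.ways)           # ['A', None, None, None]
```

A line therefore needs several hits to reach the fastest bank, while a line that is touched once stays in the farthest bank and is the first to be replaced.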