Changkyu Kim, Doug Burger, Stephen W. Keckler: An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS 2002.
Wire-delay dominated cache > overall access latency is high, because a uniform design pays the latency of the farthest bank on every access.
Design space for Caches
Mapping > how many addressable banks and how lines are mapped
Search > set of possible places a block can be?
Movement > always in the same bank?
Structures considered
UCA : Uniform Cache Access.
ML-UCA : multilevel UCA
S-NUCA-1 : mapping predetermined based on block index.
S-NUCA-2 : 2-D switched network instead of private per-bank channels > larger number of smaller, faster banks.
D-NUCA : each bank forms one way of a set; frequently accessed data migrates to the banks closest to the cache controller.
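The difference between the static and dynamic mappings above can be sketched in a few lines. This is only an illustration: the function names, the 64-byte line size, the bank counts, and the use of low-order block-index bits to pick a bank are assumptions made for the example, not parameters taken from the paper.

```python
def snuca_bank(address: int, line_bytes: int = 64, num_banks: int = 32) -> int:
    """S-NUCA style: one fixed bank per line, chosen from the block index.

    Assumption: the low-order bits of the block index select the bank.
    """
    block_index = address // line_bytes  # drop the byte-offset bits
    return block_index % num_banks


def dnuca_candidate_banks(address: int, line_bytes: int = 64,
                          num_bank_sets: int = 16, ways: int = 8) -> list:
    """D-NUCA style: a line may live in any bank of its bank set.

    Assumption: bank sets are columns and row 0 is closest to the controller.
    Returns (column, row) pairs ordered closest first.
    """
    block_index = address // line_bytes
    column = block_index % num_bank_sets
    return [(column, row) for row in range(ways)]
```

With a static mapping the lookup touches exactly one bank; with the dynamic mapping the controller has a list of candidate banks and needs a search policy to decide how to probe them.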
Eval
UCA Contention > bank contention / channel contention.
S-NUCA-1 : Private Channels > each bank can be accessed independently at high speed.
S-NUCA-2 : Switched Channels > avoids the long private per-bank wires; lightweight, wormhole-routed 2-D mesh.
Dynamic NUCA : spread sets : each set is spread across multiple banks (one way per bank).
mapping : simple (each column of banks is one bank set, starting at the closest bank and going up), fair (average access time equalized across bank sets), shared (closest banks shared among multiple bank sets).
Locating : incremental search : closest bank searched first
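multicast search : some or all banks searched in parallel.
limited multicast search : first M of N banks searched in parallel, the rest incrementally.
partitioned multicast : bank set is broken down into subsets of banks; subsets searched one after another, banks within a subset in parallel.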
smart search : partial tag comparison at the cache controller; two variants, ss-performance and ss-energy.
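The locating policies above trade latency against the number of banks probed. The sketch below models one bank set as a list of stored tags ordered closest bank first, and returns both the hit position and a probe count as a rough stand-in for search energy. The representation and function names are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

Result = Tuple[Optional[int], int]  # (hit position or None, banks probed)


def incremental_search(bank_set: List[Optional[int]], tag: int) -> Result:
    """Probe banks one at a time, closest first, stopping at the first hit."""
    for probes, stored in enumerate(bank_set, start=1):
        if stored == tag:
            return probes - 1, probes
    return None, len(bank_set)


def multicast_search(bank_set: List[Optional[int]], tag: int) -> Result:
    """Probe every bank in parallel: fewest steps, most probes."""
    hit = bank_set.index(tag) if tag in bank_set else None
    return hit, len(bank_set)


def limited_multicast_search(bank_set: List[Optional[int]], tag: int,
                             m: int) -> Result:
    """Probe the first m banks in parallel, then the rest incrementally."""
    head = bank_set[:m]
    if tag in head:
        return head.index(tag), len(head)
    hit, probes = incremental_search(bank_set[m:], tag)
    return (None if hit is None else hit + m), len(head) + probes
```

For a hit in the third of four banks, the incremental search probes three banks in three sequential steps, while the multicast search probes all four at once. The limited variant with m = 2 sits in between.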
dynamic movement : generational promotion > on each hit, the line is swapped with the line in the bank next closest to the cache controller.
insertion : which bank a new line fills and what happens to the victim (zero-copy: victim is evicted from the cache; one-copy: victim is moved to a lower-priority bank).
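Generational promotion and insertion can be sketched on the same closest-first list model of a bank set. The sketch assumes new lines are inserted at the farthest bank and that the victim is simply dropped (a zero-copy style policy); both choices, and the function name `access`, are assumptions made for the example.

```python
from typing import List, Optional


def access(bank_set: List[Optional[int]], tag: int) -> bool:
    """Access `tag` in a bank set ordered closest bank first.

    Mutates `bank_set` in place and returns True on a hit.
    """
    if tag in bank_set:
        i = bank_set.index(tag)
        if i > 0:
            # Generational promotion: swap one step toward the controller.
            bank_set[i - 1], bank_set[i] = bank_set[i], bank_set[i - 1]
        return True
    # Miss: assumed tail insertion; the previous occupant is evicted.
    bank_set[-1] = tag
    return False
```

Starting from `[1, 2, 3, 4]`, two hits on tag 3 move it from the third bank to the closest bank one step at a time, so a line has to be reused repeatedly before it displaces the lines already near the controller.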