Gabriel H. Loh, Yuan Xie, Bryan Black, Processor Design in Three-Dimensional Die-Stacking Technologies, In IEEE Micro, vol. 27(3), pp. 31-48, May-June, 2007.
two natural topologies : face to face or face to back
copper-copper bonding process builds an interdie connection > die-to-die (d2d) or 3D via.
d2d via pitch significantly larger than individual transistor.
> size determines how finely processor blocks and functional units can be partitioned in 3D.
> latency.
> RC delay : ~35% of that of a full stack of vias connecting metal 1 to metal 9.
30% of pins used for power
high power density
face to face > requires through silicon vias (TSVs) for I/O and power. low inductance, so no problem.
3D (two layers) => 50% of the footprint of the equivalent 2D chip => 50% area for pins => power delivery 50%
++ because wire loads are low : power demand goes down
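A rough sketch of the pin arithmetic above, in Python. Only the 30% power-pin share comes from these notes. The 1000-pin package and the assumption that pin count scales linearly with footprint are hypothetical, for illustration only.

```python
def power_pins(total_pins_2d: int, layers: int, power_fraction: float = 0.30) -> float:
    """Pins available for power when the same logic is folded onto `layers` dies.

    Assumptions (not from the paper): pin count scales linearly with
    footprint area, and footprint shrinks as 1 / layers.
    """
    footprint_ratio = 1.0 / layers
    return total_pins_2d * footprint_ratio * power_fraction


# Hypothetical 1000-pin 2D package versus the same design on two layers.
pins_2d = power_pins(1000, layers=1)
pins_3d = power_pins(1000, layers=2)
ratio = pins_3d / pins_2d  # fraction of the 2D power-pin budget left in 3D
```

Under these assumptions the two-layer stack keeps half of the 2D power pins, which is the "power delivery 50%" point; the lower wire load noted above works in the other direction by reducing demand.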
thermals
successive layer farther away from the heat sink
power density
sim result : worst-case temp of the 3D config ~ that of 2D processors!
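Why layers farther from the heat sink run hotter can be sketched with a 1D series thermal-resistance ladder. This is a toy model, not the simulation from the paper; all resistances, powers, and the ambient temperature are made-up illustration values.

```python
def layer_temps(powers_w, r_sink, r_interlayer, t_ambient=45.0):
    """Steady-state temperature of each layer in a 1D thermal ladder.

    powers_w[0] is the layer next to the heat sink. All heat leaves through
    the sink, so heat from layers farther away must cross every interface
    below them. Units: W, K/W, degrees C. All values are hypothetical.
    """
    temps = []
    # Layer 0 sits above the sink resistance, which carries the total power.
    t = t_ambient + r_sink * sum(powers_w)
    temps.append(t)
    for i in range(1, len(powers_w)):
        # The interface below layer i carries the power of layers i and up.
        heat_through = sum(powers_w[i:])
        t += r_interlayer * heat_through
        temps.append(t)
    return temps


# Two layers dissipating 40 W each (illustrative numbers only).
temps = layer_temps([40.0, 40.0], r_sink=0.5, r_interlayer=0.1)
```

With these numbers the layer farther from the sink ends up a few degrees warmer than the one next to it; how large that gap is in a real stack depends on the interlayer resistance and on the power reduction 3D brings.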
wire reduction 3D placement
Partitioning granularity | Layer 1 | Layer 2
Entire cores | CPU | L2
Functional block units | ROB | ALU
Logic gate | Mux [31:16] | Mux [15:0]
Transistor level | PMOS | NMOS
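A back-of-the-envelope sketch of the wire reduction from 3D placement: folding a square block onto n layers shrinks its footprint, and with it the worst-case Manhattan wire (corner to corner). The 100 mm^2 block is a hypothetical value, and the vertical d2d via length is assumed negligible.

```python
import math


def worst_case_wire(area_mm2: float, layers: int = 1) -> float:
    """Corner-to-corner Manhattan distance of a square block, in mm.

    Assumes the block's area is split evenly across `layers` square dies
    and that the vertical d2d hop adds negligible length (assumption).
    """
    side = math.sqrt(area_mm2 / layers)
    return 2 * side


flat = worst_case_wire(100.0)               # single-layer block
stacked = worst_case_wire(100.0, layers=2)  # same block folded onto two layers
reduction = 1 - stacked / flat              # fractional wire-length saving
```

For two layers the worst-case wire scales by 1/sqrt(2), a saving of roughly 29%, regardless of the block area chosen.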
3D cache
cores on one layer, cache on another
2x cores + 2x cache on one, 2x core + 2x cache on other
within cache
stacked bit lines
stacked word lines
word lines more delay than bit lines => stacking word lines provides lower overall latency.
but for power => stacking bit lines does better (bit lines are much longer)
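The latency argument above can be illustrated with a toy model. It assumes unrepeated distributed-RC wires, so delay grows with the square of length, and uses normalized, made-up delays in which the word line is the slower of the two, as these notes state. It is not the cache model from the paper.

```python
def access_delay(wl_delay: float, bl_delay: float, stack: str = "none", layers: int = 2) -> float:
    """Word-line plus bit-line delay of an array, optionally stacked.

    Assumption: splitting a wire across `layers` divides its length by
    `layers`, and distributed-RC delay scales with length squared.
    `stack` is "none", "word" (stacked word lines) or "bit" (stacked bit lines).
    """
    scale = 1.0 / layers ** 2
    if stack == "word":
        wl_delay *= scale
    elif stack == "bit":
        bl_delay *= scale
    return wl_delay + bl_delay


# Hypothetical normalized delays: word line 1.0, bit line 0.6.
base = access_delay(1.0, 0.6)
word = access_delay(1.0, 0.6, stack="word")
bit = access_delay(1.0, 0.6, stack="bit")
```

Because the larger term is the one being shortened, stacking word lines gives the lowest total in this model; the power trade-off is a separate question that the model does not cover.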
(1) Eliminating critical wires > latency + power reductions
(2) different partitioning strategies to match communication density of a given d2d via interface
wire via pitch does not scale at the same rate as feature-size.
(3) partitioning > power, performance, area.
Mixed process integration
DRAM
on-stack DC-to-DC converters
decoupling capacitors
Cool places to use
eliminate pipeline wires
higher timing margins : clock skew/jitter is low
better performance/watt ratio
clock frequency improvements > most effective for power
reduce the number of pipeline stages (wires)
NUCA : inherent problem : managing data that cores share.
Dynamic NUCA
3D version of L2 : 90% reductions in cache migrations
-- needs 3D place and route, floorplanning tools, 3D visualization and layout.
fault on a single layer > complete waste of entire stack
DFT > difficult with finely partitioned 3D structures: a single die might hold only 50% of a complete circuit.
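The single-layer-fault point above can be put in numbers: if any faulty die wastes the whole stack, stack yield is the product of the per-die yields. This sketch assumes independent defects and no pre-bond testing to screen out bad dies; the 90% die yield is a hypothetical value.

```python
import math


def stack_yield(die_yields):
    """Probability that every die in the stack is good.

    Assumes faults on different dies are independent and that dies are
    bonded untested (both are assumptions made for illustration).
    """
    return math.prod(die_yields)


two_layer = stack_yield([0.9, 0.9])        # two dies at 90% each
four_layer = stack_yield([0.9] * 4)        # yield drops further with more layers
```

Two 90% dies give an 81% stack, which is why testing partial circuits before bonding matters even though DFT is harder on a finely partitioned die.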