Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, David Webb, "The Alpha 21364 Network Architecture," IEEE Micro, vol. 22, no. 1, pp. 26-35, Jan./Feb. 2002.
152 million transistors; 22.4 Gbytes/s router bandwidth.
the paper covers the router, which runs at 1.2 GHz (same speed as the processor)
uses aggressive routing algorithms, distributed arbiters, large on-chip buffering, and a pipelined router implementation
support for directory-based cache coherence (virtual packet classes)
flit: the portion of a packet transported in parallel on a single clock edge
Classes:
Request
forward
block response (with data)
nonblock response (ack)
write I/O
Read I/O
special (no-op packets, buffer deallocation info between routers, etc.)
2D torus
Virtual cut-through routing
flits of a packet proceed through multiple routers until a router blocks the header flit.
the blocking router buffers all of the packet's flits until congestion clears
the blocking router then schedules the packet toward its destination. [buffer space for 316 packets]
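A minimal sketch of the cut-through behaviour noted above, assuming a packet is a list of flits (header first) and the route is a simple list of router names. The function name, the `blocked` set, and the data layout are hypothetical illustrations, not taken from the paper.

```python
def cut_through(flits, path, blocked):
    """Return (router, buffered_flits) for the router where the packet stops.

    flits   -- list of flits, header first (hypothetical representation)
    path    -- routers the packet would traverse, in order
    blocked -- set of routers that cannot forward the header right now
    """
    for router in path:
        if router in blocked:
            # The header is blocked here, so this router absorbs every flit
            # of the packet rather than leaving them spread over upstream links.
            return router, list(flits)
    # No router blocked the header: the whole packet arrives at the last router.
    return path[-1], list(flits)
```

For example, `cut_through(["H", "B1", "B2"], ["R0", "R1", "R2", "R3"], {"R2"})` stops at `"R2"` with all three flits buffered there.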
Adaptive routing
packets stay within the minimum rectangle (current router and destination at diagonally opposite corners)
at each hop a packet either continues in the same direction or turns.
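A rough sketch of how the allowed directions inside the minimum rectangle could be computed on a 2D torus. The coordinate scheme, the direction labels, and the tie-break on even-sized rings are assumptions made for illustration; the notes do not specify them.

```python
def productive_directions(src, dst, width, height):
    """Directions that keep a packet inside the minimum rectangle.

    src, dst -- (x, y) coordinates on a width x height torus (assumed layout)
    Returns a list drawn from '+x', '-x', '+y', '-y' (at most one per dimension).
    """
    def step(a, b, size, pos, neg):
        if a == b:
            return []                  # already aligned in this dimension
        forward = (b - a) % size       # hops when moving in the positive direction
        backward = (a - b) % size      # hops when wrapping the other way
        # Assumption: a tie on an even-sized ring goes to the positive direction.
        return [pos] if forward <= backward else [neg]

    return (step(src[0], dst[0], width, '+x', '-x')
            + step(src[1], dst[1], height, '+y', '-y'))
```

With two directions returned, an adaptive router can pick whichever output is less congested; with one, the packet can only continue straight.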
Deadlock avoidance
deadlock arises from cyclic dependences
e.g., request packets can fill up the network and prevent block response packets from making progress
solution: virtual channels with priority.
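A small sketch of why per-class virtual channels help, assuming each packet class gets its own bounded queue. The `ClassBuffers` type and its `capacity` parameter are hypothetical, and the priority ordering between classes is deliberately left out because these notes do not state it.

```python
from collections import deque
from enum import Enum


class PacketClass(Enum):
    REQUEST = "request"
    FORWARD = "forward"
    BLOCK_RESPONSE = "block response"
    NONBLOCK_RESPONSE = "nonblock response"
    WRITE_IO = "write I/O"
    READ_IO = "read I/O"
    SPECIAL = "special"


class ClassBuffers:
    """One bounded queue per packet class (capacity is an illustrative value)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queues = {c: deque() for c in PacketClass}

    def accept(self, packet_class, packet):
        """Enqueue the packet; return False if that class's queue is full."""
        queue = self.queues[packet_class]
        if len(queue) >= self.capacity:
            return False   # only this class is back-pressured
        queue.append(packet)
        return True
```

Filling the request queue makes further `accept(PacketClass.REQUEST, ...)` calls fail, while `accept(PacketClass.BLOCK_RESPONSE, ...)` still succeeds, so responses can keep draining.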
Adaptive routing can create intra-dimension and inter-dimension deadlock
José Duato's theory: as long as packets can drain via a deadlock-free path, no deadlock will occur.
intra-dimension: VC0 and VC1 are non-adaptive (horizontal first, then vertical); channels on the primary axis depend on the secondary axis, but not vice versa. When packets change dimension, they recompute their virtual channel in the new dimension.
Dally scheme: number all processors in a dimension incrementally; if the source's number is less than the destination's, use VC0, else VC1.
but VC0 and VC1 are not well balanced under this scheme.
packets in VC0 or VC1 can return to the adaptive channel if a subsequent router is not blocked.
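The channel choice described above can be sketched as follows, assuming processors in a dimension carry integer numbers and that a per-hop flag says whether the adaptive channel is usable. Both function names and the `adaptive_free` flag are hypothetical.

```python
def deadlock_free_vc(src_number, dst_number):
    """Dally-style choice as stated in the notes: VC0 if source < destination, else VC1."""
    return "VC0" if src_number < dst_number else "VC1"


def choose_channel(src_number, dst_number, adaptive_free):
    """Prefer the adaptive channel when it is free; otherwise use VC0/VC1.

    The decision is re-made at every hop, so a packet that fell back to
    VC0 or VC1 can return to the adaptive channel at a later router.
    """
    if adaptive_free:
        return "adaptive"
    return deadlock_free_vc(src_number, dst_number)
```

Calling `choose_channel(1, 3, False)` at a blocked hop and `choose_channel(1, 3, True)` at the next one gives `"VC0"` followed by `"adaptive"`.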
Router Architecture
input and output ports: local (cache and memory controller), interprocessor (off-chip network), I/O.
clock forwarding
Local arbiters: various readiness tests determine whether a packet can be speculatively dispatched via the router.
tests: input is valid, output port is free, next hop has a free buffer, target output port is free, anti-starvation mechanism is not blocking the packet, a read I/O packet does not pass a write I/O packet.
fairness: coherence dependence priority mode and rotary rule.
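As an illustration of the local arbiter's readiness tests listed above, here is a sketch that treats each test as a boolean and dispatches only when all of them pass. The `Readiness` fields are hypothetical names for the conditions in these notes; the real arbiter is pipelined hardware, and the fairness rules are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    valid_input: bool               # packet is valid at the input
    output_port_free: bool          # output port is free
    next_buffer_free: bool          # downstream side has a free buffer
    target_output_port_free: bool   # target output port is free
    not_starved_out: bool           # anti-starvation mechanism allows dispatch
    io_order_kept: bool             # a read I/O packet is not passing a write I/O packet


def can_dispatch(r):
    """A packet may be speculatively dispatched only if every test passes."""
    return all((
        r.valid_input,
        r.output_port_free,
        r.next_buffer_free,
        r.target_output_port_free,
        r.not_starved_out,
        r.io_order_kept,
    ))
```

A single failing test, such as `io_order_kept=False`, is enough to hold the packet back for that arbitration cycle.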