John Kim, James Balfour, and William J. Dally. Flattened butterfly topology for on-chip networks. In MICRO 40: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007.
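Most on-chip networks (OCNs) use a low-radix 2-D mesh, in which each router connects to a single node. Examples: the RAW processor, TRIPS, Intel's 80-node Teraflops chip and Tilera's TILE64.
Off-chip networks: cost increases with channel count, and channel count increases with hop count, so reducing the hop count (higher radix) reduces cost.
On-chip networks: bandwidth is plentiful but buffers are expensive. Reducing the diameter of an OCN still helps, because fewer hops mean fewer router traversals and therefore lower power consumption.
Concentration (connecting several nodes to each router) is practical on chip and reduces wiring complexity.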
Latency on chip
T = T_h + T_s + T_w  (header latency + serialization latency + time of flight on wires)
T = H * t_r + L / b + T_w  (hop count * router delay + packet size / channel bandwidth + time of flight on wires)
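The second form of the latency equation can be evaluated directly. Below is a minimal Python sketch of T = H * t_r + L / b + T_w. The function name `zero_load_latency` is illustrative. All the inputs (hop counts, router delay, packet size, channel widths, wire delay) are hypothetical values, chosen only to show how fewer hops can outweigh narrower channels. They are not results from the paper.

```python
def zero_load_latency(hops: float, t_r: float, packet_bits: float,
                      bits_per_cycle: float, t_w: float) -> float:
    """Zero-load latency in cycles: T = H * t_r + L / b + T_w."""
    header = hops * t_r                            # header latency: H * t_r
    serialization = packet_bits / bits_per_cycle   # serialization latency: L / b
    return header + serialization + t_w           # plus time of flight on wires


# Hypothetical inputs: 512-bit packets, 2-cycle routers, 5 cycles of wire delay.
# The mesh gets 128-bit channels; the flattened butterfly gets narrower 64-bit ones.
mesh = zero_load_latency(hops=6.25, t_r=2, packet_bits=512, bits_per_cycle=128, t_w=5)
fbfly = zero_load_latency(hops=2.5, t_r=2, packet_bits=512, bits_per_cycle=64, t_w=5)
print(f"mesh: {mesh} cycles, flattened butterfly: {fbfly} cycles")
```

With these assumed values the mesh comes to 21.5 cycles and the flattened butterfly to 18.0 cycles. Raising t_r favours the low-diameter network further, while shrinking b penalises it through serialization.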
Flattened butterfly
topology
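In a 2-D layout, every router is connected directly to all other routers in its row and in its column (a direct channel to the nth router in the row is allowed). Any router in the same row or column is therefore a single hop away.
Long-wire impact: the delay of the long channels is reduced by optimally inserting repeaters, and pipeline registers preserve the channel bandwidth. However, traversing a long channel takes several cycles.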
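A small sketch of this topology, for an assumed 64-node configuration: a 4 x 4 grid of routers with 4 nodes concentrated on each router. The helper names `flattened_butterfly`, `router_hops`, `mesh_hops` and `average` are hypothetical. The code does three things:

- counts the row and column links;
- derives the router radix (terminal ports plus k - 1 row ports and k - 1 column ports);
- compares the average minimal hop count with an 8 x 8 mesh under uniform random traffic (self-pairs included).

Every router carries the same number of nodes, so averaging over router pairs equals averaging over node pairs.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, Set, Tuple

Router = Tuple[int, int]


def flattened_butterfly(k: int, concentration: int):
    """k x k routers; each router links directly to every other router in its row and column."""
    routers: List[Router] = list(product(range(k), range(k)))
    links: Set[FrozenSet[Router]] = {
        frozenset((a, b))
        for a, b in product(routers, repeat=2)
        if a != b and (a[0] == b[0] or a[1] == b[1])  # same column or same row
    }
    # Ports: terminal (concentration) ports + (k - 1) row ports + (k - 1) column ports.
    radix = concentration + 2 * (k - 1)
    return routers, links, radix


def router_hops(a: Router, b: Router) -> int:
    """Minimal hops in a flattened butterfly: one direct hop per differing dimension."""
    return int(a[0] != b[0]) + int(a[1] != b[1])


def mesh_hops(a: Router, b: Router) -> int:
    """Minimal hops in a 2-D mesh: Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average(hop_fn, points: Iterable[Router]) -> float:
    """Average hop count over all ordered source/destination pairs (uniform random traffic)."""
    pts = list(points)
    return sum(hop_fn(a, b) for a in pts for b in pts) / len(pts) ** 2


routers, links, radix = flattened_butterfly(k=4, concentration=4)
print(len(routers), len(links), radix)                    # 16 48 10
print(average(router_hops, routers))                      # 1.5
print(average(mesh_hops, product(range(8), range(8))))    # 5.25
```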
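To see why pipelining keeps bandwidth but adds latency, the sketch below models a long channel as a chain of pipeline registers. The 2.5 mm tile pitch and the 5 mm-per-cycle wire reach are hypothetical numbers. `channel_cycles` and `simulate_pipelined_channel` are illustrative names. The point is only that flits still leave one per cycle, after a delay equal to the number of stages.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple


def channel_cycles(span_tiles: int, tile_mm: float, mm_per_cycle: float) -> int:
    """Cycles (= pipeline stages) needed to cross a channel spanning span_tiles tiles.
    Assumes optimally repeated wires, whose delay grows roughly linearly with length."""
    return max(1, math.ceil(span_tiles * tile_mm / mm_per_cycle))


def simulate_pipelined_channel(flits: Sequence[str], stages: int) -> List[Tuple[str, int]]:
    """Inject one flit per cycle into a chain of `stages` registers.
    Returns (flit, cycle it leaves the channel) for every flit."""
    pipe = deque([None] * stages)
    pending = list(flits)
    arrivals = []
    cycle = 0
    while pending or any(f is not None for f in pipe):
        cycle += 1
        pipe.append(pending.pop(0) if pending else None)  # flit enters the first register
        out = pipe.popleft()                               # flit leaves the last register
        if out is not None:
            arrivals.append((out, cycle))
    return arrivals


# Hypothetical: 2.5 mm tiles, wires cover 5 mm per cycle.
for span in (1, 2, 3):
    print(span, "tile(s):", channel_cycles(span, tile_mm=2.5, mm_per_cycle=5.0), "cycle(s)")

print(simulate_pipelined_channel(["a", "b", "c"], stages=2))
# [('a', 3), ('b', 4), ('c', 5)]: each flit spends 2 cycles in the channel,
# yet one flit still arrives every cycle (bandwidth preserved)
```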
routing and deadlock
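Virtual channels (VCs) may be needed to prevent deadlock.
Dimension-order routing: traverse the row dimension first (go right), then the column dimension (up/down).
Non-minimal routing uses the path diversity available in the flattened butterfly to improve load balance and performance.
UGAL (Universal Globally-Adaptive Load-balanced routing): a non-minimal route takes 2 steps, minimal routing to an intermediate node and then minimal routing to the destination. The source chooses between the minimal and the non-minimal route based on queue occupancy.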
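On a flattened butterfly, dimension-order routing follows the same x-then-y rule as on a mesh. The difference is that each dimension is crossed in one direct hop instead of one hop per intermediate router. A minimal sketch, assuming routers are addressed by hypothetical (x, y) grid coordinates; `dor_route` is an illustrative name.

```python
from typing import List, Tuple

Router = Tuple[int, int]  # (x, y): x = position within a row, y = row index


def dor_route(src: Router, dst: Router) -> List[Router]:
    """Dimension-order routing on a 2-D flattened butterfly (x first, then y).
    Each dimension is corrected with a single direct hop, so a path has at most 2 hops."""
    path = [src]
    x, y = src
    if x != dst[0]:
        x = dst[0]           # one hop along the row, straight to the destination column
        path.append((x, y))
    if y != dst[1]:
        y = dst[1]           # one hop along the column, straight to the destination row
        path.append((x, y))
    return path


print(dor_route((0, 0), (3, 2)))  # [(0, 0), (3, 0), (3, 2)]
print(dor_route((2, 1), (2, 3)))  # [(2, 1), (2, 3)]
```

On a mesh the first route would take 5 hops (3 right, 2 up). On the flattened butterfly it takes 2.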
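A minimal sketch of the UGAL decision on a 2-D flattened butterfly. It assumes the usual rule: compare queue occupancy times hop count for the minimal path against the same product for a two-phase path through an intermediate router.

- `queue_len` is a hypothetical stand-in for the source router's output-queue occupancy, keyed by next-hop router.
- The intermediate router is passed in explicitly so the example is deterministic; normally it would be chosen at random.
- In a real network each of the two phases would typically use its own virtual channel to avoid deadlock.

```python
from typing import Dict, Tuple

Router = Tuple[int, int]


def hops(a: Router, b: Router) -> int:
    """Minimal hop count on a 2-D flattened butterfly."""
    return int(a[0] != b[0]) + int(a[1] != b[1])


def first_hop(src: Router, dst: Router) -> Router:
    """Next router on the x-then-y minimal path from src to dst (src != dst)."""
    if src[0] != dst[0]:
        return (dst[0], src[1])
    return (src[0], dst[1])


def ugal_route(src: Router, dst: Router, intermediate: Router,
               queue_len: Dict[Router, int]) -> Tuple[str, Router]:
    """Pick minimal or non-minimal routing at the source (UGAL-style sketch).

    queue_len maps a next-hop router to the occupancy of the source's output
    queue toward it (a hypothetical congestion signal).
    Returns ("minimal", dst) or ("non-minimal", intermediate)."""
    if src == dst or intermediate in (src, dst):
        return "minimal", dst
    h_min = hops(src, dst)
    h_nm = hops(src, intermediate) + hops(intermediate, dst)  # phase 1 + phase 2
    q_min = queue_len.get(first_hop(src, dst), 0)
    q_nm = queue_len.get(first_hop(src, intermediate), 0)
    # Estimated delay ~ queue occupancy x path length; ties favour the minimal path.
    if q_min * h_min <= q_nm * h_nm:
        return "minimal", dst
    return "non-minimal", intermediate


src, dst, inter = (0, 0), (3, 0), (1, 2)
print(ugal_route(src, dst, inter, {(3, 0): 2, (1, 0): 1}))   # ('minimal', (3, 0))
print(ugal_route(src, dst, inter, {(3, 0): 10, (1, 0): 1}))  # ('non-minimal', (1, 2))
```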
Bypass channels
router bypass architecture
mux arbiter
at the input of the mux, the direct input is given priority
at the output of the mux, the bypass channel is given priority
switch architecture
flow control and routing
The paper has useful numbers on the power and area of high-radix flattened butterfly networks:
Power = Channel + Router + Memory
Area (flattened butterfly) is smaller. The high radix provides many channels between routers, so at a constant bisection bandwidth each channel can be narrower (thinner wires), which also lowers power.
Smaller buffers are required. Flow control is wormhole, with virtual channels to prevent deadlock. Because there are multiple paths, a packet can block another packet travelling from the same source to the same destination.
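The sketch below illustrates the bisection argument by counting the channels that cross the middle of the die. It uses an assumed 64-node comparison: an 8 x 8 mesh versus a 4 x 4 flattened butterfly with 4 nodes per router. With total bisection bandwidth held fixed, each flattened-butterfly channel can be narrower by the ratio of the two counts. The 128-bit mesh channel width and the helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Router = Tuple[int, int]
Link = Tuple[Router, Router]


def mesh_links(k: int) -> List[Link]:
    """Channels of a k x k 2-D mesh (each undirected channel listed once)."""
    nodes = list(product(range(k), range(k)))
    return [(a, b) for a, b in product(nodes, repeat=2)
            if a < b and abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1]


def fbfly_links(k: int) -> List[Link]:
    """Channels of a k x k 2-D flattened butterfly: all router pairs sharing a row or column."""
    routers = list(product(range(k), range(k)))
    return [(a, b) for a, b in product(routers, repeat=2)
            if a < b and (a[0] == b[0] or a[1] == b[1])]


def bisection_channels(links: List[Link], k: int) -> int:
    """Channels crossing the vertical cut through the middle of a k-wide router grid."""
    def left(r: Router) -> bool:
        return r[0] < k // 2
    return sum(1 for a, b in links if left(a) != left(b))


mesh_cut = bisection_channels(mesh_links(8), 8)    # 64 nodes, one per router
fbfly_cut = bisection_channels(fbfly_links(4), 4)  # 64 nodes, 4 per router
mesh_width_bits = 128                              # hypothetical mesh channel width
fbfly_width_bits = mesh_width_bits * mesh_cut // fbfly_cut
print(mesh_cut, fbfly_cut, fbfly_width_bits)       # 8 16 64
```

Under these assumptions twice as many channels cross the bisection, so each one can be half as wide for the same bisection bandwidth.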
Future
optical signalling
virtual channels to break deadlock
ideal latency?
flit-reservation flow control