(3.4.4) Butterfly OCN

John Kim, James Balfour, and William J. Dally. Flattened butterfly topology for on-chip networks. In MICRO 40: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007.

Most OCNs are low-radix 2-D meshes (each router has one node attached): the RAW processor, TRIPS, Intel's 80-node TeraFLOPS chip, and Tilera's 64-node chip.

Off-chip networks: cost increases with channel count, and channel count increases with hop count.
On-chip networks: bandwidth is plentiful but buffers are expensive. Even so, reducing the diameter of the OCN lowers power consumption, and concentration is practical (it reduces wiring complexity).

Latency on chip
     = header latency + serialization latency + time of flight on wires
     = (hop count * router delay) + (packet size / channel bandwidth) + time of flight on wires
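
The latency breakdown above can be sketched as a small function. The function name, the parameter names, and all numbers below are illustrative assumptions, not figures from the paper; the second call simply assumes a network with fewer hops but narrower channels.

```python
def packet_latency(hops, router_delay, packet_bits, channel_bw, wire_cycles):
    """Zero-load packet latency in cycles, following the three terms in the notes.

    hops         -- hop count (number of routers traversed)
    router_delay -- cycles spent per router
    packet_bits  -- packet size in bits
    channel_bw   -- channel bandwidth in bits per cycle
    wire_cycles  -- time of flight on the wires, in cycles
    """
    header_latency = hops * router_delay
    serialization_latency = packet_bits / channel_bw
    return header_latency + serialization_latency + wire_cycles


# Hypothetical comparison: many hops over wide channels versus
# few hops over channels half as wide (made-up numbers).
many_hops = packet_latency(hops=6, router_delay=2, packet_bits=512,
                           channel_bw=128, wire_cycles=4)
few_hops = packet_latency(hops=2, router_delay=2, packet_bits=512,
                          channel_bw=64, wire_cycles=4)
print(many_hops, few_hops)
```

With these made-up inputs the lower hop count outweighs the extra serialization latency from the narrower channel; with other packet sizes or channel widths the balance could go the other way.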

Flattened butterfly
     topology
          In a 2-D layout, every router connects directly to all other routers in its row and in its column (so any router in the same row or column is one hop away).
          Long-wire impact: reduced by optimally inserting repeaters and pipeline registers, which preserves channel bandwidth, but traversing a long wire then takes several cycles.
     routing and deadlock
          VCs (virtual channels) may be needed to prevent deadlock.
          Dimension-order routing (route within the row first, then within the column).
          Non-minimal routing exploits the path diversity of the flattened butterfly to improve load balance and performance.
          UGAL (Universal Globally-Adaptive Load-balanced) routing: a non-minimally routed packet takes 2 phases, minimal to an intermediate node and then minimal to the destination.
     Bypass channels 
          router bypass architecture
          mux arbiter
               The direct input is given priority at the input of the mux.
               The bypass channel is given priority at the output of the mux.
          switch architecture
          flow control and routing
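
The topology and routing notes above can be illustrated with a minimal sketch. It assumes routers are identified by hypothetical (x, y) grid coordinates, ignores concentration, virtual channels, and the bypass channels, and uses made-up function names. The two-phase function shows only the shape of a non-minimal path; it does not model how UGAL decides between the minimal and non-minimal options.

```python
def neighbors(router, cols, rows):
    """Routers directly connected to `router`: all others in its row and column."""
    x, y = router
    same_row = [(i, y) for i in range(cols) if i != x]
    same_col = [(x, j) for j in range(rows) if j != y]
    return same_row + same_col


def dimension_order_route(src, dst):
    """Minimal route: one hop within the row (fix x), then one hop within the column (fix y)."""
    path = [src]
    if src[0] != dst[0]:
        path.append((dst[0], src[1]))  # direct link to the destination's column
    if src[1] != dst[1]:
        path.append(dst)               # direct link to the destination's row
    return path


def two_phase_route(src, mid, dst):
    """Non-minimal route: minimal to an intermediate router, then minimal to the destination."""
    first = dimension_order_route(src, mid)
    second = dimension_order_route(mid, dst)
    return first + second[1:]  # drop the duplicated intermediate router


# Illustrative 4x4 router grid.
print(len(neighbors((1, 1), cols=4, rows=4)))
print(dimension_order_route((0, 0), (3, 2)))
print(two_phase_route((0, 0), (1, 3), (3, 2)))
```

Because every router in a row or column is directly reachable, the minimal route never needs more than two hops in this 2-D sketch, while the two-phase route needs at most four.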
The paper has useful numbers on the power and area of high-radix flattened butterfly networks:
     Power = channel + router + memory
     Area (butterfly) is smaller: with high radix, several connections are possible between 2 nodes => with bisection bandwidth held constant, each channel is narrower (thinner wires) => lower power.
     Smaller buffers are required. Wormhole flow control, with virtual channels to prevent deadlock. Because there are multiple paths, a packet can block another packet from the same source and destination.

Future
     optical signalling
     virtual channels to break deadlock
     ideal latency ?
     flit-reservation flow control