(3.4.1) CM-5

Charles E. Leiserson, et al., The Network Architecture of the Connection Machine CM-5, Proc. ACM Symposium on Parallel Algorithms and Architectures, June 1992, pp. 272-295.


Thinking Machines Corp
1996

economy of mechanism > a machine should have only 1 communication network to convey information.
CM-5 has 3 networks: data, control, diagnostic

Synchronized MIMD machine, distributed memory, procs communicate via message passing
data network > send messages
control network > sync and multiparty comm primitives

32 to 16384 processing nodes, each a 32 MHz SPARC proc. control procs: Sun Microsystems workstation front ends

processors can be split into user partitions, privileged or non-privileged. users are time-multiplexed. low OS overhead for user-task communication.

Network interface
     + simple and uniform view of the network
     + support for time sharing, space sharing and mapping out of failed components
     + decouples design decisions made for networks
     > memory mapped registers on protected pages << MMU takes care of privilege.
     > Context switching : automatic checkpointing of user tasks. 
     > user's view of the networks is independent of network topology.

CM-5 Data Network
     balancing message loads > fetch deadlock problem
     fat-tree. user partition = subtree in network.
     route to least common ancestor of src and dest. pseudo random choice at each level > load balancing
     differential pair comm > noise immunity and reduced overall power requirements.
     similar to cut-through/wormhole routing
     >>> data network is bound by a contract with the processors to guarantee that deadlock never occurs. 
          typically > reservation mechanism > at most a fixed number of messages may be outstanding between 2 processors.
               substantial overhead
          CM-5 > left port and right port (virtual channels)
               send the request on the left port always (response can be on any port)
     good efficiency as user controls directly
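
The routing rule noted above (climb to the least common ancestor of src and dest, picking an up-link pseudo-randomly at each level, then descend) can be sketched as follows. This is only an illustration under stated assumptions, not the CM-5 hardware algorithm: leaves are numbered 0..N-1, the tree arity is taken to be 4, each node is assumed to have 2 parent links, and the names `lca_height` and `route` are hypothetical.

```python
import random


def lca_height(src: int, dst: int, arity: int = 4) -> int:
    """Levels a message must climb before src and dst share a subtree."""
    height = 0
    while src != dst:
        src //= arity  # move both addresses up one level of the tree
        dst //= arity
        height += 1
    return height


def route(src: int, dst: int, arity: int = 4, parents_per_node: int = 2, rng=None):
    """Return (up_choices, down_choices) for one message.

    up_choices:   pseudo-random parent link taken at each level on the way up
                  (any choice reaches a valid ancestor, which spreads the load).
    down_choices: child index taken at each level on the way down; these are
                  fixed by the destination address.
    """
    rng = rng or random.Random()
    height = lca_height(src, dst, arity)
    up_choices = [rng.randrange(parents_per_node) for _ in range(height)]
    # Base-`arity` digits of dst, most significant of the lowest `height` first.
    down_choices = [(dst // arity ** level) % arity for level in reversed(range(height))]
    return up_choices, down_choices


if __name__ == "__main__":
    print(route(0, 6, rng=random.Random(1)))
```

Only the upward half of the path involves a choice; the downward half is determined by the destination, which is why the random selection can balance load without affecting correctness.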

     all-fall-down mode : when a user's time slice ends, in-flight messages are simply dropped down the tree to any node; when the user resumes, that node retransmits them to the actual destination

CM-5 Control Network
     split-phase barrier mechanism for synchronization
     broadcasting
          collision > error
     combining
          reduction, forward scan (parallel prefix), backward scan (parallel suffix), router done.
          first 3 : bitwise logical OR, XOR, signed max, max, addition etc.
          router done : when data messages have completed.
     async OR of all procs.
     
     implemented as binary tree. 
     message sent up, broadcast to all nodes in that partition.
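
The combining operations above (reduction, forward scan, backward scan) map naturally onto a binary tree: partial results are combined on the way up and prefixes are distributed on the way down. The sketch below is a software illustration only, not the control network hardware. It assumes an exclusive scan (each processor receives the combination of the values from lower-numbered processors), and the name `scan_tree` is hypothetical.

```python
from operator import add, or_


def scan_tree(values, op, identity):
    """Return (reduction, exclusive forward scan) using a binary-tree sweep."""
    n = len(values)
    out = [identity] * n
    if n == 0:
        return identity, out

    def up(lo, hi):
        # Up-sweep: combine the two halves and remember the left half's total.
        if hi - lo == 1:
            return values[lo], None
        mid = (lo + hi) // 2
        left_total, left_node = up(lo, mid)
        right_total, right_node = up(mid, hi)
        return op(left_total, right_total), (left_total, left_node, right_node, mid)

    def down(node, lo, hi, carry):
        # Down-sweep: `carry` is the combination of everything left of `lo`.
        if node is None:
            out[lo] = carry
            return
        left_total, left_node, right_node, mid = node
        down(left_node, lo, mid, carry)
        down(right_node, mid, hi, op(carry, left_total))

    total, root = up(0, n)
    down(root, 0, n, identity)
    return total, out


def backward_scan(values, op, identity):
    """Parallel suffix: the same sweep applied to the reversed sequence."""
    _, prefix = scan_tree(values[::-1], op, identity)
    return prefix[::-1]


if __name__ == "__main__":
    print(scan_tree([3, 1, 4, 1, 5], add, 0))    # (14, [0, 3, 4, 8, 9])
    print(backward_scan([3, 1, 4, 1, 5], add, 0))  # [11, 10, 6, 5, 0]
```

The reversal trick in `backward_scan` relies on the operator being commutative, which holds for the operators listed in these notes (OR, XOR, max, addition).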

CM-5 Diagnostic Network
     user does not know this exists.
     detects faults, both dependent on and independent of program functionality (DFT)
     JTAG DFT is connected on the backplane to the diagnostic network > geographical address (cabinet, backplane, slot type, slot, etc., and network address)