Charles E. Leiserson, et al., The Network Architecture of the Connection Machine CM-5, Proc. ACM Symposium on Parallel Algorithms and Architectures, June 1992, pp. 272-295.
Thinking Machines Corp
1996
economy of mechanism > a machine should have only 1 communication network to convey information.
yet the CM-5 has 3 networks (data, control, diagnostic)
Synchronized MIMD machine, distributed memory; processors communicate via message passing
data network > send messages
control network > sync and multiparty comm primitives
32 to 16384 processing nodes, each with a 32 MHz SPARC processor; control processors : Sun Microsystems workstations acting as front ends
processors can be split into user partitions, privileged or non-privileged. users are time-multiplexed. low OS overhead for communication between user tasks.
Network interface
+ simple and uniform view of the network
+ support for time sharing, space sharing and mapping out of failed components
+ decouples design decisions made for the networks
> memory mapped registers on protected pages << MMU takes care of privilege.
> Context switching : automatic checkpointing of user tasks.
> user's view of the networks is independent of network topology.
CM-5 Data Network
balancing message loads > fetch deadlock problem
fat-tree. user partition = subtree in network.
route to least common ancestor of src and dest. pseudo random choice at each level > load balancing
differential pair comm > noise immunity and reduced overall power requirements.
similar to cut-through/wormhole routing
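The routing rule above (climb to the least common ancestor with a pseudo-random link choice at each level, then descend) can be sketched as follows. This is only an illustration: the 4-ary arity, the `parents_per_level` count and the function names are assumptions made for the example, not details taken from these notes.

```python
import random

def lca_height(src, dst, arity=4):
    """Levels a message must climb before src and dst fall in the same subtree."""
    height = 0
    while src != dst:
        src //= arity   # move each address to its parent subtree
        dst //= arity
        height += 1
    return height

def route(src, dst, arity=4, parents_per_level=2, rng=random):
    """Return (up_choices, down_choices) for one message.

    up_choices: a pseudo-random parent link per level (spreads load).
    down_choices: the child to take at each level on the way down,
    fully determined by the destination address.
    """
    height = lca_height(src, dst, arity)
    up = [rng.randrange(parents_per_level) for _ in range(height)]
    down = [(dst // arity**level) % arity for level in reversed(range(height))]
    return up, down

up, down = route(0, 15)
print(len(up), down)
```

Only the upward half involves a choice; the downward half is fixed by the destination, which is why the random selection is enough to balance load without changing where the message ends up.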
>>> data network is bound by a contract with the processors to guarantee that deadlock never occurs.
typical approach > reservation mechanism > caps the max no. of messages outstanding between any 2 processors.
substantial overhead
CM-5 > left port and right port (virtual channels)
send the request on the left port always (response can be on any port)
good efficiency as user controls directly
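A toy simulation may help show why separating requests from replies avoids fetch deadlock. It is a sketch under simplifying assumptions: two nodes, bounded FIFO queues, a greedy sender, and replies restricted to their own channel. The function `run` and its parameters are hypothetical and do not model the real CM-5 interface.

```python
from collections import deque

def run(two_networks, requests_per_node=8, capacity=2, max_steps=1000):
    """Return True if every request gets its reply, False if the nodes wedge."""
    kinds = ("req", "rep") if two_networks else ("shared",)
    # queues[(kind, dst)]: bounded FIFO of messages travelling toward node dst.
    queues = {(kind, dst): deque() for kind in kinds for dst in (0, 1)}

    def queue(kind, dst):
        return queues[(kind if two_networks else "shared", dst)]

    to_send = [requests_per_node, requests_per_node]
    replies = [0, 0]
    for _ in range(max_steps):
        progress = False
        for node in (0, 1):
            other = 1 - node
            # Greedy sender: inject requests until the outbound queue is full.
            out = queue("req", other)
            while to_send[node] > 0 and len(out) < capacity:
                out.append("req")
                to_send[node] -= 1
                progress = True
            # Receiver: handle the head of each inbound queue.
            for kind in kinds:
                inbound = queues[(kind, node)]
                if not inbound:
                    continue
                if inbound[0] == "rep":
                    inbound.popleft()  # replies never need outbound space
                    replies[node] += 1
                    progress = True
                else:
                    reply_out = queue("rep", other)
                    if len(reply_out) < capacity:  # a request needs room for its reply
                        inbound.popleft()
                        reply_out.append("rep")
                        progress = True
        if replies == [requests_per_node, requests_per_node]:
            return True
        if not progress:
            return False
    return False

print(run(two_networks=False), run(two_networks=True))
```

With one shared channel both nodes fill it with requests and then cannot make room for any reply; with a separate reply channel, replies always drain, so requests eventually get served.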
all-fall-down mode : when a user's time slice is over, in-flight messages are just dropped down the tree to any node; when the task resumes, that node transmits them to the actual destination
CM-5 Control Network
split-phase barrier mechanism for synchronization
broadcasting
collision > error
combining
reduction, forward scan (parallel prefix), backward scan (parallel suffix), router done.
first 3 : combine with bitwise logical OR, XOR, signed max, max, addition etc.
router done : detects when all data messages have completed.
async OR of all procs.
implemented as binary tree.
message sent up the tree, then broadcast down to all nodes in that partition.
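The combining operations listed above can be illustrated in software. The sketch below treats a list as one value per processor and uses inclusive scans (each processor's own value is included), which is an assumption of this example; the hardware may define the scans differently. The helper names are hypothetical.

```python
from functools import reduce
from itertools import accumulate
import operator

def reduction(values, op):
    """Every processor receives the combination of all values."""
    return reduce(op, values)

def forward_scan(values, op):
    """Parallel prefix: processor i gets op over values[0..i]."""
    return list(accumulate(values, op))

def backward_scan(values, op):
    """Parallel suffix: processor i gets op over values[i..n-1]."""
    # Reversing is valid here because the listed operators are commutative.
    return list(accumulate(reversed(values), op))[::-1]

values = [3, 1, 4, 1, 5]          # one value per processor
print(reduction(values, max))               # 5
print(reduction(values, operator.xor))      # 2
print(forward_scan(values, operator.add))   # [3, 4, 8, 9, 14]
print(backward_scan(values, operator.add))  # [14, 11, 10, 6, 5]
```

On the machine these results come from combining values on the way up the binary tree and distributing partial results on the way down, rather than from a sequential pass as here.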
CM-5 Diagnostic Network
user does not know this exists.
detects faults both dependent on and independent of program functionality (DFT : design for testability)
JTAG DFT is connected on the backplane to the diagnostic network > geographical address (cabinet, backplane, slot type, slot etc.) and network address