Kenneth Birman @ Cornell, Andre Schiper @ Switzerland, Pat Stephenson @ Cornell
ACM Trans. on Computer Systems 1991
![]() |
Why multicast is important? In message passing system--synchronization by messaging, multicast is the most complex form of messaging! |
||||||
![]() |
A simple example of how multicast is used in distributed computing
|
![]() |
Type of groups: Fig 1
|
||||||||
![]() |
Expected usage of group
|
![]() |
Asynchronous atomic multicast which keeps causality order |
![]() |
Support every type of groups in section 2 |
![]() |
System = fixed set P of processes with disjoint address spaces |
||||||||
![]() |
Group g
|
||||||||
![]() |
Multicast
|
||||||||
![]() |
View of group g
|
||||||||
![]() |
Message transport layer: reliable point-to-point & multicast service
|
||||||||
![]() |
Execution of a process = a partially ordered sequence of atomic events--all or nothing & not interleaved by another action--such as:
|
||||||||
![]() |
System causality relation
|
||||||||
![]() |
Liveness
|
![]() |
Virtual synchronous programming environment: user can program as if the system schedules one distributed event at a time |
||||||||||||||
![]() |
Aspects of virtual synchrony system
|
||||||||||||||
![]() |
Fault tolerance: fault atomicity + CBCAST delivery order
|
![]() |
How can we say that send(m) -> send(m')? We need some virtual time |
||||||
![]() |
VT(p): vector time of process p
|
||||||
![]() |
VT of message
|
||||||
![]() |
The comparison of VT: comparison of corresponding counters |
![]() |
Single process group with fixed membership |
![]() |
CBCAST order can be said
|
||||||||||
![]() |
CBCAST protocol
|
||||||||||
![]() |
CBCAST always guarantees that causal order is kept
|
![]() |
ABCAST does not keep causality order |
||||||||||||||||||||||
![]() |
Causal ABCAST: an modified ABCAST which keeps causality |
||||||||||||||||||||||
![]() |
Idea
|
||||||||||||||||||||||
![]() |
Protocol
|
![]() |
A CBCAST msg is said to stable, if it gets received by all recipients |
![]() |
Stability number n of a process: VT(msg)[pid] <= n, for all stable msg sent by the process |
![]() |
Stability number may be piggybacked with CBCAST msg or sent by itself |
![]() |
VT(msg) needs only carry vector timestamp fields that have changed since the last multicast by the sender
|
||
![]() |
VT compression could save nothing due to the additional field to record which indices are included in the timestamp |
||
![]() |
For bursty and highly localized traffic, VT compression can save a lot |
![]() |
To achieve atomicity of msg delivery while group membership changes dynamically, several issues must be addressed
|
![]() |
VT counts the number of msgs sent -> with multiple groups, msg is delivered to some processes -> when a process receives a msg with VT[i] = k, it does NOT know how many msgs from process i will precede the msg -> Need VT for each group: denoted as VT(msg, group) or VT(sender/receiver, group) |
||||||||||||
![]() |
Also stability number for each group |
||||||||||||
![]() |
CBCAST protocol for multiple group
|
||||||||||||
![]() |
=> VT for every group should be included in each msg -> 6.3 Extended VT Compression |
![]() |
Causal ABCAST which uses CBCAST no more keeps the order of delivery among different groups -> Need global ordering, but little practical and out of scope of this paper |
![]() |
VT Compression by sending only delta: changed field, even entire VT for a group can be omitted |
![]() |
By changing the CBCAST protocol as to deliver concurrent messages in causality in the order of reception, VT field can be further compressed |
![]() |
Locality will increase the efficiency of compression |
![]() |
Message carries VT for (virtually) all groups in the system -> Processes hold the VT of groups which do not belongs indefinitely -> Big overhead |
![]() |
Solution: Periodic flush(= setting fields of VT to zero) for every active group & throwing away VT for groups not being refreshed within the timeout |
![]() |
(1) Virtually synchronous addressing: no process failure or departure from a group is assumed => 5.5 method still works |
||||
![]() |
(3) Delivery atomicity and virtual synchrony when failure occur
|
![]() |
VT carried with msgs is still headache |
||
![]() |
If communication structure is known in advance to have a particular pattern, the number of VTs can be reduced drastically |
||
![]() |
Communication pattern: graph with vertex = group, edge connecting two group iff two group shares members |
||
![]() |
Biconnected component = subgraph with every pair of vertices connected by at least two paths |
||
![]() |
k-bounded: max(|biconnected component|) = k |
||
![]() |
Theorem: If a system uses the VT protocol to maintain causality, it is sufficient and necessary for a process p to maintain and transmit those VT timestamps corresponding to groups in the biconnected component to which p belongs
|
![]() |
If process joins and leaves group dynamically, it is almost impossible to get the graph of 6.6 |
||||||||
![]() |
Conservative solution
|
||||||||
![]() |
Multicast epochs: process leaves a group only due to failure
|
||||||||
![]() |
(2) Multicast epochs: process leaves a group for other reasons, remaining active and possibly joining other groups
|
![]() |
Client/Server model in ISIS
|
||||
![]() |
Diffusion model
|
||||
![]() |
Hierarchical model: asynch -> O(number of members of subgroup) |
![]() |
ISIS uses RPC for point-to-point message
|
![]() |
How to inform the system of application structure, such as acyclic comm pattern? Current ISIS uses conservative method to detect comm pattern. |
||||
![]() |
Possible extension: introduction of causality domain
|
![]() |
How should be membership changes detected and managed? Use CBCAST to inform membership change -> flush |
![]() |
Also need a mechanism to detect a failure of a member. In another paper |
![]() |
Is causality only in terms of msg being sent or received enough? |
![]() |
VT is too complex and heavy |