My notes for the 2nd revision
=============================

vs. DNS:
- no dedicated servers
- no naming structure

Chord simplifies p2p systems and applications; it addresses the following challenges:
- load balance: Chord acts like a distributed hash function, spreading keys evenly over the nodes
- decentralization: no node is more important than any other
- scalability: the cost of a lookup grows as the log of the number of nodes
- availability: automatically adjusts internal tables to reflect newly joined nodes as well as node failures
- flexible naming: the key-space is flat

3 things:
- find the locations of keys
- what happens when a new node joins
- what happens when a node leaves the system

What is good about Chord compared to previous systems?
- more scalable (cost of a lookup is O(logN))
- a node does NOT have to know about every other node
- hence, each node holds only a small amount of "routing" information

How it works:
- m-bit identifier: each node and key is assigned an m-bit identifier
- identifiers are ordered on a circle modulo 2^m
- successor(k): the first node clockwise from key k (Figure 2)

**CHORD: a scalable lookup service for Internet Applications**
==============================================================

# Chord features:
- just a lookup service: key --> node that contains that key
- uses consistent hashing:
  + load balancing
  + little movement of keys when nodes join/leave
  + scalability: each node maintains routing info about O(logN) other nodes (vs. all of them)
    > lookup: O(logN) messages
    > when a node joins/leaves: O(logN * logN) messages for updates

# Data structure
- for correctness of consistent hashing, each node maintains a correct pointer to its successor
  ==> Problem: a lookup may need to traverse N nodes, since it may have to go all the way around the circle using successor pointers
- Chord's solution: maintain additional hints: the *finger table*
  + the ith entry in the table points to node successor(n + 2^(i-1))
  + Note: the finger table is for fast lookup, not for correctness

# What happens when: lookup operation
<see my notes>, but the general idea is:
Say node n looks for the successor of key k.
1) n checks if it can answer directly; if so, return
   NOTE: what to check: whether k is in the range (n, n.successor] (then the answer is n.successor)
2) if not, find the *closest preceding finger to k* in n's finger table,
   i.e. the node n' in the finger table closest to k, with n' in the range (n, k)
3) if k is not in the range (n', n'.successor], repeat step 2 from n'.
Complexity: at most O(logN) nodes need to be contacted.
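To make steps 1)-3) concrete, here is a minimal Python sketch of the lookup (my own rendering, not the paper's pseudocode): `Node`, `in_range`, and the constant `M` are illustrative assumptions, and all RPCs/error handling are omitted.

```python
# Minimal sketch of Chord lookup on an m-bit identifier circle.
# Each node keeps: its id, a correct successor pointer, and a finger table
# `fingers` where fingers[i] ~ successor(n + 2^i)  (hints only, for speed).

M = 6                      # identifier length in bits (example value)
RING = 2 ** M              # identifiers live on a circle modulo 2^m

def in_range(x, a, b, incl_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if incl_right."""
    x, a, b = x % RING, a % RING, b % RING
    if a < b:
        return a < x < b or (incl_right and x == b)
    return x > a or x < b or (incl_right and x == b)   # interval wraps past 0

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self          # must be correct (consistent-hashing invariant)
        self.fingers = [self] * M      # hints only: stale fingers slow us down, never break us

    def find_successor(self, k):
        """Steps 1 and 3: hop along fingers until k falls in (n, n.successor]."""
        n = self
        while not in_range(k, n.id, n.successor.id, incl_right=True):
            n = n.closest_preceding_finger(k)          # step 2 in the notes above
        return n.successor                             # the node responsible for key k

    def closest_preceding_finger(self, k):
        """Highest finger of this node that lies strictly between it and k."""
        for f in reversed(self.fingers):
            if in_range(f.id, self.id, k):
                return f
        return self
```

With correct fingers, each hop at least halves the remaining identifier distance to k, which is where the O(logN) hop count comes from.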
# What happens when: a single node joins
- Invariants to guarantee the ability to locate every key in the network:
  + each node's successor is correctly maintained
  + for every key k, node successor(k) is responsible for k
- Hints (i.e. the finger tables) need to be correct, too, but only for fast lookup
- add a *predecessor pointer* to each node
  + can be used to walk counterclockwise around the identifier circle
- What needs to be done when node n joins the network:
  (1) initialize the predecessor and finger table of node n
      ==> ask an existing node (n') to look up the necessary info,
      i.e. n' will look up n.finger[i].node; generally:
        n.finger[i].node = n'.find_successor(n.finger[i].start);
      Assumption: n learns about n' via an external mechanism
      Complexity: O(m * logN); some optimizations reduce this to O(logN * logN)
  (2) update other nodes' data structures (fingers and predecessors)
      ==> node n needs to be entered into the finger tables of some existing nodes
      How: node n will become the ith finger of node p if and only if:
      ~ p precedes n by at least 2^(i-1)
      ~ the ith finger of p succeeds n
  (3) notify higher-layer software to transfer the keys that n is now responsible for
      ==> n just needs to contact the node that immediately follows n to get the keys n is responsible for

# What happens when: concurrent joins
- the invariants are difficult to maintain
- hence, separate correctness from performance goals (how?)
  + each node now has a stabilization operation that runs periodically to fix up successor pointers (e.g. after a node has newly joined)
  + as long as successor pointers are correct, find_successor (and find_predecessor) work
- stabilize algorithm: refer to Figure 7 (a small sketch follows at the end of these notes)
- E.g.: "suppose node N joins the system, and its ID lies between nodes Ns and Np. N would acquire Ns as its successor. Node Ns, when notified by N, would acquire N as its predecessor. When Np next runs stabilize, it will ask Ns for its predecessor (which is now N); Np would then acquire N as its successor. Finally, Np will notify N, and N will acquire Np as its predecessor. At this point, all predecessor and successor pointers are correct."
- a lookup that occurs before stabilization has finished can exhibit one of three behaviors:
  + all the finger table entries involved in the lookup are reasonably current, and the lookup finds the correct successor
  + successor pointers are correct, but fingers are inaccurate; this yields correct lookups, but they may be slower
  + the nodes in the affected region have incorrect successor pointers, or keys may not yet have migrated to newly joined nodes, and the lookup may fail; the higher-layer software using Chord will notice that the desired data was not found, and has the option of retrying the lookup after a pause; the pause can be short, since stabilization fixes successor pointers quickly
- What about the finger tables, do we need to update them?
  + Ideally, we should, for fast lookup
  + but it turns out that joins do not substantially affect the performance of the finger tables (why?)
  + so just refresh the finger tables periodically

# What happens when: failures
- when a node n fails, nodes whose finger tables contain n must find n's successor
- need to maintain correctness of the successor pointers
- Solution:
  + each node maintains not just one successor but a *successor list*, containing its r nearest successors
  + if a node finds that its successor has failed, it replaces it with the first live successor in its list
  + stabilization will correct the finger tables and successor lists (later on)
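To tie the join / stabilization / failure pieces together, here is a rough Python sketch in the spirit of Figure 7 (my own pseudocode, not the paper's code). It reuses `Node`, `in_range`, and `find_successor` from the lookup sketch above; `is_alive`, `succ_list`, and `R` are illustrative stand-ins, and real RPCs/timeouts are omitted.

```python
R = 3  # length of the successor list (the r nearest successors)

def is_alive(node):
    """Stand-in for a real liveness check (e.g. an RPC/ping with a timeout)."""
    return node is not None

class StabilizingNode(Node):           # extends the Node from the lookup sketch above
    def __init__(self, node_id):
        super().__init__(node_id)
        self.predecessor = None
        self.succ_list = []            # the r nearest successors, for fault tolerance

    def join(self, existing):
        """Join via any known node: set only the successor; stabilize repairs the rest."""
        self.predecessor = None
        self.successor = existing.find_successor(self.id)
        self.succ_list = [self.successor]

    def stabilize(self):
        """Runs periodically on every node."""
        if not is_alive(self.successor):
            # failure handling: fall back to the first live entry in the successor list
            self.successor = next(n for n in self.succ_list if is_alive(n))
        x = self.successor.predecessor
        if x is not None and in_range(x.id, self.id, self.successor.id):
            self.successor = x         # a node joined between us and our old successor
        self.successor.notify(self)
        # refresh our successor list from our (possibly new) successor
        self.succ_list = ([self.successor] + self.successor.succ_list)[:R]

    def notify(self, n):
        """n believes it might be our predecessor."""
        if self.predecessor is None or in_range(n.id, self.predecessor.id, self.id):
            self.predecessor = n
```

Note that join() only sets the successor pointer; predecessors, fingers, successor lists, and key migration are all repaired lazily by the periodic stabilization rounds, which is exactly the correctness-vs-performance split described above.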