Dynamo: Amazon's Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels
Amazon
One-line Summary
Dynamo is a distributed, replicated key-value store that provides high availability for writes and eventual consistency, and relies on the client to resolve the conflicts detected by vector clocks.
Amazon CTO's blog
Overview/Main Points
- Background
- Needs
- high availability for writes
- scalability and latency are important
- consistency is the least important
- Techniques
- Consistent hashing
- (Sloppy) quorum
- Timestamp-based (vector clock) inconsistency detection and resolution
- Gossip-based membership protocol
- Bayou-based Merkle-tree inconsistency resolution
- Architecture Design
- Partition: consistent hashing with virtual nodes
- Heterogeneity: different servers have different capacities
- Better load balance
- High Availability for writes: vector clocks with reconciliation during reads
- Data replication: each key is stored on N replicas (the coordinator plus the next N-1 distinct nodes clockwise on the ring)
- Handling temporary failures: sloppy quorum and hinted handoff
- get(key) may return more than one version of an object when the versions carry causally unrelated timestamps (vector clocks)
- put(key, value, context), where context carries the version information (vector clock) obtained from an earlier read
- Configurable R/W/N
- Recovery from permanent failures: Anti-entropy using Merkle trees
- Anti-entropy protocol to keep replicas synchronized
- Merkle tree algorithm for quick comparison
- Membership and failure detection: Gossip-based protocol
- Membership changes are done by explicit manual commands
- Every node is eventually aware of the change with the help of the seed nodes that are fully functional in the Dynamo ring.
- After a new member joins, stored data may be moved to it from existing nodes
- Failure detection via gossip messages
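Sketch of the partitioning point above (consistent hashing with virtual nodes). This is a minimal illustration, not Dynamo's implementation: the class name `HashRing`, the `node#i` token naming, and the way capacity is expressed as a virtual-node count are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 used only as a stable hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each physical node owns several virtual-node tokens."""

    def __init__(self):
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> physical node name

    def add_node(self, node: str, vnodes: int) -> None:
        # A more capable server can be given more virtual nodes (heterogeneity).
        for i in range(vnodes):
            token = ring_hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def preference_list(self, key: str, n: int) -> list:
        """Walk clockwise from the key's position, collecting n distinct physical nodes."""
        start = bisect.bisect_right(self._tokens, ring_hash(key))
        nodes = []
        for step in range(len(self._tokens)):
            token = self._tokens[(start + step) % len(self._tokens)]
            node = self._owner[token]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes


ring = HashRing()
ring.add_node("small-1", vnodes=8)
ring.add_node("big-1", vnodes=16)   # assumed to have twice the capacity
ring.add_node("small-2", vnodes=8)
print(ring.preference_list("cart:42", n=2))
```

Skipping tokens that belong to an already chosen physical node keeps the N replicas on distinct machines even though one machine appears many times on the ring.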
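Sketch of the vector-clock point above (reconciliation during reads). The function names and the dict-based clock representation are assumptions for illustration; the node names Sx, Sy, Sz are placeholders for three replicas.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with the coordinating node's counter bumped."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a is equal to, or a causal successor of, clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum of two clocks, used when a client reconciles siblings."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def reconcile(versions: list) -> list:
    """Drop every (value, clock) pair that is a strict ancestor of another pair."""
    survivors = []
    for i, (value, clock) in enumerate(versions):
        dominated = any(
            j != i and other != clock and descends(other, clock)
            for j, (_, other) in enumerate(versions)
        )
        if not dominated and (value, clock) not in survivors:
            survivors.append((value, clock))
    return survivors


d2 = ("v2", {"Sx": 2})
d3 = ("v3", increment(d2[1], "Sy"))  # written through Sy after reading v2
d4 = ("v4", increment(d2[1], "Sz"))  # written through Sz after reading v2

siblings = reconcile([d2, d3, d4])   # v2 is an ancestor of both and is dropped
# The client merges the siblings and writes back through Sx with the merged context.
d5_clock = increment(merge(d3[1], d4[1]), "Sx")
print(siblings, d5_clock)
```

Here get(key) would hand both v3 and v4 to the client because neither clock descends from the other; the merged clock is what the client passes back as context on the following put.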
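Sketch of the sloppy quorum, hinted handoff, and configurable R/W/N points above. Both function names are hypothetical, and the selection policy (first healthy node further down the preference list stands in for a failed one) is a simplifying assumption.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """With R + W > N, every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def sloppy_quorum_targets(preference_list: list, healthy: set, n: int) -> list:
    """Pick write targets as (node, hint) pairs.

    hint is None for an intended replica, or the name of the unreachable node
    that a stand-in is holding the data for (hinted handoff).
    """
    intended = preference_list[:n]
    fallbacks = [node for node in preference_list[n:] if node in healthy]
    targets = []
    for node in intended:
        if node in healthy:
            targets.append((node, None))
        elif fallbacks:
            targets.append((fallbacks.pop(0), node))
    return targets


# N = 3 with replica A temporarily unreachable: D accepts the write on A's behalf.
targets = sloppy_quorum_targets(["A", "B", "C", "D"], healthy={"B", "C", "D"}, n=3)
print(targets)
print(quorums_overlap(n=3, r=2, w=2))
```

The stand-in node keeps the hinted replica separately and hands it back once the intended node recovers, which is what keeps writes available during temporary failures.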
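Sketch of the anti-entropy point above (Merkle trees for quick replica comparison). The bucketing scheme, the use of SHA-256, and the function names are assumptions chosen to keep the example short; the idea shown is only that two replicas exchange hashes top-down and descend into subtrees that differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(store: dict, buckets: int = 8) -> list:
    """Return hash levels from the leaves (index 0) up to the root (last level).

    buckets must be a power of two; each leaf hashes the key/value pairs
    whose key falls into that bucket.
    """
    groups = [[] for _ in range(buckets)]
    for key in sorted(store):
        idx = int.from_bytes(_h(key.encode()), "big") % buckets
        groups[idx].append(f"{key}={store[key]}")
    level = [_h("\n".join(group).encode()) for group in groups]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(a: list, b: list) -> list:
    """Descend from the root, following only subtrees whose hashes differ."""
    todo = [(len(a) - 1, 0)]  # (level, index), starting at the root
    out = []
    while todo:
        lvl, idx = todo.pop()
        if a[lvl][idx] == b[lvl][idx]:
            continue  # identical subtree: nothing below needs to be exchanged
        if lvl == 0:
            out.append(idx)
        else:
            todo.extend([(lvl - 1, 2 * idx), (lvl - 1, 2 * idx + 1)])
    return sorted(out)


replica_a = {f"k{i}": i for i in range(20)}
replica_b = dict(replica_a)
replica_b["k7"] = 999  # one stale or divergent value
print(differing_buckets(build_tree(replica_a), build_tree(replica_b)))
```

When the roots match, the replicas are in sync after comparing a single hash; otherwise only the keys in the differing buckets need to be transferred.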
Relevance
Flaws