Coda: A Highly Available File System for a Distributed Workstation Environment
Mahadev Satyanarayanan, James J. Kistler, Puneet Kumar, Maria E. Okasaki, Ellen H. Siegel, and David C. Steere.
CMU
Summary by: Zuyu Zhang
One-line Summary
The Coda distributed file system provides high availability through server replication and disconnected operation. It detects conflicting updates using version vectors (CVVs) together with the latest store ID (LSID), which tags the most recent change with the updating client's name and a timestamp.
Overview/Main Points
- Background
  - Descendant of AFS
  - Scalability: as good as AFS
  - Availability: better than AFS, due to server replication and disconnected operation (working solely on cached data)
  - Coda intercepts file open/close operations at the Unix kernel's vnode interface.
  - Venus, the client-side cache manager, contacts servers only on a cache miss at open, or at close after the file has been modified.
  - A callback mechanism ensures that a server notifies a client when other clients make concurrent changes to files it has cached.
  - Clients dynamically determine which servers hold a file and cache this location information.
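- Server replication
  - Replication is done on a per-volume basis
    - Volume: a set of files and directories located on one server
    - Volume Storage Group (VSG): the set of servers holding replicas of a volume; Accessible VSG (AVSG): the members of the VSG a client can currently reach, one of which is its preferred server (ideally holding the latest copy)
  - Each file and directory has a unique low-level file identifier (FID), which includes the identity of its volume
  - Each server keeps a volume replication database (recording the VSG of each replicated volume)
  - Volume Coda Version Vector (CVV)
  - Read-one, write-all approach: data is read from one server, and updates are written to every server in the AVSG
  - Latest Store ID (LSID): identifies the most recent update to a replica
    - ‹client name, version number›
    - ‹client IP address, logical timestamp›
  - Coda Version Vector (CVV)
    - A vector timestamp
    - Detects write/write conflicts on the same file
    - Number of elements: the number of servers in the VSG
    - CVV(i): the number of updates performed at server i, as known to this replica
  - Relation between two replicas, determined by comparing their CVVs and LSIDs
    - equal
    - one dominates (the other is submissive)
    - inconsistent (neither dominates: a conflict)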
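The replica relations above follow from an element-wise comparison of CVVs. Below is a minimal sketch, assuming a CVV is a plain list of integers with one entry per VSG member. `Relation` and `compare_cvv` are hypothetical names, and the LSID is not used here.

```python
from enum import Enum
from typing import Sequence


class Relation(Enum):
    EQUAL = "equal"
    DOMINATES = "dominates"        # first has >= updates at every server, > at some
    SUBMISSIVE = "submissive"      # the second replica dominates the first
    INCONSISTENT = "inconsistent"  # neither dominates: concurrent (conflicting) updates


def compare_cvv(a: Sequence[int], b: Sequence[int]) -> Relation:
    """Compare two CVVs element-wise; entry i counts updates performed at server i."""
    if len(a) != len(b):
        raise ValueError("CVVs must have one entry per VSG member")
    a_ahead = any(x > y for x, y in zip(a, b))
    b_ahead = any(y > x for x, y in zip(a, b))
    if a_ahead and b_ahead:
        return Relation.INCONSISTENT
    if a_ahead:
        return Relation.DOMINATES
    if b_ahead:
        return Relation.SUBMISSIVE
    return Relation.EQUAL
```

For example, `compare_cvv([2, 2, 1], [1, 1, 1])` gives `Relation.DOMINATES`: the replica on the third server missed an update made at the first two. `compare_cvv([2, 2, 1], [1, 1, 2])` gives `Relation.INCONSISTENT` because each side saw an update the other did not, which is a write/write conflict.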
File Operations
- Open a file
  - If the AVSG is empty:
    - if the file is cached (this is what enables disconnected operation), succeed
    - otherwise, fail
  - If the AVSG is non-empty:
    - if the cached file is up to date (checked by periodically asking the preferred server for its CVV), succeed
    - if the client lost the callback on the cached file (e.g., due to a network failure) within a preset time, succeed
    - if the file's replicas within the volume are in conflict, fail
    - if the file is not cached:
      - fetch the whole file from the preferred server in the AVSG
      - read the CVVs from every server in the AVSG
      - if a non-preferred server dominates:
        - read the file from that server
        - make it the preferred server
      - register a callback with the preferred server
- Read / write the file: purely local operations
- Close the file: send the file back
  - If the AVSG is empty, close succeeds; the updates are propagated later, when servers become reachable
  - Otherwise:
    - contact all servers in the AVSG
    - the client sends the file and its CVV
    - each server acknowledges the update and returns its CVV
    - the client sends the final CVV
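The open and close steps above can be sketched in memory as follows. This is a simplified sketch, not Coda's implementation. `Server`, `Replica`, `open_file` and `close_file` are hypothetical names, and every AVSG member is assumed to hold a replica of the file. The up-to-date check compares the cached CVV with the preferred server's CVV at open time; it does not model callbacks, the lost-callback timeout, or RPC failures. The final CVV is also an assumption: it increments the entry of each server that acknowledged the update, which is consistent with CVV(i) counting updates at server i.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Replica:
    data: bytes
    cvv: List[int]  # one entry per server in the VSG


@dataclass
class Server:  # hypothetical in-memory stand-in for a Coda server
    index: int  # this server's position in every CVV
    files: Dict[str, Replica] = field(default_factory=dict)
    callbacks: Set[Tuple[str, str]] = field(default_factory=set)  # (client, fid)


class ReplicaConflict(Exception):
    """Raised when two reachable replicas have inconsistent CVVs."""


def dominates(a: List[int], b: List[int]) -> bool:
    return all(x >= y for x, y in zip(a, b)) and a != b


def inconsistent(a: List[int], b: List[int]) -> bool:
    return any(x > y for x, y in zip(a, b)) and any(y > x for x, y in zip(a, b))


def open_file(cache, avsg, preferred, fid, client="client-1"):
    """Return (data, preferred_server); `cache` maps fid -> Replica."""
    if not avsg:  # disconnected: only cached files can be opened
        if fid in cache:
            return cache[fid].data, preferred
        raise FileNotFoundError(fid)
    if fid in cache and cache[fid].cvv == preferred.files[fid].cvv:
        return cache[fid].data, preferred  # cached copy is up to date
    best = preferred
    for server in avsg:  # read the CVVs from every AVSG member
        if dominates(server.files[fid].cvv, best.files[fid].cvv):
            best = server  # this server holds a newer copy than the current choice
    for server in avsg:
        if inconsistent(server.files[fid].cvv, best.files[fid].cvv):
            raise ReplicaConflict(fid)
    best.callbacks.add((client, fid))  # register a callback with the new preferred server
    replica = best.files[fid]
    cache[fid] = Replica(replica.data, list(replica.cvv))
    return replica.data, best


def close_file(cache, avsg, fid, new_data, pending):
    """Store an updated file at every server in the AVSG (write-all)."""
    cvv = list(cache[fid].cvv)
    if not avsg:  # disconnected: keep the update locally, propagate later
        cache[fid] = Replica(new_data, cvv)
        pending.append(fid)
        return
    acked = []
    for server in avsg:  # phase 1: send file + CVV; each server ACKs
        server.files[fid] = Replica(new_data, list(cvv))
        acked.append(server.index)
    for i in acked:  # count this update once at each acknowledging server
        cvv[i] += 1
    for server in avsg:  # phase 2: send the final CVV
        server.files[fid].cvv = list(cvv)
    cache[fid] = Replica(new_data, cvv)
```

Suppose the preferred server is the third one and it missed an update (CVV `[1, 1, 1]` while the others hold `[2, 2, 1]`). Opening the file then switches the preferred server to the first member of the AVSG that dominates it.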
Consistency guarantees across partitions (survey)(database)
- Pessimistic replication strategy
  - strict consistency
  - restricts modifications to at most one partition
- Optimistic replication strategy
  - high availability
  - allows updates in every partition, but detects and resolves conflicting updates after they occur
  - write sharing between users is rare in an academic environment (the rationale for optimism)
  - Coda takes the optimistic approach for availability: concurrent, conflicting updates in different partitions are detected (via CVVs) rather than prevented
Disconnected operations
- States
  - Hoarding (connected)
    - Manage the cache using LRU combined with a user-specified priority list (the highest priority level is sticky, i.e., never evicted)
  - Emulation (disconnected)
    - Keep an update log
    - Keep the log in persistent storage so that it survives client crashes
  - Reintegration (reconnected)
    - Replay the log, transaction by transaction
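The three states can be sketched as a small state machine. This model is hypothetical: `Venus`, `STICKY`, `cache_file`, `write` and `reconnect` are illustrative names, not Coda's interfaces. Eviction removes the lowest-priority, least recently used non-sticky file. The update log is an in-memory list, whereas a real client would keep it in persistent storage. Replay applies log records one at a time in order instead of grouping them into transactions. Connected writes, which would go to the servers, are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class State(Enum):
    HOARDING = "hoarding"            # connected: fill the cache
    EMULATION = "emulation"          # disconnected: serve from cache, log updates
    REINTEGRATION = "reintegration"  # reconnected: replay the log


STICKY = 1000  # hypothetical highest priority level: never evicted


@dataclass
class Venus:
    capacity: int
    priority: Dict[str, int]  # hypothetical user-supplied priority list
    state: State = State.HOARDING
    cache: Dict[str, bytes] = field(default_factory=dict)
    last_use: Dict[str, int] = field(default_factory=dict)
    clock: int = 0
    log: List[Tuple[str, bytes]] = field(default_factory=list)  # (path, new data)

    def _evict_if_full(self) -> None:
        while len(self.cache) >= self.capacity:
            candidates = [p for p in self.cache if self.priority.get(p, 0) < STICKY]
            if not candidates:
                raise RuntimeError("cache is full of sticky files")
            # lowest priority first; among equal priorities, least recently used
            victim = min(candidates, key=lambda p: (self.priority.get(p, 0), self.last_use[p]))
            del self.cache[victim]
            del self.last_use[victim]

    def cache_file(self, path: str, data: bytes) -> None:
        if path not in self.cache:
            self._evict_if_full()
        self.cache[path] = data
        self.clock += 1
        self.last_use[path] = self.clock

    def write(self, path: str, data: bytes) -> None:
        self.cache_file(path, data)
        if self.state is State.EMULATION:
            self.log.append((path, data))  # kept for replay on reconnection
        # connected writes would be sent to the servers (not modelled here)

    def disconnect(self) -> None:
        self.state = State.EMULATION

    def reconnect(self, replay: Callable[[str, bytes], None]) -> None:
        self.state = State.REINTEGRATION
        for path, data in self.log:  # replay records in log order
            replay(path, data)
        self.log.clear()
        self.state = State.HOARDING
```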
Conflict resolution
- Files: nothing is automated; resolution is left entirely to users
- Directories
  - Conflicts between partitioned replicas of a directory must be resolved manually in these cases:
    - update/update conflicts: conflicting protection changes on the directory
    - remove/update conflicts: an object is removed in one partition and updated in another
    - name/name conflicts: objects with identical names are created in the directory in different partitions
  - All other conflicts can be resolved automatically, because a directory is a list of (name, FID) pairs modified by only two operations (create and delete)
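Automatic directory resolution can be sketched as a set merge of (name, FID) pairs. This sketch rests on an assumption the notes do not state: the directory's contents before the partition (`base`) are known, so that a create can be told apart from a delete. `resolve_directory` is a hypothetical name. The sketch handles create/delete merging and flags name/name conflicts. Remove/update conflicts and protection update/update conflicts are outside its scope.

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]  # (name, FID)


def resolve_directory(base: Set[Entry], a: Set[Entry], b: Set[Entry]) -> Tuple[Set[Entry], List[str]]:
    """Merge two partitioned directory replicas; return (entries, name/name conflicts)."""
    created = (a - base) | (b - base)  # creates performed in either partition
    deleted = (base - a) | (base - b)  # deletes performed in either partition
    merged = (base - deleted) | created
    fids_by_name: Dict[str, Set[str]] = {}
    for name, fid in merged:
        fids_by_name.setdefault(name, set()).add(fid)
    # the same name bound to different FIDs needs manual resolution
    conflicts = sorted(name for name, fids in fids_by_name.items() if len(fids) > 1)
    return merged, conflicts
```

For instance, suppose one partition deletes `b.txt` and the other creates a new file. The merge keeps the new file and drops `b.txt` without user involvement. If both partitions create `new.c` with different FIDs, that name is reported for manual repair.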
Relevance
Flaws