Disconnected Operation in the Coda File System

J. J. Kistler and M. Satyanarayanan @ CMU

Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, December 3-6, 1989, pages 213-225

 

Introduction

Challenge and Idea

How can users enjoy the benefits of a shared data repository yet continue critical work when the repository is inaccessible?

Idea: improve availability through server replication and through disconnected operation, in which the client works from its cache

Trade-off: availability vs. consistency

Coda file system architecture

As the successor of AFS, Coda has (almost) the same architecture as AFS

Optimistic vs. pessimistic replica control

Pessimistic

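During a network partition, a file can be accessed in only one partition (exclusive lock)

A shared lock (multiple readers, no writer) is also possible

Involuntary disconnection is hard to handle: control must be acquired before disconnecting, which cannot be done when the disconnection is unplanned

An errant client that holds a lock can block other clients indefinitely

A timeout on held locks may help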
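To make the blocking problem and the timeout remedy concrete, here is a minimal Python sketch of a pessimistic lock whose grants expire. `LeaseLock`, its mode strings, and passing the current time explicitly are hypothetical choices for illustration; Coda itself does not use this scheme, since it chose optimistic control.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LeaseLock:
    """Hypothetical pessimistic lock on one file whose grants expire.

    "shared": many readers, no writer; "exclusive": a single holder.
    Each grant lasts `lease` seconds, so a holder that disconnects or
    misbehaves cannot block other clients forever.
    """
    lease: float
    mode: Optional[str] = None                               # None, "shared" or "exclusive"
    holders: Dict[str, float] = field(default_factory=dict)  # client -> expiry time

    def _expire(self, now: float) -> None:
        # Forget holders whose lease has run out; free the lock if none remain.
        self.holders = {c: t for c, t in self.holders.items() if t > now}
        if not self.holders:
            self.mode = None

    def acquire(self, client: str, mode: str, now: float) -> bool:
        self._expire(now)
        if self.mode is not None and not (self.mode == "shared" and mode == "shared"):
            return False          # a conflicting grant is still live
        self.mode = mode
        self.holders[client] = now + self.lease
        return True


lock = LeaseLock(lease=60.0)
print(lock.acquire("A", "exclusive", now=0.0))   # True
print(lock.acquire("B", "shared", now=30.0))     # False: A still holds it
print(lock.acquire("B", "shared", now=61.0))     # True: A's lease has expired
```

A client that takes the exclusive lock at time 0 and then vanishes blocks everyone for at most one lease period; the cost is that a slow but live holder can also lose its lock.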

Optimistic

All partitions may read and write files

Conflicting updates are detected and resolved later, after reconnection

Conflicts should be rare because write-sharing is infrequent in Unix

The Coda designers chose this option for its higher availability

Optimistic replica control is used for disconnected operation. What about replication among servers?

Optimistic as well

With pessimistic control among servers, an update made during disconnection might be refused because one of the servers holding the object is itself disconnected

Implementation

Client structure - Fig 2

Client state transitions - Fig 3: hoarding, emulation, and reintegration
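The three states and the transitions between them can be sketched as a small state machine. The event names are our own labels for the arcs of Fig. 3, and the code is illustrative rather than the client's actual implementation.

```python
from enum import Enum


class State(Enum):
    HOARDING = "hoarding"
    EMULATION = "emulation"
    REINTEGRATION = "reintegration"


# (state, event) -> next state; event names are our labels for Fig. 3's arcs.
TRANSITIONS = {
    (State.HOARDING, "disconnection"): State.EMULATION,
    (State.EMULATION, "physical_reconnection"): State.REINTEGRATION,
    (State.REINTEGRATION, "logical_reconnection"): State.HOARDING,
    (State.REINTEGRATION, "disconnection"): State.EMULATION,
}


def step(state: State, event: str) -> State:
    """Return the next state; events with no arc leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


s = State.HOARDING
for event in ["disconnection", "physical_reconnection", "logical_reconnection"]:
    s = step(s, event)
    print(event, "->", s.value)
```

Note the arc from reintegration back to emulation: a client that loses its connection again in the middle of reintegration simply resumes disconnected operation.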

Hoarding state

Client is connected to the servers

Files are cached both for performance and in anticipation of possible disconnection

Prioritized and hierarchical cache management

The priority of a file is determined by

the user's explicit input about its importance (hoard profile)

its recent reference pattern (LRU)

A directory is not evicted until all files and subdirectories under it have been evicted
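A minimal sketch of prioritized, hierarchical replacement, assuming a priority that linearly blends the user-assigned hoard priority with a recency score. The weight `alpha`, the recency formula, and all names are assumptions for illustration; the point is the eviction rule, which only considers entries that have no cached children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    path: str
    hoard_priority: int      # static: from the user's hoard profile (0 = not hoarded)
    last_ref: int            # logical time of the most recent reference
    children: List[str] = field(default_factory=list)


def priority(e: Entry, now: int, alpha: float = 0.75) -> float:
    # ASSUMPTION: linear blend of hoard priority and a recency score in (0, 100].
    recency = 100.0 / (1 + now - e.last_ref)
    return alpha * e.hoard_priority + (1 - alpha) * recency


def eviction_victim(cache: Dict[str, Entry], now: int) -> Optional[str]:
    """Lowest-priority entry among those with no cached children.

    A directory is never a candidate while anything under it is cached.
    """
    candidates = [e for e in cache.values()
                  if not any(c in cache for c in e.children)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: priority(e, now)).path


cache = {
    "/d": Entry("/d", 0, 0, ["/d/a", "/d/b"]),
    "/d/a": Entry("/d/a", 100, 0),
    "/d/b": Entry("/d/b", 0, 9),
    "/x": Entry("/x", 0, 1),
}
order = []
while cache:
    victim = eviction_victim(cache, now=10)
    order.append(victim)
    del cache[victim]
print(order)   # ['/x', '/d/b', '/d/a', '/d']
```

Here `/d` has the lowest priority of all four entries, yet it is evicted last, because it only becomes a candidate once nothing beneath it remains cached.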

Hoard walking

Priority of file is dynamic

User input is static, but

the reference-based component changes over time

-> Priority inversion can occur

A file with higher priority may be out of the cache while lower-priority files are in it

Hoard walking

restores cache equilibrium: evicts lower-priority objects and fetches higher-priority ones so that the cache contents reflect current priorities

fetches the latest version of each file (but not directory) that has been modified at another site

Normally a callback break is handled lazily: the new version is fetched only when the file is next referenced. So the hoard walk refetches a file only when a callback break was delivered to this client but the file has not been referenced since
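A hoard walk can be sketched as a single pass that recomputes the desired cache contents and reconciles the cache with them. The sketch assumes a cache capacity measured in objects and a `fetch` callback, both hypothetical, and it ignores the directory rule above for brevity.

```python
from typing import Callable, Dict, Set, Tuple


def hoard_walk(cached: Set[str], priorities: Dict[str, float], capacity: int,
               stale_files: Set[str],
               fetch: Callable[[str], None]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical hoard walk; returns (new cache contents, evicted objects).

    `stale_files` are files (not directories) whose callback was broken
    but which have not been referenced since.
    """
    # Desired contents: the `capacity` objects with the highest current priority.
    ranked = sorted(priorities, key=lambda name: priorities[name], reverse=True)
    wanted = set(ranked[:capacity])
    evicted = cached - wanted                 # lower-priority objects go
    for name in sorted(wanted - cached):
        fetch(name)                           # higher-priority objects come in
    for name in sorted(stale_files & cached & wanted):
        fetch(name)                           # refetch the new version now
    return wanted, evicted


fetched = []
new_cache, evicted = hoard_walk({"a", "b", "c"}, {"a": 5, "b": 1, "c": 4, "d": 9},
                                capacity=3, stale_files={"c"}, fetch=fetched.append)
print(sorted(new_cache), sorted(evicted), fetched)   # ['a', 'c', 'd'] ['b'] ['d', 'c']
```

The inverted pair here is `b` (cached, priority 1) and `d` (not cached, priority 9); the walk swaps them and also refreshes the stale file `c`.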

Emulation state - disconnected operation

Client acts as a pseudo-server

Security checks are postponed until reintegration

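A temporary file ID (fid) is generated for each newly created file

Updates are recorded in a log for later reintegration

One log per volume

Each log entry holds

the system call and its arguments

the version number of the target object

The log must stay short, and the client must be frugal with cache space, since the log and the cached data share limited local storage while disconnected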
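A per-volume replay log might look like the following sketch. Each record holds the system call, its arguments, and the version of the target object, matching the fields above. The temporary-fid format `volume:tmpN` and the two cancellation rules (a later store supersedes earlier stores of the same file; removing a file created while disconnected drops all of its records) are assumptions chosen to show how a log can be kept short.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogRecord:
    op: str        # system call, e.g. "create", "store", "remove"
    args: Tuple    # its arguments; args[0] is the target fid in this sketch
    version: int   # version of the target object when the call was made


class ReplayLog:
    """Hypothetical replay logs of a disconnected client, one list per volume."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[LogRecord]] = {}
        self._counter = itertools.count(1)

    def new_temp_fid(self, volume: str) -> str:
        # Temporary fid for an object created while disconnected (format is made up).
        return f"{volume}:tmp{next(self._counter)}"

    def append(self, volume: str, rec: LogRecord) -> None:
        log = self.logs.setdefault(volume, [])
        target = rec.args[0]
        if rec.op == "store":
            # A later store of the same file supersedes earlier ones.
            log[:] = [r for r in log if not (r.op == "store" and r.args[0] == target)]
        elif rec.op == "remove" and any(r.op == "create" and r.args[0] == target for r in log):
            # Created and removed while disconnected: drop every record for it.
            log[:] = [r for r in log if r.args[0] != target]
            return
        log.append(rec)


rlog = ReplayLog()
fid = rlog.new_temp_fid("vol1")
rlog.append("vol1", LogRecord("create", (fid, "draft.txt"), 0))
rlog.append("vol1", LogRecord("store", (fid, b"v1"), 0))
rlog.append("vol1", LogRecord("store", (fid, b"v2"), 0))
print([r.op for r in rlog.logs["vol1"]])   # ['create', 'store']
rlog.append("vol1", LogRecord("remove", (fid,), 0))
print(rlog.logs["vol1"])                   # []
```

Keeping the log shorter saves local space during disconnection and also means less work to replay at reintegration.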

Reintegration state

Changes made during emulation are propagated to the servers

Client cache is updated with latest versions from servers

Algorithm

Client

obtains permanent fids and substitutes them for the temporary fids in the log

sends the log in parallel to all servers that store the volume

Each server, in parallel,

parses the log and locks the objects it references

validates and executes each operation

fetches the data of modified files from the client (back-fetching)

commits; reintegration is all or nothing, so on failure nothing is applied
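The reintegration steps can be sketched end to end as follows, assuming log records of the form `(op, fid, version_seen)` with only hypothetical `create` and `store` operations, an integer version counter in place of the real per-object ids, and a `back_fetch` callback standing in for fetching file data from the client. Applying changes to private copies and installing them only at the end makes the replay all or nothing; locking is only noted in a comment because the sketch is single-threaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, str, int]   # (op, fid, version the client saw)


@dataclass
class Obj:
    version: int
    data: bytes = b""


def substitute_fids(log: List[Record], permanent: Dict[str, str]) -> List[Record]:
    """Client side: swap temporary fids for permanent ones obtained from the server."""
    return [(op, permanent.get(fid, fid), seen) for op, fid, seen in log]


def reintegrate(store: Dict[str, Obj], log: List[Record],
                back_fetch: Callable[[str], bytes]) -> bool:
    """Server side: replay the whole log, or change nothing (hypothetical)."""
    # 1. Parse: collect the objects the log references (a real server locks them here).
    referenced = {fid for _, fid, _ in log}
    shadow = {fid: Obj(obj.version, obj.data)
              for fid, obj in store.items() if fid in referenced}
    # 2. Validate and execute each operation against the private copies.
    for op, fid, seen in log:
        if op == "create":
            if fid in shadow:
                return False                  # fid already in use
            shadow[fid] = Obj(version=0)
        elif op == "store":
            obj = shadow.get(fid)
            if obj is None or obj.version != seen:
                return False                  # conflict: abort, store untouched
            obj.data = back_fetch(fid)        # 3. back-fetch file data from the client
            obj.version += 1
        else:
            return False                      # operation not modeled in this sketch
    # 4. Commit: install all changes together.
    store.update(shadow)
    return True


server = {"v.7": Obj(version=3, data=b"old")}
log = substitute_fids([("create", "tmp1", 0), ("store", "tmp1", 0), ("store", "v.7", 3)],
                      {"tmp1": "v.9"})
print(reintegrate(server, log, lambda fid: b"new " + fid.encode()))
```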

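How do servers detect conflicting modifications?

Each object is tagged (i.e., versioned) with an id, the storeid, that uniquely identifies its last update

For an update to a file t

if id(t) in the log == id(t) at the server: valid -> apply the logged modification to the server's copy

otherwise: conflict -> error, and the update is not applied

For a directory update, a mismatched id is not by itself a conflict: the server then checks the individual entries involved, so that, e.g., creating two different names in the same directory does not conflict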
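The two validation rules can be sketched as predicates. The directory operations (`create`, `remove`), the map from entry names to child ids, and the exact conflict conditions are simplifying assumptions; they illustrate the idea that a mismatched directory id triggers a per-entry check rather than an immediate conflict.

```python
from typing import Dict, Optional


def file_update_ok(logged_id: str, server_id: str) -> bool:
    """File update: valid only if the server copy still carries the id the client saw."""
    return logged_id == server_id


def dir_update_ok(op: str, name: str, logged_id: str, server_id: str,
                  server_entries: Dict[str, str],
                  logged_child_id: Optional[str] = None) -> bool:
    """Directory update; `server_entries` maps entry name -> child's current id."""
    if logged_id == server_id:
        return True                              # directory unchanged at the server
    # ids differ: the directory changed elsewhere, so check only this entry
    if op == "create":
        return name not in server_entries        # conflict only on a name collision
    if op == "remove":
        # the entry must still exist and be unchanged since the client saw it
        return server_entries.get(name) == logged_child_id
    return False                                 # other operations not modeled here


entries = {"a.txt": "s7", "b.txt": "s9"}
print(dir_update_ok("create", "c.txt", "d1", "d2", entries))   # True: new name
print(dir_update_ok("create", "b.txt", "d1", "d2", entries))   # False: name collision
```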