The LOCUS Distributed Operating System

B. Walker, G. Propek, R. English, C. Kline, and G. Thiel @ UCLA

Proceedings of the Ninth ACM Symposium on Operating Systems Principles, October 10-13, 1983, pages. 49-70

Main Point

Centralized match maker (sync. site) to match using-site with storage-site having the latest version

Network partition and merge

Distributed File System

Characteristics

Uniform name space

Network transparency

Location transparency

Location independence

High availability by replication

Cache consistency guaranteed

Architecture

Composed of multiple file-group

File-group = file system in Unix

File-group are glued together by mount

Starting from the root file system, single name space is recognized

File Replication

Observation

Higher the directory is, accessed more frequently and modified less often

High degree of replication is desired

Lower the directory is, accessed less frequently and modified more often

Low degree of replication is desired

Architecture of replication

Mapping from a single file-group to multiple container

Means that a file-group is replicated in those multiple containers

However replication is done at the granularity of directories

Replications of a file-group are not mirror of the group

A file F in file-group X can be replicated to any of X's containers

File has a unique inode number within the file-group

Replications of a file have the same inode number

Hence files are uniquely identified by <file-group id, inode number>

Files have also version number assigned so that the lastly updated replication may be accessed

File Access - transparency

Synchronization site - stateful, meta-data server
knows which machine has which file
knows which machine has the most recent version of the file
has inode for every file of the group
A synchronization size per file-group per network partition!
Every open request goes to the synchronization site
Each machine has (file-group, sync site) mapping in the mount table
Reading files
Open file
Using site contacts sync. site
Sync. site contacts potential storage sites with the latest version number
One of storage site having the file with the version number responses
Sync. site brings inode in memory and fill it with information from the storage site
Sync. site responses to the using site with the info about storage site
Read file
Using site contacts the storage site directly without intervention of sync. site and gets the blocks as required
Close file
Using site -> Storage site
Storage site -> Sync. site
Sync. site -> Storage site
Storage site -> Using site
Pathname lookup
Pathname lookup is done at using site as AFS
Get directory pages as needed from storage site
Modification
Bring the page from storage site
Updates are propagated to storage site
Commit - also abort call is available
Updates until commit call are temporary
Shadow pages mechanism used
As a result using site reads the most recently committed version
Upon commit, storage site informs sync. site and other storage sites of the file
Storage sites having old version copy the modified pages
File creation and deletion
Upon creation two questions arise
How many copies should be made
Where to put these copies
Candidates are storage sites having the parent directory
Creation/deletion is done at the storage site and propagated to other storage sites

Machine and site dependent files

Same name must refer different files in case of network of heterogeneous machines

User context, including which machine she is working on, is maintained

Versions for each context are put into the same directory

Binding to the specific version is done automatically

Special type of files

pipe, device, ipc channel have file semantics as Unix

ipc supports for intra- and inter-machine with the same semantics of Unix

Recovery from partition

Absolute consistency within partition. Inconsistency between partition is allowed and is reconciled

Two view of conflict update

Updates to an object are self-contained so independent to those to other objects

Updates are part of transaction. So merge of conflicting updates to an object should be considered together with those to other objects

Merge of conflicting updates

by kernel

files that kernel manages are merged by kernel

for example directory, mailbox

by higher level software

file type specific merger is called

by user

notified by mail

Dynamic Reconfiguration

Partitioning

When a node failure or communication failure occurs,

nodes in the partition are sub-partitioned

nodes in a single sub-partition become consistent each other

Figuring out the exact boundary of new partition is very expensive and difficult task!

Election and agreement strategy

Merging

When the failure goes away,

sub-partitions get merged into a bigger partition

nodes in the new partition become consistent

Even more expensive and difficult task!

Election and agreement strategy again

Polling to nodes which are still dead takes more time

Process Management

Processes can be transparently created and executed either in local or remote machine

Creation

Where to create is determined dynamically

site information is in process environment

Difficulty is how to achieve Unix semantics

environment setup

share of open file between parent and child

Inter-process functions

Explicit interaction such as ipc is relatively easy

Hard part is implicit interaction

What if two processes running in different machines share an open file

Token mechanism: only token holder can do operation