The LOCUS Distributed Operating System

B. Walker, G. Propek, R. English, C. Kline, and G. Thiel @ UCLA

Proceedings of the Ninth ACM Symposium on Operating Systems Principles, October 10-13, 1983, pages. 49-70

Main Point

	Centralized match maker (sync. site) to match using-site with storage-site having the latest version
	Network partition and merge

Distributed File System

Characteristics

	Uniform name space
	Network transparency
	Location transparency
	Location independence
	High availability by replication
	Cache consistency guaranteed

Architecture

Composed of multiple file-group

File-group = file system in Unix

File-group are glued together by mount

Starting from the root file system, single name space is recognized

File Replication

Observation

Higher the directory is, accessed more frequently and modified less often

High degree of replication is desired

Lower the directory is, accessed less frequently and modified more often

Low degree of replication is desired

Architecture of replication

Mapping from a single file-group to multiple container

Means that a file-group is replicated in those multiple containers

However replication is done at the granularity of directories

Replications of a file-group are not mirror of the group

A file F in file-group X can be replicated to any of X's containers

File has a unique inode number within the file-group

Replications of a file have the same inode number

Hence files are uniquely identified by <file-group id, inode number>

Files have also version number assigned so that the lastly updated replication may be accessed

File Access - transparency

Synchronization site - stateful, meta-data server

	knows which machine has which file
	knows which machine has the most recent version of the file
	has inode for every file of the group
	A synchronization size per file-group per network partition!

Every open request goes to the synchronization site

Each machine has (file-group, sync site) mapping in the mount table

Reading files

Open file

	Using site contacts sync. site
	Sync. site contacts potential storage sites with the latest version number
	One of storage site having the file with the version number responses
	Sync. site brings inode in memory and fill it with information from the storage site
	Sync. site responses to the using site with the info about storage site

Read file

Using site contacts the storage site directly without intervention of sync. site and gets the blocks as required

Close file

Using site -> Storage site

	Storage site -> Sync. site
	Sync. site -> Storage site

Storage site -> Using site

Pathname lookup

Pathname lookup is done at using site as AFS

Get directory pages as needed from storage site

Modification

	Bring the page from storage site
	Updates are propagated to storage site

Commit - also abort call is available

Updates until commit call are temporary

Shadow pages mechanism used

As a result using site reads the most recently committed version

Upon commit, storage site informs sync. site and other storage sites of the file

Storage sites having old version copy the modified pages

File creation and deletion

Upon creation two questions arise

How many copies should be made

Where to put these copies

Candidates are storage sites having the parent directory

Creation/deletion is done at the storage site and propagated to other storage sites

Machine and site dependent files

	Same name must refer different files in case of network of heterogeneous machines
	User context, including which machine she is working on, is maintained
	Versions for each context are put into the same directory
	Binding to the specific version is done automatically

Special type of files

	pipe, device, ipc channel have file semantics as Unix
	ipc supports for intra- and inter-machine with the same semantics of Unix

Recovery from partition

Absolute consistency within partition. Inconsistency between partition is allowed and is reconciled

Two view of conflict update

Updates to an object are self-contained so independent to those to other objects

Updates are part of transaction. So merge of conflicting updates to an object should be considered together with those to other objects

Merge of conflicting updates

by kernel

files that kernel manages are merged by kernel

for example directory, mailbox

by higher level software

file type specific merger is called

by user

notified by mail

Dynamic Reconfiguration

Partitioning

When a node failure or communication failure occurs,

nodes in the partition are sub-partitioned

nodes in a single sub-partition become consistent each other

Figuring out the exact boundary of new partition is very expensive and difficult task!

Election and agreement strategy

Merging

When the failure goes away,

sub-partitions get merged into a bigger partition

nodes in the new partition become consistent

Even more expensive and difficult task!

Election and agreement strategy again

Polling to nodes which are still dead takes more time

Process Management

Processes can be transparently created and executed either in local or remote machine

Creation

Where to create is determined dynamically

site information is in process environment

Difficulty is how to achieve Unix semantics

environment setup

share of open file between parent and child

Inter-process functions

Explicit interaction such as ipc is relatively easy

Hard part is implicit interaction

What if two processes running in different machines share an open file

Token mechanism: only token holder can do operation