** The Andrew File System (AFS) **

The Andrew File System was introduced by researchers at Carnegie-Mellon University (CMU) in the 1980s [1]. Led by the well-known Professor M. Satyanarayanan of CMU ("Satya" for short), the main goal of this project was simple: *scale*. Specifically, how can one design a distributed file system such that a server can support as many clients as possible?

Interestingly, as we will see, there are numerous design and implementation components that affect scalability. Most important is the design of the *protocol* between clients and servers. In NFS, for example, the protocol forces clients to check with the server periodically to determine if cached contents have changed; because each check uses server resources (e.g., CPU and network bandwidth), frequent checks like this limit the number of clients a server can respond to and thus limit scalability.

[AFS VERSION 1]

We will discuss two versions of AFS [1,2]. The first version (which we will call AFSv1, although the original system was actually called the ITC distributed file system [2]) had some of the basic design in place, but didn't scale as desired, which led to a re-design and the final protocol (which we will call AFSv2, or just AFS) [1]. We now discuss the first version.

One of the basic tenets of all versions of AFS is *whole-file caching* on the *local disk* of the client machine that is accessing a file. When you open() a file, the entire file (if it exists) is fetched from the server and stored in a file on your local disk. Subsequent application read() and write() operations are redirected to the local file system where the file is stored; thus, these operations require no network communication and are fast. Finally, upon close(), the file (if it has been modified) is flushed back to the server. Note the obvious contrasts with NFS, which caches *blocks* (not whole files, although NFS could of course cache every block of an entire file) and does so in client *memory* (not local disk).

Let's get into the details a bit more. When a client application first calls open(), the AFS client-side code (which the AFS designers call *Venus*) sends a Fetch protocol message to the server. The Fetch message passes the entire pathname of the desired file (e.g., "/home/remzi/notes.txt") to the file server (the group of which they called *Vice*), which then traverses the pathname, finds the desired file, and ships the entire file back to the client. The client-side code then caches the file on the local disk of the client (by writing it to local disk). As we said above, subsequent read() and write() system calls are strictly *local* in AFS (no communication with the server occurs); they are just redirected to the local copy of the file. Because the read() and write() calls act just like calls to a local file system, once a block is accessed, it may also be cached in client memory. Thus, AFS also uses client memory to cache copies of blocks that it has on its local disk. Finally, when finished, the AFS client checks whether the file has been modified (i.e., whether it has been opened for writing); if so, it flushes the new version back to the server with a Store protocol message, sending the entire file and pathname to the server for permanent storage.
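To make whole-file caching concrete, here is a minimal sketch (in C) of what the client-side open() and close() paths could look like. Everything in it is an illustrative assumption rather than the actual Venus code: the cache_entry structure and the rpc_fetch()/rpc_store() helpers simply stand in for the Fetch and Store protocol messages.

--------------------------------------------------------------------------------
/* A sketch of AFSv1-style whole-file caching at the client (Venus).    */
/* rpc_fetch() and rpc_store() stand in for the Fetch and Store         */
/* protocol messages; they are assumed helpers, not real interfaces.    */

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

extern int rpc_fetch(const char *server_path, const char *local_path);
extern int rpc_store(const char *server_path, const char *local_path);

struct cache_entry {
    char server_path[1024]; /* full pathname as known to Vice (the servers) */
    char local_path[1024];  /* whole-file copy on the client's local disk   */
    int  dirty;             /* set if the file is written after open()      */
};

/* open(): fetch the ENTIRE file from the server onto local disk,
   then hand back a descriptor for the local copy */
int afs_open(struct cache_entry *e, const char *path, const char *local) {
    if (rpc_fetch(path, local) < 0)           /* one Fetch per open()    */
        return -1;
    snprintf(e->server_path, sizeof(e->server_path), "%s", path);
    snprintf(e->local_path, sizeof(e->local_path), "%s", local);
    e->dirty = 0;
    return open(local, O_RDWR);               /* reads/writes now local  */
}

/* close(): if modified, flush the whole file back with a Store message */
int afs_close(struct cache_entry *e, int fd) {
    close(fd);
    if (e->dirty)
        return rpc_store(e->server_path, e->local_path);
    return 0;
}
--------------------------------------------------------------------------------

The point of the sketch is simply that all data movement happens at open() and close(); the read() and write() calls in between never touch the network.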
--------------------------------------------------------------------------------
TestAuth       Test whether a file has changed
               (used to validate cached entries)
GetFileStat    Get the stat info for a file
Fetch          Fetch the contents of an entire file from the server
Store          Store this file on the server
SetFileStat    Set the stat info for a file
ListDir        List the contents of a directory
--------------------------------------------------------------------------------
[FIGURE: AFSv1 PROTOCOL HIGHLIGHTS]

The next time the file is accessed, AFSv1 does so much more efficiently. Specifically, the client-side code first contacts the server (using the TestAuth protocol message) to determine whether the file has changed. If not, the client uses the locally-cached copy, thus improving performance by avoiding a network transfer. The figure above shows some of the protocol messages in AFSv1. Note that this early version of the protocol only cached file contents; directories, for example, were only kept at the server.

[PROBLEMS WITH VERSION 1]

A few key problems with this first version of AFS motivated the designers to rethink their file system. To study the problems in detail, the designers of AFS spent a great deal of time measuring their existing prototype to find what was wrong. Such experimentation is a good thing; *measurement* is the key to understanding how systems work and how to improve them. Hard data helps turn intuition into a concrete science of deconstructing systems. In their study, the authors found two main problems with AFSv1:

*Path-traversal costs are too high*: When performing a Fetch or Store, the client passes the entire pathname (e.g., "/home/remzi/grades.txt") to the server. The server, in order to access the file, must perform a full pathname traversal, first looking in the root directory to find "home", then in "home" to find "remzi", and so forth, all the way down the path until finally the desired file is located. With many clients accessing the server at once, the designers of AFS found that the server was spending much of its time simply walking down directory paths!

*The client issues too many TestAuth messages to the server*: Much like NFS and its overabundance of GetAttr protocol messages, AFSv1 generated a large amount of traffic checking whether a local file (or its stat information) was still valid, via the TestAuth protocol message. Thus, servers spent a great deal of time telling clients whether it was OK to use their cached copies of a file. Most of the time the answer was yes (of course), and thus the protocol once again led to high server overheads.

There were actually two other problems with AFSv1: load was imbalanced across servers, and the server used a single distinct process per client, thus inducing context switching and other overheads. The load-imbalance problem was solved by introducing *volumes*, which an administrator could move across servers to balance load; the context-switch problem was solved in AFSv2 by building the server with threads instead of processes. However, for the sake of space, we focus here on the two main protocol problems above that limited the scale of the system.

[THE CRUX OF THE PROBLEM: MINIMIZING SERVER INTERACTIONS, AND MAKING THEM EFFICIENT]

The two problems above limited the scalability of AFS; the server CPU became the bottleneck of the system, and each server could only service 20 clients without becoming overloaded. Servers were receiving too many TestAuth messages, and when they received Fetch or Store messages, they were spending too much time traversing the directory hierarchy. Thus, the AFS designers were faced with a problem: how could they redesign the protocol to minimize the number of server interactions, i.e., how could they reduce the number of TestAuth messages? Further, how could they design the protocol to make these server interactions efficient? By attacking both of these issues, a new protocol would result in a much more scalable version of AFS.
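To see where all of those TestAuth messages come from, consider what a re-open of an already-cached file might look like in AFSv1. The sketch below is an illustrative guess (rpc_testauth() and rpc_fetch() are assumed stand-ins for the TestAuth and Fetch messages, not real interfaces), but it captures the key point: every open() costs at least one server round trip, even when the cached copy is perfectly good.

--------------------------------------------------------------------------------
/* Sketch of AFSv1-style revalidation when re-opening a cached file.    */
/* rpc_testauth() is assumed to return 1 if the cached copy is still    */
/* valid at the server, and 0 if the file has changed.                  */

#include <fcntl.h>

extern int rpc_testauth(const char *server_path);
extern int rpc_fetch(const char *server_path, const char *local_path);

int afsv1_reopen(const char *server_path, const char *local_path) {
    if (rpc_testauth(server_path))              /* a round trip on EVERY open  */
        return open(local_path, O_RDWR);        /* common case: cache is fine  */
    if (rpc_fetch(server_path, local_path) < 0) /* rare case: refetch the file */
        return -1;
    return open(local_path, O_RDWR);
}
--------------------------------------------------------------------------------

Multiply this check by many opens across many clients and the server ends up spending much of its CPU answering "yes, your copy is fine."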
[AFS VERSION 2]

AFSv2 introduced the notion of a *callback* to reduce the number of client/server interactions. A callback is simply a promise from the server to the client that the server will inform the client when a file the client is caching has been modified. By adding this *state* to the server, the client no longer needs to contact the server to find out if a cached file is still valid; rather, it assumes that the file is valid until the server tells it otherwise.

AFSv2 also introduced the notion of a *file handle* (very similar to NFS) instead of pathnames to specify which file a client is interested in. A file handle in AFS consists of a volume identifier, a file identifier, and a generation number. Thus, instead of sending whole pathnames to the server and letting the server walk the pathname to find the desired file, the client walks the pathname, one piece at a time, caching the results and thus hopefully reducing the load on the server.

For example, if the client wished to access the file "/home/remzi/notes.txt", and "home" was the AFS directory mounted onto "/" (in other words, "/" was the local root directory, but "home" and its children were in AFS), the client would first Fetch the directory contents of "home", put them in the local-disk cache, and set up a callback on "home". Then, the client would Fetch the directory "remzi", put it in the local-disk cache, and set up a callback on "remzi". Finally, the client would Fetch "notes.txt", cache this regular file on the local disk, set up a callback, and return a file descriptor to the calling application.

The key difference from NFS, however, is that with each fetch of a directory or file, the AFS client establishes a callback with the server, thus ensuring that the server will notify the client of any change to its cached state. The benefit is obvious: although the first access to "/home/remzi/notes.txt" generates many client-server messages (as described above), it also establishes callbacks for all the directories as well as the file notes.txt, and thus subsequent accesses are entirely local and require no server interaction at all. Thus, in the common case where a file is cached at the client, AFS behaves nearly identically to a local disk-based file system. If one accesses a file more than once, the second access should be just as fast as accessing a file locally.
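A rough sketch of how these pieces might fit together at the client is shown below. The afs_fid structure mirrors the three file-handle fields described in the text; the v2_cache_entry structure, the callback_valid flag, and the rpc_fetch_file() helper are illustrative assumptions, not the actual AFSv2 implementation.

--------------------------------------------------------------------------------
/* Sketch of an AFSv2-style client open using file handles and callbacks. */

#include <fcntl.h>

struct afs_fid {               /* the AFS file handle described above  */
    unsigned int volume;       /* volume identifier                    */
    unsigned int file;         /* file identifier within the volume    */
    unsigned int generation;   /* generation number (detects id reuse) */
};

struct v2_cache_entry {
    struct afs_fid fid;
    char local_path[1024];     /* whole-file copy on local disk        */
    int  callback_valid;       /* cleared when the server breaks the   */
                               /* callback for this file               */
};

/* assumed helper: fetch by file handle; also (re)establishes a callback */
extern int rpc_fetch_file(struct afs_fid *fid, const char *local_path);

int afsv2_open(struct v2_cache_entry *e) {
    if (e->callback_valid)                    /* common case: NO server traffic */
        return open(e->local_path, O_RDWR);
    if (rpc_fetch_file(&e->fid, e->local_path) < 0)  /* refetch + new callback  */
        return -1;
    e->callback_valid = 1;
    return open(e->local_path, O_RDWR);
}
--------------------------------------------------------------------------------

Contrast this with the AFSv1 sketch earlier: the per-open TestAuth round trip has disappeared in the common case; the server is only contacted after it has broken the callback.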
[CACHE CONSISTENCY]

Because of callbacks and whole-file caching, the cache consistency model provided by AFS is easy to describe and understand. When a client (C1) opens a file, it fetches it from the server. Any updates it makes to the file are entirely local, and thus only visible to other applications on that same client (C1); if an application on another client (C2) opens the file at this point, it will simply get the version stored at the server, which does not yet reflect the changes being made at C1.

When the application at C1 finishes updating the file, it calls close(), which flushes the entire file to the server. At that point, any clients caching the file (such as C2) are informed that their callbacks are broken and thus that they should not use their cached versions of the file, because the server has a newer version.

In the rare case that two clients are modifying a file at the same time, AFS naturally employs what is known as a *last writer wins* approach. Specifically, whichever client calls close() last will update the entire file on the server last and thus will be the winning file, i.e., the file that remains on the server for others to see. The result is a file that is entirely one client's or the other's. Note the difference from a block-based protocol like NFS: in such a protocol, writes of individual blocks may be flushed to the server as each client is updating the file, and thus the final file on the server could end up as a mix of updates from both clients; in many cases, such a mixed file would not make much sense (e.g., imagine a JPEG image modified in pieces by two clients; the resulting mix of writes would hardly be meaningful).

[CRASH RECOVERY]

From the description above, you might sense that crash recovery is more involved than with NFS. You would be right. For example, imagine there is a short period of time during which a server (S) is not able to contact a client (C1), for example, while C1 is rebooting. While C1 is unavailable, S may have tried to send it one or more callback recall messages; for example, imagine C1 had file F cached on its local disk, and then C2 (another client) updated F, thus causing S to send messages to all clients caching the file to remove it from their local caches. Because C1 may miss those critical messages while it is rebooting, upon rejoining the system C1 should treat all of its cache contents as suspect. Thus, upon the next access to file F, C1 should first ask the server (with a TestAuth protocol message) whether its cached copy of F is still valid; if so, C1 can use it; if not, C1 should fetch the newer version from the server.

Server recovery after a crash is more complicated. The problem that arises is that callbacks are kept in memory; thus, when a server reboots, it has no idea which client machine has which files. Upon server restart, each client of the server must realize that the server has crashed, treat all of its cache contents as suspect, and (as above) reestablish the validity of a file before using it. Thus, a server crash is a big event, as one must ensure that each client learns of the crash in a timely manner, or risk a client accessing a stale file. There are many ways to implement such recovery; for example, the server could send a message (saying "don't trust your cache contents!") to each client when it is up and running again. As you can see, there is a cost to building a more scalable and sensible caching model; with NFS, clients hardly noticed a server crash.
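One simple way a client might implement the "treat everything as suspect" rule is sketched below, reusing the afs_fid and v2_cache_entry structures from the earlier sketch. The rpc_testauth_fid() helper is an assumption standing in for a TestAuth-style validity check; none of this is the real AFS recovery code.

--------------------------------------------------------------------------------
/* Sketch of client-side recovery after a client reboot or a detected    */
/* server crash: drop all callbacks, then revalidate lazily on next use. */
/* Reuses struct afs_fid and struct v2_cache_entry from the earlier      */
/* AFSv2 sketch; the rpc_*() helpers are assumptions.                    */

#include <fcntl.h>

extern int rpc_testauth_fid(struct afs_fid *fid); /* 1 if cached copy is current */
extern int rpc_fetch_file(struct afs_fid *fid, const char *local_path);

void drop_all_callbacks(struct v2_cache_entry *cache, int n) {
    for (int i = 0; i < n; i++)
        cache[i].callback_valid = 0;        /* every cached file is now suspect */
}

int reopen_after_recovery(struct v2_cache_entry *e) {
    if (!e->callback_valid) {
        if (rpc_testauth_fid(&e->fid))      /* copy still good: just re-arm it  */
            e->callback_valid = 1;
        else if (rpc_fetch_file(&e->fid, e->local_path) == 0)
            e->callback_valid = 1;          /* stale: refetch, then use it      */
        else
            return -1;
    }
    return open(e->local_path, O_RDWR);     /* back to the normal, local path   */
}
--------------------------------------------------------------------------------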
[SCALE OF AFSv2]

With the new protocol in place, AFSv2 was measured and found to be much more scalable than the original version. Indeed, each server could support about 50 clients (instead of just 20). A further benefit was that client-side performance often came quite close to local performance, because in the common case all file accesses were local; file reads usually went to the local disk cache (and potentially, local memory). Only when a client created a new file or wrote to an existing one was there a need to send a Store message to the server and thus update the file with new contents.

[OTHER THINGS: NAMESPACE, SECURITY, ETC.]

AFS added a number of other improvements beyond scale. It provided a true global namespace to clients, thus ensuring that all files were named the same way on all client machines; NFS, in contrast, allowed each client to mount NFS servers in any way it pleased, and thus only by convention (and great administrative effort) would files be named similarly across clients. AFS also took security seriously, and incorporated mechanisms to authenticate users and ensure that a set of files could be kept private if a user so desired; NFS, in contrast, still has quite primitive support for security. Finally, AFS included facilities for flexible user-managed access control. Thus, when using AFS, a user has a great deal of control over who exactly can access which files. NFS, like most UNIX file systems, has much more primitive support for this type of sharing.

[SUMMARY]

AFS shows us how distributed file systems can be built quite differently than what we saw with NFS. The protocol design of AFS is particularly important; by minimizing server interactions (through whole-file caching and callbacks), each server can support many clients, thus reducing the number of servers needed to manage a particular site. Many other features, including the single namespace, security, and access-control lists, make AFS quite nice to use. Finally, the consistency model provided by AFS is simple to understand and reason about, and does not lead to the occasional weird behavior one sometimes observes in NFS.

Perhaps unfortunately, AFS is likely on the decline. Because NFS became an open standard, many different vendors supported it, and, along with CIFS (the Windows-based distributed file system protocol), NFS dominates the marketplace. Although one still sees AFS installations from time to time (such as in various educational institutions, including Wisconsin), the only lasting influence will likely be from the ideas of AFS rather than the actual system itself. Indeed, NFSv4 now adds server state (e.g., an "open" protocol message), and thus bears more similarity to AFS than it used to.

[REFERENCES]

[1] "Scale and Performance in a Distributed File System", John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, Michael J. West. ACM Transactions on Computer Systems (ACM TOCS), Volume 6, Number 1, February 1988, pages 51-81.

[2] "The ITC Distributed File System: Principles and Design", M. Satyanarayanan, J.H. Howard, D.A. Nichols, R.N. Sidebotham, A. Spector, M.J. West. SOSP '85, pages 35-50.