
Scale and Performance in a Distributed File System

Scale and Performance in a Distributed File System. John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. ACM Transactions on Computer Systems 6(1), February 1988, pp. 51-81.

Reviews due Thursday, 4/7.

Comments

Summary: In a distributed computing environment, it was evident that large scale hurt performance and complicated system operation. The authors propose a location-transparent distributed file system to overcome these limitations. They first implement a prototype Andrew File System (AFS) and evaluate it with a synthetic benchmark. From this evaluation they arrive at design changes in the areas of cache validation, server process structure, name translation, and low-level storage representation. They also evaluate NFS, a contemporary distributed file system, with the same benchmark to quantify its merits relative to AFS. The authors then show how aggregating files into volumes improved the operability of the system, and lastly they discuss improvements to issues peripherally related to scalability.

Problem: The initial goal of AFS was to be a distributed file system scalable up to 5000 workstations. However, the first prototype failed to achieve this goal due to performance degradation and operability issues. The problems can be enumerated as follows: a] Cache validation on each file open increased traffic between the workstations (running Venus) and the servers (Vice). b] Virtual-memory paging and context-switch overhead grew because the server used a dedicated process for each client, and Vice performed complete pathname traversals; this led to heavy server load even with few workstations, with the server CPU becoming the bottleneck. c] Files could not be moved between servers, making load balancing, disk space management, file replication, and backup across servers difficult. These problems motivated the design goals for the next, improved version of AFS in the areas of cache management, name resolution, communication and server process structure, and low-level storage representation.

Contributions: The proposed solutions to the above issues were: reduce the number of cache validation checks (via callbacks), have the client/workstation perform pathname traversal instead of the server, and balance server usage by reassigning users and reducing the number of server processes. The contributions made in these directions were:
a] Cache management - the server notifies the client when its cached copy becomes invalid (a sketch of this callback mechanism appears after this list). This reduces the number of cache validation requests; on a client reboot all cached files and directories must be revalidated. The tradeoff is the complexity of maintaining callback state: the server must notify clients holding callbacks on a file before an update to it, performance degrades if the callback state grows large, and inconsistencies in callback state may leave servers and clients out of sync. AFS thus trades complexity for scale, unlike NFS, which keeps the servers simple for quick recovery.
b] Name resolution - In the first version of AFS (AFSv1), only Venus was aware of pathnames, forcing Vice to look up the mapping on every request. In the improved version (AFSv2) each file/directory is identified by a unique fid; Venus maps a pathname to its fid and uses the fid to interact with Vice. The volume location database is replicated at each server, so Venus needs to look up a mapping only once (compared to Vice having to do a full lookup every time).
c] Communication and server process structure - AFSv1 suffered from excessive context-switching overhead because the server used a dedicated process per client, with RPC as the communication mechanism. AFSv2 overcame these limitations by serving clients with user-level lightweight processes within a single server process, an RPC mechanism providing exactly-once semantics in the absence of failures, a whole-file transfer protocol with an optimized bulk-transfer path, and authenticated communication.
d] Low-level storage representation - In AFSv1, files were accessed by pathname rather than by inode. AFSv2 went back to a two-level mechanism similar to inodes: the client provides a fid, which is used to look up vnode information, which in turn yields the inode number used in the call to read/write the data.
e] Snapshotting was used for file backup, movement of volumes, and read-only replication.
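To make the callback idea in item a] concrete, here is a minimal Python sketch of callback-based cache validation. The class and method names are hypothetical and do not correspond to the real Vice/Venus RPC interface; the point is only that validation traffic disappears while a callback promise is held.

    # Hypothetical sketch of callback-based cache validation (not the real Vice/Venus API).
    class Server:
        def __init__(self):
            self.files = {}      # fid -> file contents
            self.callbacks = {}  # fid -> set of clients promised a notification

        def fetch(self, client, fid):
            # Handing out a copy also registers a callback promise for that client.
            self.callbacks.setdefault(fid, set()).add(client)
            return self.files[fid]

        def store(self, writer, fid, data):
            self.files[fid] = data
            # Break callbacks: notify every other caching client, then forget them.
            for client in self.callbacks.pop(fid, set()):
                if client is not writer:
                    client.invalidate(fid)

    class Client:
        def __init__(self, server):
            self.server = server
            self.cache = {}      # fid -> (data, valid_flag)

        def open(self, fid):
            entry = self.cache.get(fid)
            if entry and entry[1]:
                return entry[0]               # valid cached copy: no server traffic at all
            data = self.server.fetch(self, fid)
            self.cache[fid] = (data, True)
            return data

        def invalidate(self, fid):            # callback broken by the server
            if fid in self.cache:
                self.cache[fid] = (self.cache[fid][0], False)

In AFSv1 every open would have generated a validation request; here, repeated opens of an unchanged file touch only the local cache, which is exactly the traffic reduction the review describes.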

Evaluation: The multiple iterations of evaluation the authors carried out in this paper are highly commendable. Initially they perform extensive profiling, measurements with a synthetic benchmark, and analysis of AFSv1 to discover the weaknesses that the design choices in AFSv2 had to address. In AFSv2 the server CPU remained the bottleneck, but the Vice servers were much less loaded. Tabulated data on the distribution of calls handled by the servers show that GetTime, FetchStatus, and RemoveCB are the most frequently occurring ones.

In the next iteration of evaluation, the authors compare AFSv2 with NFS, which was then considered the leading distributed file system. It is clear that NFS did not scale well. AFSv2 initially performed worse than NFS, but overtook it at around 3 load units, beyond which the performance of NFS degraded rapidly. In terms of CPU utilization, AFSv2 was lightly loaded (about 42%) at 18 load units, clearly better than NFS, which was completely saturated at that point. Even for disk utilization, AFSv2 used noticeably less disk (33%) than NFS (greater than 95%). However, NFS had lower file-open latencies than AFSv2; the authors attribute this to whole-file transfer, whose cost depends on file size.

Despite the extensive evaluation, I feel the authors should have used more than one synthetic benchmark, including workloads that exercise other aspects such as memory consumption (due to the storage of authentication and network databases) as the number of clients scales.

Confusions: a] What is the file tree structure? b] What is the concept and granularity of volumes?

1. Summary
In this paper the authors examine the design of the Andrew File System in detail. The authors stress scalability as the central theme of the paper as they describe the features and design decisions that let the system scale to several thousand clients. Caching full files on local workstations, employing lightweight processes in servers, and maintaining stub directories that enable every server to map a file to the server holding it are some of the key techniques the authors employed in AFS to achieve scalability.

2.Problem
The existing distributed file systems (including the prototype AFS) could not scale well beyond about 1000 clients. The authors measured and analyzed the prototype, which revealed a few issues hindering the overall scalability of the system. Some of the major issues were:
* Servers spent significant time on simply doing file-path lookup.
* One process per client led to expensive context switches
* Loads were not balanced across servers
* Authentication and cached files' status messages accounted for a disproportionate share of server interaction.

3.Contributions
The contributions of this paper are twofold:
First, the authors show how to identify and solve issues in a complex system through systematic reasoning, measurement of several components, and analysis of the results. The problems and solutions highlighted here have influenced several subsequent DFS designs.
Second, the authors employ different strategies to address the scalability issues:
* cache management: Server informs client if the client's cache is invalid using callbacks.
* name resolution: a unique FID to identify each file/directory (sketched after this list)
* communication and server process structure: a single server process with lightweight processes (threads), replacing a process per client
* low-level storage representation: access files by inodes (given a FID) rather than by pathname
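As referenced in the name-resolution bullet above, here is an illustrative encoding of the three-part FID. The 32-bit field widths follow the paper's description; the pack/unpack helpers and field names are purely illustrative, not part of the AFS protocol.

    from collections import namedtuple

    # Illustrative 96-bit fid: 32-bit volume number, 32-bit vnode number (index
    # within the volume), and a 32-bit uniquifier so a vnode slot can be reused
    # without an old cached fid silently matching the new file.
    Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])

    def pack(fid):
        return (fid.volume << 64) | (fid.vnode << 32) | fid.uniquifier

    def unpack(value):
        return Fid(value >> 64, (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF)

    f = Fid(volume=7, vnode=1042, uniquifier=3)
    assert unpack(pack(f)) == f   # note: no server location is encoded anywhere

Because the FID carries no server address, a volume can move between servers without invalidating FIDs cached at clients.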

4.Evaluation
The authors approached the design and optimization (at least for AFS-v2) after studying and measuring the system, thus adopting the "profile before optimizing" principle, and this naturally makes the case for their design decisions.
For the evaluation of the system, the authors compare the performance of AFS-v2 with the earlier version of AFS as well as with the Sun Network File System (NFS). Using a synthetic benchmark that exercises file system operations such as creation, copying, reading, and compiling of files, they show that AFS performs significantly better in terms of scalability. The AFS-v2 file system scales roughly eight times better, and its CPU and disk usage are lower. The servers can support a higher number of users - up to 50, as opposed to 20 in the earlier system. It also handles high workloads better than NFS.
AFS trades complexity for scalability, making recovery more difficult. The authors do not provide analysis or measurements in this area, or of how recovery compares to NFS. Also, the authors could have used workloads other than just the synthetic one.

5.Confusion
* There are a few consistency issues not addressed in the paper, such as inconsistencies introduced by the 'last writer wins' policy and callbacks getting lost (so that clients do not know their caches are stale).

1. Summary
The article analyzes the performance, scalability, and operability of the Andrew File System, backing all the design decisions with relevant workloads. The authors compare it with NFS and examine every aspect of both file systems.

2. Problem
The goal of this project was to build an easily scalable, crash-consistent, and secure distributed file system for the CMU campus. The systems run 4.2BSD and comprise Vice, the collection of servers, and Venus, the user-level process that handles intercepted file system calls. The initial prototype had issues such as degraded performance on workloads like recursive directory listings of large subtrees, unbounded cache validity checks, no disk-usage quotas for users, a dedicated process per client leading to context-switching overhead and high virtual-memory paging demands, and excessive network resource consumption because RPC was implemented on the kernel's byte-stream abstraction. To eradicate these issues, the performance bottleneck at the server CPU was drastically reduced in the final AFS implementation, keeping the basic architecture intact and only realizing the design more carefully.

3. Contributions
Four distinct areas were reworked to improve performance:
- Cache management: LRU-bounded caches store file data along with size and modification timestamp. Callbacks make it feasible to resolve pathnames on the workstations.
- Name resolution: a fid comprising a volume number, vnode number, and uniquifier optimizes retrieval of a file from any location, instead of using complete pathnames whose traversal was quite expensive.
- Communication and server process structure: LWPs let one server process serve many client requests, and RPC (implemented outside the kernel) provides secure, authenticated, optimized bulk transfer of whole files.
- Low-level storage representation: an indexed fid-table lookup yields the vnode information, which is cached and is faster than kernel pathname lookups.
Access to the cache data structures is synchronized to support concurrency, and locality of file references makes subsequent accesses cheaper. The major takeaway is the open/close mechanism: the file is fetched from the server on open and written back only on close (see the sketch below). Any intermediate reads and writes are purely local, which drastically reduces network traffic. Volumes are a great way to enforce load balancing seamlessly in the distributed system.
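The sketch below illustrates the open/close mechanism described above. It is a hypothetical stand-in for Venus, not the real implementation: the server object, its fetch/store calls, and the cache layout are all assumed names.

    import os

    # Hypothetical sketch of Venus-style whole-file caching: reads and writes go to
    # a local copy; the server sees traffic only at open() (fetch) and close() (store).
    class Venus:
        def __init__(self, server, cache_dir):
            self.server, self.cache_dir, self.dirty = server, cache_dir, set()

        def open(self, fid, mode="r"):
            local = os.path.join(self.cache_dir, str(fid))
            if not os.path.exists(local):                 # cold cache: fetch the whole file
                with open(local, "wb") as f:
                    f.write(self.server.fetch(fid))
            if "w" in mode or "a" in mode:
                self.dirty.add(fid)
            return open(local, mode)                      # all read()/write() stay local

        def close(self, fid, fh):
            fh.close()
            if fid in self.dirty:                         # ship the whole file back once
                with open(os.path.join(self.cache_dir, str(fid)), "rb") as f:
                    self.server.store(fid, f.read())
                self.dirty.discard(fid)

A process that opens a cached file, reads it a thousand times, and closes it generates at most one fetch and one store, which is why the server load depends on opens and closes rather than on individual reads and writes.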

4. Evaluation
The authors have done a thorough job of evaluating the performance and scalability of AFS as a function of load units with various workloads in a distributed environment. For each experiment, NFS is compared with warm-cache and cold-cache AFS, and empirically AFS is superior in disk and CPU utilization. NFS did not scale beyond 18 load units, where each load unit corresponds to an average of about 5 users, since it reached 100% CPU usage. They also compared against the prototype design: with a stand-alone workstation as the baseline, the prototype was 70% slower, versus 19% slower for the present AFS implementation. The scalability also showed remarkable improvement: the prototype took 4 times as long at a load of 10 as at a load of 1, while the current AFS takes less than twice as long at a load of 20 as at a load of 1 (and only 36% more at a load of 10).
I especially like the grouping of files when removing callbacks, which helps reduce RemoveCB frequency. It would have been more insightful if AFS could have been ported into the kernel and some microbenchmark results could have backed the claim that the present AFS can be optimized further.

5. Question
Perhaps if the number of LWPs were flexible and dependent on the number of clients, AFS might perform better? Also, despite scaling gracefully and handling large numbers of clients better, why is AFS not more widely used across enterprises?

1. Summary
This paper describes AFS, a distributed file system that uses a set of trusted servers to present a homogeneous, location-transparent file name space to all its clients. The authors initially built a prototype system (AFSv1, the ITC distributed file system), thoroughly analysed its deficiencies, and developed a synthetic benchmark. Next, the authors built solutions incrementally to address the performance penalties and improve administration, which led to AFSv2. This system was evaluated against NFS.

2. Problem
AFS was designed for "scalability" from the ground up and this presents a multitude of challenges from protocol design to performance and administration. AFSv1 did not scale very well and it suffered from the following problems:
- Expensive path-traversal operations: Clients passed filename and server performed the traversal
- Too many TestAuth messages: sent to check the validity of locally cached files
- A distinct dedicated server process per client: expensive as it introduced context-switching and related costs
- Load was not balanced across servers
Hence, there arose a need to redesign this system to improve scalability and operability.

3. Contributions
AFSv2 was built upon its predecessor AFSv1 and reused the notion of whole-file caching on local disk. AFSv2 introduced the notions of callbacks and the FID (File Identifier) to improve the protocol. A callback is simply a promise from the server that it will inform the client when a file the client is caching has been modified. An FID consists of a volume identifier, a file identifier, and a "uniquifier". Adding callbacks reduced the number of TestAuth messages, and FIDs let clients traverse file pathnames themselves, which reduced the load on the server. While they are not a panacea for cache consistency, whole-file caching and callbacks simplify the problem, and AFS employs a "last writer wins" policy. There are two important cases to consider:
- Consistency between processes on different machines: Upon file closure, AFS makes updates visible at the server and invalidates cache copies at the same time
- Consistency between processes on the same machine: Writes to a file are immediately visible to other local processes (typical UNIX semantics)
Instead of having a dedicated process per client on the server, AFSv2 redesigned the server around threads (Lightweight Processes), with an LWP bound to a client only for the duration of a single server operation. To improve operability, AFSv2 uses volumes, which administrators can move across servers to balance load. The system also implements quotas on a per-volume basis and includes facilities for flexible user-managed access control. AFS also takes security seriously and incorporates mechanisms to authenticate users and to keep a set of files private if a user so desires. Read-only replication at volume granularity improves availability and balances load. AFS addressed the backup and restoration problem by making a read-only clone of a volume, which is then transferred asynchronously to a staging machine.
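A structural sketch of the thread-per-operation server described above. This uses an ordinary Python thread pool only as a stand-in: the paper's LWPs are non-preemptive user-level threads, and the request/reply names here are illustrative, not the Vice interface.

    import queue, threading

    # Structural sketch only: a small fixed pool of workers, each bound to a client
    # just for one operation, instead of a dedicated server process per client.
    requests = queue.Queue()

    def worker():
        while True:
            client, op, args = requests.get()
            try:
                client.reply(op(*args))     # serve this one operation, then move on
            finally:
                requests.task_done()

    POOL_SIZE = 5                           # fixed, small pool shared by all clients
    for _ in range(POOL_SIZE):
        threading.Thread(target=worker, daemon=True).start()

The point of the structure is that the number of workers no longer grows with the number of clients, so context-switch and memory costs stay bounded as the client population scales.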

4. Evaluation
Measurements are the key to understanding how systems work and how to improve them, and this paper provides extensive detail throughout. The authors assessed their initial prototype (AFSv1) by performing controlled experiments with a synthetic benchmark. They measured the distribution of Vice calls and server usage, and computed the time per TestAuth call under different load units. The measurements showed that performance improvements were possible by reducing the frequency of cache validity checks, reducing the number of server processes, requiring clients to perform pathname traversals, and balancing server usage by reassigning users. The AFSv1 measurements also indicated that the system had room to scale, as even under 20 load units it was not saturated: CPU utilization was between 15-25% while disk and network utilization was fairly low. After addressing the shortcomings of AFSv1, the new AFSv2 system was evaluated against NFS. The benchmark results indicate that AFSv2 performs better than NFS under increasing load; starting at about 5 load units, AFS (both cold and warm cache) performs better. CPU utilization is 100% for NFS at 18 load units, while AFSv2's cold- and warm-cache CPU utilization is 41.5% and 37.7% respectively. Latency with NFS is independent of file size, while with AFSv2, if the file is present in the cache latency is similar to NFS, and otherwise it increases with file size. Also, NFS generates 3 times as much network traffic as AFSv2.
On the whole, the authors have performed a remarkable job in redesigning the system and making extensive measurements. The data amassed from the experiments clearly justify the design choices made both for performance and improved usability.

5. Confusion
Why isn't AFS as widely used as NFS? Is it because NFS became an open standard?

1. Summary
The paper talks about a location-transparent distributed file system called the Andrew File System. The authors present observations from a prototype implementation of AFS and propose design changes in areas such as cache validation, server process structure, name translation, and low-level storage representation that improve AFS's ability to scale with client load.

2. Problem
The authors performed experiments on the prototype implementation, measuring the server call distribution, the scaling of prototype benchmark performance, and the CPU and disk utilization of the servers. Based on these measurements, the authors hypothesize that significant performance improvements are possible by reducing the number of cache validity checks, reducing server process-switching overhead, balancing server use by reassigning files, and letting workstations do pathname traversal rather than the server.

3. Contributions
The revised AFS keeps the same fundamental architectural principle as the prototype: workstations cache entire files from a collection of dedicated autonomous servers. The design changes for performance/scalability fall in four areas: cache management, name resolution, communication and server process structure, and low-level storage representation. The mechanism for keeping cache entries consistent is modified so that the server uses callbacks to notify workstations of changes to a file or directory they have cached, reducing the number of cache validation requests and the load on the servers. In the prototype, each Vice pathname presented by Venus involved an expensive namei operation on the server to locate the file. This is addressed by introducing two-level names: Vice files and directories are identified by a unique fid, with Venus mapping pathnames to fids. Fids contain no explicit location information. The use of a server process per client did not scale well in the prototype, so the server was redesigned to service all clients from a single process using Lightweight Processes (threads), reducing context switching. The low-level storage representation also changed: files are accessed directly by inode instead of by pathname, with the vnode information in the fid identifying the inode that stores the file's data (see the sketch below).
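The following is a hypothetical data layout for the two-level server-side lookup described above (fid fields to a vnode entry, vnode entry to an inode number), with made-up table shapes and numbers; the real on-disk structures differ.

    # fid (volume, vnode, uniquifier) -> vnode entry (via a per-volume index) -> inode, with no namei.
    volumes = {
        7: {                      # volume number -> vnode table for that volume
            1042: {"uniquifier": 3, "inode": 88231},
        },
    }

    def resolve(volume, vnode_num, uniquifier):
        vnode = volumes[volume][vnode_num]
        if vnode["uniquifier"] != uniquifier:
            raise FileNotFoundError("stale fid: vnode slot was reused")
        return vnode["inode"]     # the server then reads/writes this inode directly

    assert resolve(7, 1042, 3) == 88231

Because the lookup is an indexed table access rather than a component-by-component pathname walk, the per-request CPU cost on the server drops sharply.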
The paper's contribution to the operability of the system is the notion of a volume, a collection of files forming a partial subtree of the Vice namespace. The volume abstraction enables quotas, movement and cloning, read-only replication, and backup.
4. Evaluation
The authors performed a lot of experimentation to expose the problems with the prototype implementation, and then came up with design changes for performance. They compare the scalability of the redesigned AFS with the older prototype and with an existing distributed file system, NFS, running a benchmark involving MakeDir, Copy, ScanDir, ReadAll, and Make phases on a large set of files. The redesigned AFS is only 19% slower than stand-alone, compared to the prototype, which is 70% slower. The redesigned AFS also scales better, with a relative benchmark time of 192% at a client load of 20, while the prototype was at 410% for a load of 10. CPU utilization varies from 8% to 70% as the client load goes from 1 to 20, while disk utilization stays below 25% for the redesigned AFS. Compared with NFS, the redesigned AFS has lower CPU utilization at every load level, showing that AFS has low server overhead; NFS saturates at 100% utilization under high load. Benchmark times are faster for NFS at small client loads, but NFS does not scale well, and AFS performs better for client loads greater than about 5. The network traffic for AFS is just one third of NFS's. All these experiments illustrate the better scalability of the new design, which was the problem being addressed.

5. Confusion
Could you explain the name resolution and low-level storage representation design aspects in detail?

1. Summary
This paper describes the Andrew File System, a distributed file system developed at CMU with scalability as one of its main goals. The authors characterize the scalability and performance bottlenecks of the AFS version 1 prototype through measurements, present their changes to the design, and show how these changes improved the scalability and performance of the new AFS version 2.

2. Problem
Scale was one of the major goals of AFS, but AFSv1 did not scale as expected and had a number of performance bottlenecks. Through measurements the authors identified the following problems with AFSv1: the server spent significant CPU time performing pathname traversals for its clients; although whole-file caching was meant to reduce client-server interaction, the server ended up serving a huge number of cache validity check requests from clients; and AFSv1 used a distinct process per client, which induced high context-switching overhead and did not balance load well.

3. Contributions
One of the key design choices in AFS is caching the entire file on the client's local disk: the whole file is fetched from the server on open() and pushed back on close() if it was modified. This reduces client-server interactions, since read() and write() requests are served locally. This is very different from other distributed file systems like NFS, which cache blocks in the client's memory. AFS provides a weak consistency model in which writes from a client can be lost due to concurrent updates on open files by different clients, since AFS follows a last-closer-wins policy. But the file at the server always ends up as a consistent version from one of the clients, and this guarantee holds by design, as opposed to NFS, where a file might end up with mixed writes from multiple clients (illustrated by the toy example below).
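A toy illustration of the last-closer-wins outcome described above. The names are made up; the only point is that whole-file stores on close replace the server copy atomically from one client's perspective, so the result is never a mix of two writers' blocks.

    # Toy illustration of "last closer wins": whole files are stored back on close,
    # so the server never ends up with interleaved blocks from two writers.
    server_copy = "original"

    def close_file(local_copy):
        global server_copy
        server_copy = local_copy            # whole-file store replaces the server copy

    client_a = "edit from A"                # both clients edited their cached copies
    client_b = "edit from B"
    close_file(client_a)
    close_file(client_b)                    # B closed last, so B's version wins intact
    assert server_copy == "edit from B"

Client A's update is lost, which is the cost of the policy, but the surviving file is a coherent version rather than a block-level interleaving.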
One of the key design changes in AFSv2 is the callback mechanism, introduced in the hope that it would reduce the number of TestAuth messages. On file open, a client establishes a callback with the server: a promise that the server will notify the client of any changes to the file. To reduce the high path-traversal costs, the concept of a file identifier (fid) was introduced and the responsibility for path traversal was pushed to the clients: the client presents the fid of the parent directory and the name of the file to obtain the file's fid, and then fetches its contents on open(). To avoid high context-switching overhead, AFSv2 built a user-level thread mechanism and used a thread-per-request policy rather than AFSv1's distinct process per client. AFSv2 also introduced volumes, which can be reassigned among servers, as well as usability features such as quotas and read-only replication of popular files to improve availability and load balancing.

4. Evaluation
Apart from the contributions, one of the key takeaways from the paper is its approach to system design. The authors ran a set of workloads to expose the performance and scalability bottlenecks in version 1, used the findings to improve the system, and ran the same workloads to show how the new version performs better and solves the previous version's problems. Through measurements on AFSv1 the authors show that TestAuth requests were high (~60% of calls), that the system did not scale well, and that request times increased as the number of clients grew. Although the individual disk utilizations were low, the CPU utilization on the server was relatively high, showing the CPU to be the bottleneck.

By running the same measurements on the new AFSv2 after the design changes, the authors show that it no longer has the above bottlenecks. The system scaled gracefully with the number of clients in terms of request processing time and CPU and disk utilization. The authors also report the system usage of a number of deployed AFS servers and show the call distribution from real-world workloads.

Apart from comparing with AFSv1, the authors did a performance comparison with NFS and showed that AFSv2 scaled better than NFS with respect to benchmark time and CPU and disk utilization. Measurements also show the advantage of whole-file caching, as AFSv2 needs fewer client-server interactions than NFS. The authors also compare NFS and AFS on workloads that do not suit AFS, such as opening a huge file to read one byte. I believe AFS would perform even worse than NFS on a similar workload, such as appending a few bytes to a huge log file.

5. Confusion
Why is NFS more popular than AFS? Is it because of the open protocol nature of NFS or is it because of the limitations in system design for certain workloads like concurrent appends to a huge log file?

1. Summary:
The authors discuss AFS, with scalability and performance as their primary goals. They introduce novel features with respect to cache management, name translation, and volumes. The system was designed for a CMU campus deployment intended to grow to 5000 to 10000 nodes, and the analysis in the paper is based on this deployment.


2. Problem:
Scalability and performance problems of a distributed FS are of particular focus in the paper. Changes are suggested in the AFS prototype in terms of cache management, name translation and low-level storage representation. These changes help AFS in achieving better performance and scalability.

3. Contributions:
Unlike NFS, AFS demonstrated a new style of distributed file system. A major contribution of the paper is its cache management: validating local caches on client machines only at file open (and flushing at close) is based on the observation that most UNIX files are read in their entirety. This substantially reduces network traffic for reads and writes, helping AFS scale well. Secondly, name translation is moved off the server: the server only knows about fids, and the client (Venus) does all pathname translation, freeing the server from time-consuming namei calls. Thirdly, files are aggregated into volumes, and a replicated volume location database lets user files be moved seamlessly from one server to another. All of these modifications give AFS better performance and scalability than the prototype implementation.


4. Evaluation:
They run a benchmark with a synthetic workload and compare their file system to NFS. They also provide statistics from measuring the performance of the file system in the real world over a prolonged period. The initial prototype had poor scalability and uneven server utilization. They measured running time, CPU utilization, and disk utilization for each system (the prototype, the revised AFS, and NFS). The revised AFS scales well and meets the target of supporting 50 workstations per Vice server, and it uses significantly less disk than NFS. The authors did not show any evaluation with respect to client or server failures, but overall the evaluation is solid.


5. Confusion:
I am not clear on how concurrent changes to files are handled in AFS and would like a clarification. Also, the context switching of the user-level Venus process has high overhead; was any work done on this aspect? Since the paper is quite old, and with evolving caching techniques that use SSDs this file system looks quite promising, is the project still being actively worked on?

1. Summary AFS is a distributed file system intended to scale to thousands of nodes that access files remotely. Via aggressive client-side caching, server-side LWPs, unique identifiers that map directly to inodes, and a server callback mechanism, the second iteration of AFS reduces server load and network traffic, allowing its performance to scale better than NFS and the original AFS prototype.

2. Contributions The authors introduce AFS, a scalable distributed file system whose design is centered around local caching of remote files: files are held locally and written back on close. Prior to implementing a full version, they develop a prototype, identify bottlenecks in server load, and then optimize around those bottlenecks. To reduce redundant cache validation requests, they introduce callbacks: when a client caches a file, it registers a callback request, and if the cached copy is invalidated, the server notifies the client. To reduce the load caused by filename lookups, the revised AFS introduces a unique identifier managed by the AFS protocol, which maps directly to inodes. Via a new call that allows direct inode lookup, the revised AFS servers avoid costly calls to namei. Their revision of AFS also reduces expensive server-side context switches by handling requests with lightweight processes (here, user-space threads) rather than the original per-client processes. The authors also modify AFS to cache stat metadata locally, drastically reducing the number of metadata requests to the server.
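The last point, caching stat metadata locally, might look like the hypothetical sketch below; the cache layout and function names are assumptions for illustration, not the actual Venus status cache.

    # Hypothetical sketch of a client-side status cache: stat() calls are answered
    # from cached metadata (guarded by callbacks) instead of going to the server.
    status_cache = {}   # fid -> {"size": ..., "mtime": ..., "valid": True}

    def stat(fid, server):
        entry = status_cache.get(fid)
        if entry and entry["valid"]:
            return entry                          # served locally, no server round trip
        entry = server.fetch_status(fid)          # one fetch, then cached under a callback
        entry["valid"] = True
        status_cache[fid] = entry
        return entry

    def break_callback(fid):                      # called when the server notifies the client
        if fid in status_cache:
            status_cache[fid]["valid"] = False

Since stat-style metadata requests dominated the prototype's server traffic, answering them locally is a large part of the reported load reduction.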

3. Evaluation I liked that the authors first validated their overall design assumptions with a prototype; rather than optimizing prematurely, they identified the real bottlenecks and fixed those. They show that calls to the prototype are dominated by file metadata operations rather than reads and writes, and that repeated cache validation requests caused unneeded server load. Moreover, the workload ceases to scale linearly past a certain volume. The authors also evaluate their final design and show that server performance and CPU load scale roughly linearly with workload intensity. Moreover, they show that AFS does not perform significantly worse than NFS on low-volume workloads and that it scales better with workload volume.

4. Confusion The paper implies that files within the same subtree are generally located on the same server. Are the designers of AFS assuming a certain degree of flatness in the directory hierarchy?

Summary:
This paper analyzes the prior Andrew File System implementation, identifies its drawbacks, and proposes improvements to its performance and scalability, demonstrating the improvements with benchmarks. The paper also describes the design and implementation of the two versions of AFS and compares AFS against NFS and other DFSs.

Problem:
The earlier prototype of AFS had several performance issues: high context-switching overhead due to the large number of server processes, a large number of cache-validation (stat-like) requests from clients (one for every open()), high path-traversal overhead, and an absence of load balancing across servers. These problems led to very high CPU utilization and limited the number of active users the system could support (scalability).

Contribution:
AFS caches entire files. It consists of a collection of file servers called Vice and a user-level process on every client, Venus, that intercepts open() and close() requests. Venus fetches entire files, performs reads and writes against the local disk cache, and flushes only on close(), so writes are sent back to the server only when a file is closed. RPC is used for communication. In the prototype, cache validation was performed with stat-like calls to the server. After benchmarking the prototype and analyzing its performance issues, the authors propose design changes to enhance scalability and avoid the issues mentioned above. A callback mechanism addresses the stale-cache problem: the client is notified by the server when a file is modified via the registered callback, invalidates its cache, and re-reads from the server on the next request. When a file is opened, callbacks are registered for the file and the directories in its path, and these are cached. For multiple processes on the same machine, a write() to a file is immediately visible to the other processes. File identification now uses a fid (volume number, vnode number, and uniquifier), which avoids presenting whole pathnames and traversing them on the server. AFS uses a last-writer-wins policy at the server, so writes from different clients do not get interleaved (unlike NFS, where block-level writes from different clients can mix). The concept of volumes is introduced for load balancing and for distributing files transparently across servers; volumes can be moved transparently, and multiple volumes are combined to form the AFS namespace in a tree structure. Lightweight processes are used at the server to reduce context-switching overhead.

Evaluation:
The system setup used Sun2 servers and IBM RT clients. The paper first evaluated the earlier AFS implementation with a set of benchmarks to show that the problems mentioned above were bottlenecks, then re-evaluated the implementation with the design changes described in the contributions section and found large improvements. Some of the results for the new design are as follows: a higher number of potential users per server (50), whereas the analysis showed the earlier prototype struggling even with 20 clients; the server no longer saturates, thanks to the reduction of validation calls and the introduction of callbacks; CPU utilization is around 15-25% for most servers, with low disk and network utilization as well; the earlier prototype was 70% slower than stand-alone. A comparison of AFS and NFS (with both cold and warm AFS caches) showed AFS handling high loads better than NFS; NFS performs better at low loads, with the crossover at about 3 load units for warm-cache AFS and 4 for cold-cache AFS. At a load of 18, NFS CPU utilization is pegged at 100% while that of AFS is only 38-42%, and NFS has higher network utilization. Most of the design changes have been evaluated, but the changes made to enhance operational ease are explained only qualitatively. The overall improvement is understandable, but a breakdown of the costs, including the cloning of volumes, would have been helpful.

Issues:
How do applications typically handle the last-writer-wins issue in AFS? Have any later designs been proposed to circumvent this issue and make AFS usable even for concurrent write-heavy (shared-file) workloads?

1. Summary
This paper describes the improvements made to the Andrew File System in order to increase its performance under scalability. It also compares the Andrew File System to NFS, a common distributed file system at the time.

2. Problem
In prior work, the authors had designed a prototype of a distributed file system called the Andrew File System, implemented it, and run it on about 100 workstations and 6 servers. They found, however, that this file system had serious performance problems. When they measured it with a synthetic benchmark, they found the poor performance was primarily due to four causes. First, checks of the validity of cached files were performed very frequently. Second, the architecture of one server process per client was inefficient. Third, the servers spent much time traversing pathnames. Fourth, the servers had uneven loads.

3. Contribution
This paper describes several improvements that help remedy the problems described above. The authors changed the semantics of caches: a client now assumes that its caches are valid unless it has been notified otherwise. While this can lead to inconsistency, they are convinced it is best.

The authors moved name resolution from the server to the clients. Instead of sending pathnames across the network, the clients now send identifiers called Fids, which the client derives from the pathname. These are analogous to inodes in local file systems. Each Fid has three parts, identifying the volume the file is stored in and the location of the file within that volume's index. None of the information in an Fid refers to the exact server on which the file is stored.

The authors redesigned the server program to use multiple threads per process rather than multiple processes. Each client is bound to a particular thread only for the duration of a single server operation. They also modified the RPC mechanism to interact well with the many threads.

The paper also describes volumes, collections of files from a subtree of the file system namespace. These improve the mapping of files in the file system to server disk storage. Volumes can be redistributed from one server to another using a copy-on-write clone operation, and they can also be replicated read-only across multiple servers. This allows for better balancing of files across servers.
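An illustrative copy-on-write clone, in the spirit of the volume cloning described above. The Volume class and its internals are invented for the example; the real implementation works on on-disk structures.

    # Illustrative copy-on-write clone: the clone shares data with the original until
    # a file is modified, which is what makes cloning cheap enough to use for volume
    # moves, read-only replication, and backup snapshots.
    class Volume:
        def __init__(self, files=None):
            self.files = dict(files or {})     # vnode -> shared, immutable file data

        def clone(self):
            return Volume(self.files)          # copies only the index, not the data

        def write(self, vnode, data):
            self.files[vnode] = data           # rebinding affects this volume only

    vol = Volume({1: b"hello"})
    frozen = vol.clone()                       # read-only snapshot shipped elsewhere
    vol.write(1, b"changed")
    assert frozen.files[1] == b"hello"         # the clone is unaffected

Because only the index is copied, a clone can be taken quickly while the volume stays online, and the frozen copy can then be transferred or backed up asynchronously.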

4. Evaluation
The authors evaluate this new version of the Andrew File System on two kinds of data: the synthetic benchmark they used to find problems in the original prototype, and measurements from actual usage. For the former, they report CPU and disk utilization during the benchmark and compare the time taken by the benchmark under several different loads. They find that all metrics improved: the file system scales approximately eight times as well, and CPU and disk usage are better.

The paper does not acknowledge that the results for this benchmark may be overfit. The authors designed the new version of the file system to overcome the problems in the prototype revealed through this very benchmark, so the new file system may not look as good on other benchmarks. Also, they make no claim that this benchmark models any real workload.

The authors also present averages from actual usage of the Andrew File System, but give no comparisons. They evaluate CPU and disk utilization and find that CPU utilization has high variability and can go as high as 37.6 percent. They also present the distribution of file system calls over a three-day period and find that, despite caching, fetches dominate stores. They also estimate the network utilization and find it quite low.

The paper also compares the Andrew File System with another leading distributed file system, NFS, using the same benchmark used on the prototype. They find that AFS scales much better than NFS and gives comparable performance and much better defined consistency.

5. Confusion
Could we discuss the tradeoffs of this paper's definition of consistency?

Summary
This paper identifies the problems with the initial version of the Andrew File System and discusses solutions to these problems to increase performance and scalability within the newer version of Andrew File System.

Problem
The initial version of the Andrew File System was designed as a distributed file system based on the idea of whole-file caching that could support as many clients as possible. However, the authors realized that the initial version did not scale as well as desired. They found several reasons for this: (1) because of frequent stat-like checks of local file validity, the clients generated a large amount of unnecessary traffic, (2) the path-traversal costs to locate files in the file system hierarchy were too high, and (3) server scalability was limited, as a distinct process was used to service each client, leading to high context-switch overhead. The authors aimed to solve these problems in the newer version of AFS by proposing changes in the areas of cache validation, name translation, server process structure, and server load balancing.

Contributions

According to me, the following are the novel contributions of this paper:
(1) Introduction of callbacks to reduce the number of cache validation requests received by the servers. By maintaining state at the server, the server could notify the clients of any file changes, instead of clients polling the server.
(2) Use of multiple threads within a single process at the server to service all clients, instead of multiple heavyweight processes. This not only reduced context-switch times between different clients, but also provided scalability, as each thread was bound to a particular client only for the duration of a single server operation.
(3) Introduction of a file identifier (FID) that the client uses to specify the requested file. By resolving pathnames to FIDs at the client, files can be retrieved efficiently at the server.
(4) Addition of volumes as a new data-structuring primitive that improved the operability of the system. This enabled proper load balancing, quota enforcement based on disk usage (a quota sketch follows this list), and simplified backup procedures.
(5) Among other features that were added to the newer version of AFS were, security mechanisms for user authentication, flexible user-managed access control and simpler file system management admin tools.
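As referenced in point (4), here is a minimal sketch of per-volume quota enforcement, which falls out naturally once each user is assigned their own volume. The names, numbers, and threshold behavior are made up for illustration.

    # Sketch of per-volume quota enforcement (illustrative names and numbers).
    volumes = {"user-alice": {"quota": 20_000_000, "used": 18_500_000}}

    def check_quota(volume, nbytes):
        v = volumes[volume]
        if v["used"] + nbytes > v["quota"]:
            raise OSError("quota exceeded for volume " + volume)
        v["used"] += nbytes

    check_quota("user-alice", 1_000_000)       # ok: 19.5 MB of a 20 MB quota
    # check_quota("user-alice", 1_000_000)     # a second call would exceed the quota

The key design point is that the accounting boundary (the volume) matches the administrative boundary (the user), so no cross-partition bookkeeping is needed.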

Evaluation
The authors have evaluated and compared the performance of the new version of AFS with the initial version of AFS as well as with the Sun Network File System (NFS). In both cases they use a synthetic benchmark that measures the performance of various file system operations such as directory creation, copying files, scanning the directory tree, and reading and compiling files. Their experiments demonstrate that the proposed design changes clearly improved the scalability of AFS over the previous version: the initial version took more than 4 times as long at a load of 10 as at a load of 1, whereas the new version took less than twice as long at a load of 20 as at a load of 1.
In their comparisons with the Sun NFS, they found that although NFS performed slightly better at low loads, NFS exhibited poorer performance than AFS at increasing loads. The lack of a disk cache and more server validation requests in NFS led to limited scalability as compared to AFS, which benefited from caching and the proposed callback mechanism. Overall, from their evaluation on the synthetic benchmark, they concluded that AFS’s scaling characteristics were superior to those of NFS.

Overall, I liked the evaluation presented for the synthetic benchmark, with detailed graphs and analysis of various system parameters. However, the authors restricted their evaluation to one synthetic workload and could have included results from a few other workloads, similar to FileBench, Fstress, etc. Also, the paper does not say much about crash recovery in AFS, which is certainly more complicated than in NFS because AFS servers maintain callback state for clients. It would have been interesting to see how well AFS performs against NFS in cases of failures and crashes.

Confusion
Why did AFS not have much commercial success as compared to NFS?

1. Summary
The paper examines the design of the Andrew File System, a large-scale distributed file system at CMU which is expected to support thousands of clients.
2. Problem
A prototype implementation of AFS revealed performance and scalability problems in their benchmark. They recognized four changes that could significantly improve the performance of the system: reduce the frequency of cache validity checks, reduce the number of server processes serving clients, move pathname traversal work to the client, and balance server usage.
3. Contributions
The fundamental design of AFS includes a set of trusted servers, called Vice, that present a homogeneous, location-transparent file name space to clients; both servers and clients run the 4.2BSD UNIX operating system. Interaction between server and client is handled by a user-level process on the client called Venus, which caches entire files from Vice and communicates with Vice only for file-level operations such as open and close. In this paper, several changes are made to the system to address the performance problems identified. In addition to caching files, the client now also caches directories and symbolic links, whose modifications are written through to the server to maintain integrity. The cache validation scheme changes so that, instead of Venus querying the server for validity on each file open, Venus may assume the cache is valid unless it receives an asynchronous notification from the server, which reduces how often such checks are performed. A new file-locating method is introduced using two-level names: instead of the CPU-heavy operation of mapping a pathname to an inode, servers now recognize only a unique fid, which Venus derives from the pathname. To solve the problem of one server process per client, and the context-switching overhead that entails, non-preemptive Lightweight Processes share a single kernel process.
4. Evaluation
The performance of the new Andrew File System is evaluated on five basic file-related operations: MakeDir, Copy, ScanDir, ReadAll, and Make. The results show that the original prototype was 70 percent slower than a stand-alone workstation, while the new design is only about 19 percent slower. The benchmark is run at various load units, and resource consumption is measured in many dimensions, including CPU load and disk utilization; usage of deployed servers is also observed over eight-hour weekday periods. The system is also compared against Sun's NFS, which shows AFS's scalability benefit due to lower remote-call overhead when files are cached on the client.
5. Confusion

1. Summary
The authors discuss the incremental development of a distributed file system. They develop a prototype, study its performance characteristics, redesign the architecture to alleviate its performance bottlenecks, and then benchmark the final product against pre-existing products in the market.
2. Problem
The paper addresses two primary concerns. The first was performance, hurt by the significant number of client-server calls made to validate cache consistency and by the creation of a new server process for each client. The second was operability: the server maintained location information implicitly in stub directories, which made administrative tasks such as moving files between servers for load balancing tedious and error-prone. Additionally, spreading the files of many users across large partitions made it hard to enforce per-user quotas and to maintain quickly recoverable backups.
3. Contribution
The paper introduces various techniques to tackle its own issues that can easily be applied to other use cases. The authors developed Lightweight Processes, similar to today's threads, to get around the OS restriction that processes cannot share memory. AFS also fixed the number of LWPs to reduce the cost of constant context switching. The paper introduces the Fid as an equivalent to an inode so that files can be looked up without expensive directory traversals. One major development was callbacks, which ensure that in the steady state the server informs the client about changes to any of the files it is using; this eliminated a large amount of network traffic spent merely revalidating caches. The paper also addresses operability by introducing volumes and tying them to individual users rather than mapping many users to one partition. This allows easy enforcement of quotas as well as easy migration and read-only replication using copy-on-write mechanisms; volumes have since been widely used as a storage abstraction below many file systems. Overall, the paper presents an easily administrable distributed file system for end users on various terminals rather than a distributed-database-style application.
4. Evaluation
The paper thoroughly evaluates each design choice and its impact on scalability, using the same benchmark on the prototype and the final product as a uniform test bench. The authors evaluate the time taken to complete the benchmark as a function of the number of active clients, and they also examine CPU and disk performance to identify bottlenecks. The paper breaks down the percentage of each command being issued to identify patterns; this led the authors to discover that the RemoveCB command is often repeated due to workload characteristics, so AFS changed that command to remove the callbacks for a batch of files rather than individually (a sketch of this batching appears below). The paper also reports average disk and CPU utilization during regular use, looks for signs of performance degradation, and tries to explain the differences in utilization across AFS servers. The authors also observe which data is accessed most often so that it can be replicated read-only on more servers. After the above observations were used to tune the system, the authors compare it with the pre-existing NFS. From the numbers, the authors conclude that AFS scales better than NFS and performs comparably even when the workload is tailored for NFS. Overall the evaluation is thorough and does not leave questions unanswered: the authors portray the strengths of their product but still acknowledge its shortcomings, giving the reader a clear idea of where this file system would be an ideal fit. For example, they point out that it would not be ideal where serialization of writes must be implemented by the file system, such as in a query-processing engine.
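The callback-removal batching mentioned above might look like the following sketch. The batch threshold, the server call name, and the trigger (cache eviction) are assumptions for illustration; the point is simply that many individual RemoveCB requests collapse into one.

    # Sketch of callback-removal batching: instead of one RemoveCB request per
    # evicted file, evictions are grouped and sent to the server in a single call.
    pending_removals = []

    def evict(fid):
        pending_removals.append(fid)
        if len(pending_removals) >= 50:        # batch threshold is made up here
            flush_removals()

    def flush_removals():
        if pending_removals:
            server_remove_callbacks(list(pending_removals))   # one request for the batch
            pending_removals.clear()

    def server_remove_callbacks(fids):
        print("RemoveCB for", len(fids), "files in a single request")

Batching trades a small delay in releasing callback state for a large reduction in the number of requests the server must handle.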
5. Confusion
I would like to know what changed after the paper was published, since the authors admit that they may need to re-evaluate their choices when scaling up further.

1. Summary
This paper discusses the Andrew File System, a scalable distributed file system. AFS was developed as a more scalable and efficient update of the ITC file system, based on profiling and analysis of its weaknesses, which drove the design changes. While AFS also included changes for security and for maintaining UNIX semantics, they are not the focus here; the primary focus of this paper is the analysis of performance and designing for scalability.

2. Problem
In ITC / AFS v1, the servers were not load balanced. On the important servers, extremely high CPU utilization was observed (on average ~40%, while disk utilization stayed low). The main CPU costs were context switches between processes, stat calls for checking cache validity, and traversal of full paths (internally using namei operations). Another weakness was that the server could not cache critical shared information, because 4.2BSD does not allow processes to share virtual memory and each client connection had a separate process on the server.

3. Contributions
Performance improvement by reducing cache validity checks, converting server processes to threads, having Venus do pathname traversals instead of the server, and balancing server usage.
Callback
a. Venus - cache validation changed from pull to push (notification from server)
b. Servers can break callback under memory pressure
Caching - one for data, one for file status.
a. Now cache dir contents and symlinks too
b. Status cache - in memory, for servicing stat syscalls
Name Resolution
a. Each file/dir now has a fid
b. Venus maps vice pathnames to fids
c. Fids - 3*32 bits (vol num - ids volume, vnode num - ids file in a volume, Uniquifier - allows reuse of vnode, keeping fid unique)
d. Venus (client) has a mapping cache (vol num -> server)
e. Vnode num -> identifies inode on server
f. Vice (server) Data access - lookup fid, iopen to r/w the data, no namei
g. Venus also does this, using a local dir as cache
h. Venus checks for callback on each component of pathname
Threading in Vice and Venus
User-level non-pre-emptive Lightweight Processes.
LWP bound to a client only for duration of a single server operation
RPC moved outside the kernel, integrated with the LWP
Volume
a. Can be moved - atomically - for space and load balancing
b. Multiple volumes to a disk partition
c. Quotas - volume has quota, user assigned volume
d. Unit of read-only replication and balancing (Helps in system administration)
e. Unit of backup and restoration
f. Provides operational transparency
Consistency Semantics
a. Weak consistency.
b. Callbacks for invalidation.
c. Writes in a workstation visible within that workstation immediately.
d. On close, writes visible on new opens. Existing opens not affected
e. File ops other than data changes visible immediately (run on server)
f. Per-file concurrency has to be handled by applications.
g. Fault Tolerance - callbacks re-established after failures / recovery
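A toy sketch of the recovery rule in item (g) above: after a reboot or a suspected server crash, the client cannot trust its callback promises, so every cached entry is treated as suspect and revalidated on its next use. The cache layout and the use of modification times for revalidation are assumptions for illustration.

    # Toy sketch: mark all cached entries suspect after a failure, then revalidate lazily.
    cache = {"fid-1": {"mtime": 100, "suspect": False},
             "fid-2": {"mtime": 250, "suspect": False}}

    def on_reboot():
        for entry in cache.values():
            entry["suspect"] = True            # callback promises can no longer be trusted

    def open_file(fid, server_mtime):
        entry = cache[fid]
        if entry["suspect"]:
            if server_mtime != entry["mtime"]: # stale: a refetch would happen here
                entry["mtime"] = server_mtime
            entry["suspect"] = False           # callback considered re-established
        return entry

This keeps the recovery cost proportional to the files actually reused after the failure, rather than forcing a full cache flush.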

4. Evaluation
Multiple iterations of design and measurement were done - e.g., Venus was modified to remove callbacks on groups of files instead of individual files, which considerably reduced RemoveCB frequency.
Much better in performance and scalability than ITC / AFS v1.
NFS Comparison :
a. Industry leader, but not designed for large-level scaling.
b. NFS, AFS (Warm cache), AFS (Cold cache)
c. Reliability - NFS prematurely failed some workloads, due to FS errors (unreliable RPC, non-idempotent ops), which biased numbers in favor of NFS.
d. Benchmark Time - for small loads, NFS better, but AFS overtakes and scales much better.
e. Server CPU - NFS utilization is 2x-4x more
f. Server disk - NFS disk 1 (system files) and AFS are comparable
g. NFS disk 2 (user files) is 2x - 4x worse and scales worse too.
h. N/w traffic - NFS creates 3x packets for a load of 1.
i. Latency - Andrew warm is comparable to NFS, but cold is much worse, as entire file is transferred.
j. Note: Vice and Venus are user level, potential for improvement in moving to kernel.
Drawbacks :
a. Files larger than local disk not accessible at all.
b. Concurrency - File Locking - Record-level updates (databases) not possible
c. First file access - complex and expensive. Locality of accesses makes this viable
d. CPU utilization is still the bottleneck, it just scales much better
e. Authentication and network databases will become larger with scaling.
My Opinion :
I think this is the best measured among the papers seen so far. Design changes were motivated by in-depth profiling and analysis.
Multiple rounds of measurement and re-design seem to have been done.
That said, would have been nice to see a comparison with Cedar.
Also, some of the details on data structures, read-only replicas, path lookup avoidance etc seem vague to me.

5. Confusion
a. Workstation times are synchronized with servers using getTime()? Isn't that a no-go in distributed systems?
b. File tree structure ?
c. Cloning of a volume is efficient / fast ?
d. What is the critical shared data that the server threads now cache?
e. How are read-only replicas discovered / stored in the file tree?
f. If pathname lookups are avoided, is path segment wise call-back existence checking still done?
g. Disk 1 NFS server disk utilization

Summary:
This paper talks about the basic design of the Andrew File System, the bottlenecks in that design, and the enhancements made for large-scale use. It makes design changes to address the performance and scalability issues and then runs real-world workloads to test the system. The design is then compared with NFS, a remote-open file system, to demonstrate its effectiveness.

Problem:
The performance analysis of the AFS prototype showed that the distribution of calls was dramatically skewed and that two types of calls constituted 90% of server traffic. This was because clients had to constantly validate cache entries even when the files were already cached on the client, which showed scope for improvement in cache management. Secondly, the excessive number of client processes caused many context switches, and pathname lookups made server CPU utilization a performance bottleneck. These shortcomings drove the design of the revised version of AFS.

Contributions:
To improve upon the cache management strategy used previously, Venus (the client) no longer contacts the servers to check validity unless notified by the server itself. A "callback" mechanism is used, where the server promises to notify the client about any modification to the file by any other workstation.
To improve upon the lookup time of pathnames at the server, a notion of two-level names was introduced. Each Vice file is now identified by a unique fixed-length fid. Venus performs the mapping of pathnames to fids and presents the server with a fid. The server directly looks up an indexed table using this fid (volume number + vnode number + uniquifier) to find the file storage information, avoiding the full pathname lookup. An important optimization is that the fid contains no explicit location information, so moving files from one server to another does not invalidate the contents of directories cached on workstations. A Volume Location Database, replicated on each server, holds the location information for fids.
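As a rough illustration of the client-side walk this enables, here is a minimal, self-contained C sketch; the cache layout, helper names, and example path are all invented, and the real Venus would fetch and cache a missing directory from Vice instead of giving up:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct fid { uint32_t volume, vnode, uniquifier; };

/* Stand-in for Venus's cached directory contents: (parent vnode, name) -> fid.
   In the real system a missing directory would be fetched from Vice and cached;
   here the table is static. */
struct dent { uint32_t parent_vnode; const char *name; struct fid fid; };
static const struct dent cache[] = {
    { 1, "usr", { 7, 2, 0 } },
    { 2, "joe", { 7, 3, 0 } },
    { 3, "x.c", { 7, 4, 0 } },
};

static int lookup(uint32_t parent_vnode, const char *name, struct fid *out)
{
    for (size_t i = 0; i < sizeof cache / sizeof cache[0]; i++)
        if (cache[i].parent_vnode == parent_vnode && strcmp(cache[i].name, name) == 0) {
            *out = cache[i].fid;
            return 0;
        }
    return -1;   /* real Venus: fetch the directory from a server, then retry */
}

int main(void)
{
    struct fid cur = { 7, 1, 0 };            /* fid of the (cached) root directory */
    char path[] = "usr/joe/x.c";
    char *save = NULL;
    for (char *c = strtok_r(path, "/", &save); c; c = strtok_r(NULL, "/", &save))
        if (lookup(cur.vnode, c, &cur) != 0) {
            puts("component not cached: one round trip to the server needed");
            return 1;
        }
    printf("resolved to fid <%u.%u.%u> without contacting the server\n",
           cur.volume, cur.vnode, cur.uniquifier);
    return 0;
}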
The cost of context switches was reduced by introducing user-level lightweight processes (LWPs) within a single server process.


Evaluation:
The authors have done a very extensive evaluation of the system, in my opinion. The input to the benchmark is a read-only source subtree consisting of about 70 files. The benchmark has 5 distinct phases: MakeDir, Copy, ScanDir, ReadAll, Make. On a Sun2 workstation with a local disk this benchmark takes 1000 seconds. On the initial AFS prototype it takes 70% longer, while the improved AFS is just 19% slower than a stand-alone workstation. The scalability of the system is evaluated by measuring performance with increasing load, and the revised system is seen to be markedly better. The CPU utilization issue seen previously is also re-evaluated on the new system with the changed naming scheme and compared with disk utilization; CPU utilization is still the bottleneck, but less so than before. Moreover, the design is compared with NFS to show AFS's improved scalability and CPU utilization with much lower network traffic. However, they do not discuss network failures or client-side failures, nor evaluate such scenarios.

Confusion:
What happens when server/client failure occurs? How is the information recovered? what is guaranteed?

Positives and negatives of notion of volumes?

Summary
The paper talks about a homogeneous, location-transparent distributed file system called the Andrew File System. The authors present a series of changes in the areas of cache validation, server process structure, name translation and low-level storage representation, and quantitatively demonstrate AFS' ability to scale gracefully.
The problem
The authors identified four issues plaguing the performance of the initial prototype of AFS. Firstly, path traversal costs were too high. Secondly, clients issued too many TestAuth protocol messages to check the validity of cached files. Another problem was that load was not balanced across servers. Finally, the server employed a distinct process per client, introducing context-switch and other overheads. The authors aimed at addressing these problems and introducing further performance improvements with the main goal of supporting scalability.
Contribution
1.AFS showed how a distributed file system can be built differently from NFS. It follows the whole file caching protocol on the local disk of the client machine which saves a lot of network transfer.
2.AFS uses “callback” to reduce the number of client/server interactions.
3.AFS also introduced the notion of a file identifier (FID) (similar to the NFS file handle) instead of pathnames to specify which file a client was interested in. An FID in AFS consists of a volume identifier, a file identifier, and a "uniquifier" (to enable reuse of the volume and file IDs when a file is deleted). Thus, instead of sending whole pathnames to the server and letting the server walk the pathname to find the desired file, the client walks the pathname one piece at a time, caching the results and thus reducing the load on the server.
4.Because of callbacks and whole-file caching, AFS makes cache consistency easy to understand and describe. Between different machines, AFS makes updates visible at the server and invalidates cached copies at the same time, which is when the updated file is closed. The server then breaks callbacks for any clients with cached copies, ensuring that clients will no longer read stale copies of the file; subsequent opens on those clients require a re-fetch of the new version of the file from the server (see the sketch after this list). For processes on the same machine, writes to a file are immediately visible to other local processes. In the cross-machine case where processes on different machines modify a file at the same time, AFS naturally employs what is known as a last-writer-wins approach.
5.The use of volumes as a data structuring mechanism is a novel concept. It provides a level of operational transparency and solves the problem of load imbalance. Aggregating files into storage volumes and using a replicated volume location database make it possible to move user files seamlessly from one server to another.
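A minimal C sketch of the server-side callback break on a close/store, as described in point 4; the record layout, the notify_client stand-in for the RPC, and the sample data are all invented for illustration:

#include <stdint.h>
#include <stdio.h>

struct fid { uint32_t volume, vnode, uniquifier; };

/* One callback promise held by the server: client_id has fid cached. */
struct callback { struct fid fid; int client_id; int valid; };

static struct callback callbacks[] = {
    { { 7, 4, 0 }, 11, 1 },
    { { 7, 4, 0 }, 12, 1 },
    { { 7, 9, 0 }, 11, 1 },
};
#define NCB (sizeof callbacks / sizeof callbacks[0])

static int same_fid(struct fid a, struct fid b)
{
    return a.volume == b.volume && a.vnode == b.vnode && a.uniquifier == b.uniquifier;
}

/* Stand-in for the RPC that tells a client its cached copy is now stale. */
static void notify_client(int client_id, struct fid f)
{
    printf("break callback: client %d, fid <%u.%u.%u>\n",
           client_id, f.volume, f.vnode, f.uniquifier);
}

/* Called when a client closes (stores) a modified file: every other client
   holding a callback on that fid is told before the new contents become the
   version that future opens will see. */
void store_file(struct fid f, int writer)
{
    for (size_t i = 0; i < NCB; i++)
        if (callbacks[i].valid && same_fid(callbacks[i].fid, f)
            && callbacks[i].client_id != writer) {
            notify_client(callbacks[i].client_id, f);
            callbacks[i].valid = 0;
        }
    /* ...then install the new file contents (omitted). */
}

int main(void) { store_file((struct fid){ 7, 4, 0 }, 12); return 0; }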
Evaluation
The paper presents a comprehensive performance evaluation that shows the improved scalability of AFS. First, to identify the shortcomings of the initial prototype implementation, the designers spent a great deal of time measuring its performance. The benchmark used is a command script that operates on a collection of files in 5 distinct phases. Through a series of experiments on this benchmark, a set of problems was identified (as discussed in "The problem" section). This is a perfect example of how measurement and experimental evidence of a problem can lead to building a new and better system. Next, the effects of the changes introduced in the new protocol were studied. The new version was measured and found to be much more scalable than the original; each server could support about 50 clients (instead of just 20). A further benefit was that client-side performance often came quite close to local performance, because in the common case all file accesses were local: file reads usually went to the local disk cache (and potentially, local memory). AFS performance was also compared with a remote-open file system, NFS. NFS ran into a series of functional problems at high load. It was observed that NFS performed slightly better than AFS at low workloads, but its performance degraded rapidly with increasing load. NFS also generates more network traffic than AFS. What I really liked about the evaluation is that all the observations are well supported with explanations. However, the designers of AFS make some assumptions about the workload, such as that most files are not shared and are read sequentially in their entirety. These are not true for all kinds of workloads, for example random updates in a transaction database; I would like to see the performance of AFS with such workloads. A comparison with the Cedar file system, which also caches entire files, would also have been interesting.
Confusions
What are idempotent and non-idempotent file system calls?

Summary
This paper analyzes the early AFS implementation and suggests improvements in various sections to improve the scalability. Several benchmarks were used in analyzing the drawbacks of earlier AFS and the same were used to show how the newer suggested implementation had improved scalability.

Problem
There were several problems as the number of AFS users scaled up. Since each client was given a separate server process, the number of server processes grew, leading to more context switches. The network clogged due to the cache validity checks issued by clients on each open call for a file. Requiring the server to do pathname traversals for files increased CPU utilization, and a single server also risked being overloaded with a huge number of requests. These factors decreased a server's performance as the number of clients increased, and the authors aim at optimizing them.

Contributions
a. Callbacks were introduced: a notification from the server to a client that a file the client has cached has been modified. Thus, the client no longer goes back to the server for every open call on a file it has already cached; it keeps using the cached copy until it receives a callback. On a local modification, the client returns the changes to the server on close. This reduced cache validity checks considerably (see the sketch after this list).
b. Instead of dedicated processes, lightweight processes (LWPs) were used on the server side to process clients' requests. Context switches between LWPs are much cheaper, which increased the throughput of the server process.
c. Pathname resolution was removed from the server side, and a fixed-length id, the Fid, was used instead. The server receives only a Fid for a file/directory and maps it directly to the corresponding inode. This led to a considerable reduction in CPU overhead on the server side. The clients keep the mapping between pathnames and their corresponding Fids.
d. Files were grouped into volumes, which were flexible and could be moved to different servers for load balancing. Volumes also served as the unit of backup, quotas, and read-only replication of frequently read information for faster access.
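A small C sketch of the client-side open path implied by item (a); the structure, helper names, and cache path are invented, and the real Venus tracks callback state per cached file rather than in a single flag:

#include <stdbool.h>
#include <stdio.h>

/* Per-file client cache entry, as implied by the description above: a locally
   cached copy plus a flag saying whether the server's callback promise is
   still in force.  Field names are illustrative. */
struct cache_entry {
    const char *local_path;   /* whole file, stored in the client disk cache */
    bool callback_valid;      /* cleared when the server breaks the callback */
    bool cached;
};

/* Stand-in for the remote Fetch, which also registers a callback for us. */
static void fetch_whole_file(struct cache_entry *e)
{
    e->cached = true;
    e->callback_valid = true;
    puts("Fetch whole file; server records a callback for this client");
}

static const char *afs_open(struct cache_entry *e)
{
    if (e->cached && e->callback_valid)
        puts("open served entirely from the local cache");   /* no server traffic */
    else
        fetch_whole_file(e);
    return e->local_path;     /* reads and writes then go to the local copy */
}

int main(void)
{
    struct cache_entry e = { "/cache/0001", false, false };
    afs_open(&e);   /* first open: fetch */
    afs_open(&e);   /* later opens: local, until the callback is broken */
    return 0;
}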

Evaluation
The evaluation process in this paper was interesting. The earlier AFS implementation was first evaluated with a set of benchmarks; by studying these evaluations the authors identified the areas that needed improvement and came up with a new prototype. Then, with the same benchmarks, they evaluated the newer prototype and found a considerable improvement in server performance. The new implementation was found to be only 19% slower than a stand-alone workstation, as opposed to 70% slower for the older AFS implementation. This improved scalability, and servers were able to handle 50 users each. CPU and disk utilization were reduced with the new implementation. Some anomalies were found, which the authors attributed to unexpected events such as maintenance activities and the use of bulletin boards by some clients, which caused frequent access and modification of directories. The authors also compared the new AFS with NFS, an existing remote file system, and found that AFS performed better than NFS as the number of clients scaled up. Overall, the evaluation was good and fits well with the authors' focus on improving scalability.

Confusion
Volumes were not clearly explained.
Also, the paper mentions that concurrent file operations from different workstations on a file are allowed, relying on the applications to synchronize amongst themselves. Wouldn't this create the same amount of network traffic that the paper aims at reducing? Or is the paper only concerned with reducing the server's load?

1. Summary
This paper describes the early AFS and the scalability issues faced that motivated the authors to make changes in the areas of cache validation, server process structure, name translation and low-level storage representation based on the observations of a prototype implementation. The authors demonstrate the improved scalability and performance.

2. Problem
The main problem with the existing AFS was that the system did not scale well due to unneeded client-server communication.

3. Contribution
The authors briefly describe AFS and the benchmarking carried out to analyse the scalability issues. There were two main problems: path-traversal costs were too high, and the client issued too many TestAuth protocol messages. Based on these observations, they lay out design changes for the system. They introduce the notion of a callback to reduce the number of client/server interactions; with it, the client assumes that the file it has is valid unless told otherwise by the server. They also introduce the notion of a file identifier (FID), consisting of a volume identifier, file identifier and a uniquifier, instead of pathnames. To avoid the overheads of context switches, a single process with multiple threads services all the clients of a server. To avoid the cost of namei operations, both client and server access files by their inodes rather than pathnames. The notion of volumes provides a level of operational transparency. I really liked the authors' approach of first identifying the major performance slacks through exhaustive experimentation and then coming up with solutions that address these problems.

4. Evaluation
The authors evaluate the system based on how effective the changes were and on the characteristics of the system in normal operation. The initial experiments that had highlighted the drawbacks were reused. The server was a Sun2 and the clients were IBM RTs. The benchmark has 5 phases: MakeDir, Copy, ScanDir, ReadAll, Make. They observed that the remote-access performance penalty was reduced considerably and that scalability improved considerably. They present an analysis of CPU and disk utilisation and the distribution of Vice calls; overall, the number of network calls was reduced. The authors also present a comparison of AFS with NFS which demonstrates that AFS has superior scaling characteristics. It also provides well-defined consistency semantics as well as security and operability. Due to these desirable properties, AFS significantly influenced later NFS versions, and a variant of AFS called DFS was adopted by the Open Software Foundation.

5. Confusion
Are path names still translated to inodes? Then how does FID fit here?

1. Summary
This paper talks about the features of Andrew File System (AFS). It is a location-transparent distributed file system (DFS) and it scales a lot better compared to other DFS. They created a prototype of AFS and then ran some benchmark experiments to figure out what changes had to be made to improve the performance of the system. They again ran their benchmarking experiments to see the effects of their changes. The evaluation part of the paper compares AFS with NFS and they show that AFS especially with caching scales a lot better compared to NFS.

2. Problem
Scalability is a huge problem in a distributed file system environment, as it degrades performance and complicates administration. The first part of the paper evaluated (benchmarked) the performance of their AFS prototype, and some of their observations justified the previous statement. Some of the problems were that a large part of the workload consisted of stat-style cache validation calls, that a server was limited to only about 20 active users, and that CPU resources saturated quickly.

3. Contributions
As mentioned earlier, the authors implemented the prototype (with Vice on the server and Venus on the client, which contacts Vice and caches its data) and benchmarked it. They drew many interesting observations from these experiments and made changes to the prototype based on the results. Some of the changes were: using callbacks to avoid frequent file-stat requests, using Fids and a volume location database to make the FS location-transparent and reduce namei operations (and their overhead), reducing context-switch overhead with lightweight processes, and using vnodes instead of paths to look up files. They re-evaluated these changes and also compared against a remote-open file system, NFS, after briefly describing how NFS works. Finally, they discuss the changes made to improve the operability of their system. Their data structuring primitive is the volume. They use cloning to rebalance volumes across servers, assign quotas to users, replicate frequently read files (allowing temporarily delayed replicas), and take snapshots to back up data. The paper is benchmark/evaluation heavy, but I feel the main contributions were using callbacks so the server notifies the client (since the workload was dominated by cache/file-stat checks) and using vnodes (fids) instead of pathnames. AFS is still used today (e.g., the department's file systems / CSL), so we know it has high staying power.

4. Evaluation
The paper runs benchmarking experiments on the prototype, the modified version of AFS, and NFS. The benchmarks (operations like MakeDir, Copy, ScanDir, ReadAll, Make) were effective, and their results were used to make design decisions. For example, TestAuth and GetFileStat dominated the workload, and because of this they decided to implement the callback functionality. After making changes they re-evaluated the modified AFS (the CPU is less of a bottleneck now). They took measurements on multiple servers, and the results were either as expected or had a valid explanation (the dominance of store calls on one server is due to the poor locality behavior of bulletin boards). They also compared against NFS and showed that AFS (with a warm cache) scaled much better and had latency comparable to NFS at small scale. I feel they wanted to show that AFS scales better without adding huge overhead, and they put that point across to the reader. But they have not discussed or evaluated the overheads caused by the changes made for improving operability (to either justify these changes or quantify their impact).

5. Confusion
Why were they fine with having old copies for short duration in the read-only replicas? Is it rare to expect incremental cloning to happen for a long time because the original copy keeps on getting updated and you can't move to the new clone?

Summary
The paper discusses the design evolution of a distributed file system, called the Andrew File System (AFS), that helped it scale gracefully by achieving better performance and operability. This redesign was motivated by the observations made during the evaluation of an earlier version AFS prototype which revealed several design and implementation weaknesses.

Problem
Problems with performance – The earlier AFS design created one process for each client on the file server. This made it difficult to share critical information amongst server processes and caused high context-switching overheads and contention for server memory. The client cache management sent a cache validation request to the server upon every file open, so most traffic to the server was dominated by calls that did not transfer actual file data to the client. File access was also performed using pathnames, which further depleted valuable processing time at the server. The inflexibility of moving files between servers made load and disk-space balancing amongst servers difficult.

Problems with operability - The root cause of the operability issues in the AFS prototype was the inflexible mapping of server (Vice) files to server disk storage. Along with the restriction that files could be located independently only if they were on separate disk partitions, this made it difficult to provide 1. file backup, 2. mechanisms for file replication and location, 3. movement of files across servers and 4. implementation of user quotas.

Contribution
Similarities with the older AFS version – The key AFS design principle of "workstations caching entire files from dedicated autonomous servers" is left untouched. Like the AFS prototype, the new design contains Vice, the collection of file servers which run user-level server processes, and Venus, a user-level process at every workstation that runs when a file open/close request for a Vice file is intercepted by the client kernel. To exploit locality of file references, use network and server CPU resources efficiently, and simplify cache management and consistency, Venus requests entire files from a Vice server, reads/writes them locally in the disk cache, and writes the updated file contents back to the server only upon a file close. Communication between clients and servers still uses the RPC (Remote Procedure Call) mechanism. The in-memory status cache is also kept for quickly looking up the results of a stat call. Proximity to 4.2BSD file system semantics is also maintained.

Changes made for improving performance – The cache management now ensures consistency of the cache entries by implementing the callback mechanism, using which Venus is notified of a stale cache entry by the server. This is opposed to the cache validation request made on every file open by the earlier prototype. The callback is registered on the server containing the cached file when a client requests access to it. Also, Venus now caches both directory contents and symbolic links along with files. The file namespace is managed by a two-level naming system using the fid (file id) which consists of a file volume number, vnode number and a unique id. Information about file volumes is stored in a volume location database. The servers and Venus processes now use user-level Lightweight Processes to tackle the problem of high context-switching overhead and server memory pressure. The low-level storage representation, which performs file accesses through inodes obtained from a file's vnode information, speeds up the data access on the server. Venus also uses a similar mechanism to directly access cached files using their inodes.
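To make the volume location step concrete, here is a tiny C sketch of the lookup that the fid's volume number enables; the table contents and server names are invented, and in the real system the database is replicated on every server and cached by Venus:

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the volume location database.  Contents are invented. */
struct vldb_entry { uint32_t volume; const char *server; };
static const struct vldb_entry vldb[] = {
    { 7,  "serverA" },
    { 12, "serverB" },
};

static const char *locate_volume(uint32_t volume)
{
    for (size_t i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
        if (vldb[i].volume == volume)
            return vldb[i].server;
    return NULL;   /* unknown: ask any server, each holds the full database */
}

int main(void)
{
    /* A fid carries only <volume, vnode, uniquifier>; the volume number is
       the only (indirect) location information, so moving a volume means
       updating this table, not any directories cached on workstations. */
    uint32_t volume_from_fid = 7;
    printf("volume %u is currently served by %s\n",
           volume_from_fid, locate_volume(volume_from_fid));
    return 0;
}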

Changes made for improving operability – The new volume abstraction represents a partial subtree of the AFS namespace, such that multiple volumes are stitched together using mount points to form the entire namespace. Volumes can be resized, and many volumes may reside in a single partition. Movement of volumes across servers is made possible by creating a frozen copy-on-write snapshot of the volume via the clone operation. User quotas can now be implemented by assigning a quota to each user's own volume. Read-only replicas of frequently read files can be created by cloning the required volume, and the same clone operation is used to create a volume snapshot for use as a file backup.
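A rough C sketch of what a clone amounts to, under the assumption that it copies the volume's vnode index and marks the shared data copy-on-write; the structures and field names are invented for illustration only:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented structures: a volume is its id plus an index of vnodes, each
   pointing at an inode holding the file data. */
struct vnode  { uint32_t inode; int cow; };
struct volume { uint32_t id; int read_only; size_t nvnodes; struct vnode *vnodes; };

/* Clone: copy the vnode index and mark the shared data copy-on-write, so the
   frozen snapshot and the live volume share file contents until one of them
   changes a file.  No file data is copied here, which is what makes volume
   movement, backup and read-only replication cheap. */
struct volume *clone_volume(struct volume *src, uint32_t new_id)
{
    struct volume *snap = malloc(sizeof *snap);
    if (!snap) return NULL;
    snap->id = new_id;
    snap->read_only = 1;
    snap->nvnodes = src->nvnodes;
    snap->vnodes = malloc(src->nvnodes * sizeof *src->vnodes);
    if (!snap->vnodes) { free(snap); return NULL; }
    memcpy(snap->vnodes, src->vnodes, src->nvnodes * sizeof *src->vnodes);
    for (size_t i = 0; i < src->nvnodes; i++) {
        src->vnodes[i].cow = 1;    /* later writes copy the data first */
        snap->vnodes[i].cow = 1;
    }
    return snap;
}

int main(void)
{
    struct vnode v[2] = { { 100, 0 }, { 101, 0 } };
    struct volume live = { 7, 0, 2, v };
    struct volume *snap = clone_volume(&live, 70);
    if (!snap) return 1;
    printf("cloned volume %u into read-only snapshot %u (%zu vnodes shared)\n",
           live.id, snap->id, snap->nvnodes);
    free(snap->vnodes);
    free(snap);
    return 0;
}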

Evaluation
The authors provide an extensive body of evaluation related work. They first run a synthetic benchmark (consisting of typical file accesses made by a user) on stand-alone systems to indicate local operation times. Statistics obtained by running the earlier version AFS prototype on the above benchmark and during its regular operation indicated the design and implementation problems mentioned in the problem section. These statistics included the distribution of Vice calls, prototype benchmark performance (and its scalability), and the CPU and disk utilization across the servers in the prototype.

The implementation of the revised AFS design was tested next. A scalability analysis for the above benchmark indicated that the new implementation scaled much better than the earlier prototype, while keeping its CPU and disk utilization reasonable in the presence of high load. General observations about the new AFS implementation for its regular course of operation were also measured and explained. These included measuring the CPU and disk utilization across all the individual Vice servers and the distribution of the calls made to the Andrew Vice servers. The type of volumes and the user count for the individual servers were also reported.

A comparison with an NFS system using the above mentioned benchmark indicated that while NFS performed better for small load units, AFS scaled much better and thus outperformed NFS for larger load units. The AFS implementations (Warm cache, cold cache) were also using significantly fewer resources (server CPU, disk) than their NFS counterpart, especially at large loads. NFS Network traffic was also about 3 times higher than AFS. Also, the AFS implementation with a pre-built client cache (Warm Cache) always outperformed its counterpart which started with a cleared cache (Cold cache).

While there is a detailed evaluation on the performance aspects, no statistics have been presented for the features that were claimed to have improved the operability, for e.g. time required for the clone operation. It would also be interesting to see how AFS performed against the Cedar file system, as both of them use whole file caching and hence such an analysis would throw more light on the implementation issues of whole file caching systems in general.

Question/confusion
1. How do AFS implementations guarantee consistency of the callback information in the event of a network failure or a crash at the client / server end?

1. Summary
This paper talks about the version 2 of the Andrew File System (AFS). The paper explores the bottlenecks in the previous version and explains the design changes made in order to improve the performance of this file system and make it more scalable. It provides a comparison between AFS and NFS and talks about the design changes made to make it more operable.

2. Problem
AFS was originally designed to be a scalable, distributed file system. However, the original design did not scale well. Cache validation on every open and close call led to increased server traffic. Pathname traversals took up a lot of CPU time. Having a dedicated process on the server for every client led to increased context-switching and virtual-memory paging overheads. Servers were not equally utilized. All these factors contributed to extremely high CPU utilization, which became a bottleneck for scalability. The initial version was also poorly designed from an operability point of view. As a result, the authors made significant changes to the design to improve scalability and ease operations.

3. Contribution
The authors start by prototyping the existing design in order to gauge its inefficiencies. The newer version retained the whole-file caching design and improved upon it by also caching directory-related information. This helped move pathname traversal to the client machine, freeing up the server. Instead of presenting the server with a full pathname, the client presents it with a fid. Each file and directory is identified by a fid, and each directory entry maps a name to a fid. The Venus process on the client caches this information for later use and can therefore supply the server with a fid; the server is no longer aware of pathnames. In order to further reduce the cost of the namei operation, they wanted to access files by inode directly. Since inodes are not accessible from user level, they added extra system calls so that a file could be opened via its inode. They also introduced the notion of a callback, which ensures that any state change is communicated to the client by the server. This removed the server call to validate cached data on every open. These two design changes reduced server CPU utilization significantly. While recovering from a crash, the client treats cached data as suspect and revalidates it. Instead of having a dedicated process for each client, they introduced multiple lightweight processes (LWPs) within one process; an LWP is bound to a particular client only for the duration of a single server operation. To balance disk and CPU utilization across servers, they allowed the movement of volumes, with the volume-server database updated accordingly. They provided a quick copy-on-write snapshot mechanism, which helps with cloning, backup and read-only replication.
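A minimal C sketch of the server-side data path this describes: the fid's vnode number indexes a per-volume table to get an inode, and a hypothetical open-by-inode call (standing in for the new kernel call the review mentions, whose real interface is not shown here) reaches the data without any namei:

#include <stdint.h>
#include <stdio.h>

struct fid { uint32_t volume, vnode, uniquifier; };

/* Per-volume vnode index: vnode number -> inode number (invented data). */
static const uint32_t vnode_to_inode[] = { 0, 2301, 2302, 2417 };

/* Hypothetical stand-in for an "open by inode" kernel call; its real
   interface is not given in this review, so this just prints what it
   would do and pretends to return a descriptor. */
static int open_by_inode(uint32_t inode)
{
    printf("opening inode %u directly, no pathname traversal (no namei)\n", inode);
    return 3;
}

int vice_open(struct fid f)
{
    if (f.vnode >= sizeof vnode_to_inode / sizeof vnode_to_inode[0])
        return -1;                      /* unknown vnode in this volume */
    return open_by_inode(vnode_to_inode[f.vnode]);
}

int main(void)
{
    vice_open((struct fid){ 7, 3, 0 });
    return 0;
}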

4. Evaluation
In order to evaluate the design, they run a benchmark with 5 phases: MakeDir, Copy, ScanDir, ReadAll, Make. They compare their file system to NFS, and they also provide statistics from measuring the performance of the file system in the real world over a prolonged period. The initial version had poor scalability and skewed server utilization; after running the same benchmarks, they show that the problem was alleviated. The results show that the system scaled up to 20 load units, where each load unit is equivalent to 5 Andrew users. For a load of 20 units, CPU utilization was around 80% while disk utilization was around 23%. Statistics collected after deploying the new version of AFS at CMU show that the distribution of calls to AFS servers was no longer skewed and that a server could handle about 128 active users in peak periods without overloading or crashing. The comparison with NFS was quite interesting: NFS did not scale beyond 18 load units as its CPU utilization was high, and in fact, after 10 load units, NFS suffered a functionality breakdown. I feel the only issue with the paper is that it does not explicitly discuss how client-side failures are managed. Also, the client's disk space limiting the size of file that can be accessed is a drawback.

5. Questions
The notion of Volume and how it helped was not very clear.

1. Summary
This paper addresses the scalability challenges and bottlenecks in the first prototype of the Andrew File System (AFS). The authors extend AFS with callbacks, efficient server side processing, a streamlined representation of storage, and introduce the idea of volumes for efficient FS management.
2. Problem
Deficiencies in AFSv1: Firstly, the authors observe that applications with a large number of stat or open calls suffered higher latency. Each stat or open call required the AFS client process to consult the server, even though it is unlikely that another machine modified the file during that interval (e.g., private user files). These validity checks could create hot spots on the AFS servers. Secondly, the server spawned a dedicated process for each client and used a kernel-provided RPC package, which led to high context-switch overheads and pressure on kernel resources under heavy load. Lastly, for portions of the namespace managed by other servers, an AFS server stored stub structures on its local file system; this made it difficult to relocate data to balance disk utilization or to manage file system quotas.
3. Contributions
The basic idea of AFS was to use whole-file caching and thus avoid intervention by the file server during reads/writes. Retaining this basic design the authors identify bottlenecks and improve over their first cut version of AFS.
Using a micro-benchmark, the authors identify two major bottlenecks in the AFS prototype. Most of the remote calls are cache validations or file-stat requests, and their average response time increases with server load. Also, server-side CPU utilization is relatively high and saturates under high load, becoming the major bottleneck. The prototype had no file identifier analogous to an inode, so a server process had to traverse the entire path on its local file system for each AFS file request. This, along with high context-switch overheads, is the likely cause of the high CPU utilization.
The key idea to reduce cache validation checks is the callback mechanism: the AFS server sends a callback break to all clients caching a file when that file is updated. This obviates cache validations but adds the overhead of maintaining sharing state on the server. To do away with costly file path traversals, the authors introduce the notion of file identifiers (along the lines of an inode), giving a unique ID for each file and directory. Path traversal is done at the client and can be cached there, and all requests use fids. To further avoid path traversals at the server, the authors add a set of system calls that allow files to be accessed using internal inode numbers; the fid is mapped directly to the inode storing that file. Lastly, to improve server-side performance, the authors use a form of user-level threads and a custom RPC protocol optimized for bulk transfers. A thread is bound only to a single server operation, not to a client.
In order to address problems with managing the file system, the authors introduce the notion of volumes. A volume logically represents a subtree of the AFS namespace. The volume id is part of the file identifier; a table maintained at all servers gives the current mapping of volumes to servers. This allows migration of volumes across servers and easy replication and backups. The system also uses a form of copy-on-write, so the original server can continue serving most requests while a volume is being moved or a snapshot is being taken for backup.
4. Evaluation
The authors use an interesting micro-benchmark which I think is intuitively representative of a typical user's activity. By varying the number of clients they simulate load on the server, which allows them to measure the original bottlenecks by probing request response times and CPU utilization at the server. They also compare their system against a state-of-the-art NFS server. The new version of AFS is clearly more scalable, due to vastly reduced remote call overheads and better CPU utilization, while NFS tends to saturate the server beyond a modest load. However, I think it was a somewhat unfair comparison given that NFS does not cache files on disk but only in memory; still, it supports the argument for whole-file caching. Results are also presented for the live systems at CMU; while interesting, they felt somewhat superfluous. It would have been great if they had added some data on volume transfer and snapshot times.
5. Confusion
How does AFS deal with server crashes because it looks like it would lose state for sending call backs making some cached files stale.

1. Summary
Through this paper, the authors examine the various features and design decisions of the Andrew File System (AFS) that impact the scalability of the system. The authors implement a prototype of AFS and benchmark it to identify the inherent problems that limit scalability. Next, the authors propose a number of changes and evaluate them to show their effectiveness. Lastly, the authors compare AFS with NFS, which is a remote-open file system.
2. Problem
The main design goal of AFS was to be scalable. However, at large scale, AFS was not easy to operate and its performance was sub-optimal. The authors wanted to tackle these problems with their proposed solution. Through benchmarking of their prototype, the authors concluded that the server CPU was the bottleneck due to numerous factors such as pathname traversals, the frequency of cache validity checks, and the use of a dedicated process per client (which led to extra context-switch costs). Another problem identified was that load among the various servers was not balanced. The authors propose changes to AFS to mitigate these issues.
3. Contribution
The authors introduce a number of changes to the original AFS design, motivated by the results of benchmarking their prototype. Firstly, they propose caching the contents of directories (which makes client-side pathname translation feasible) and introduce the concept of callbacks to reduce cache validation traffic. I really liked the callback concept, as the results proved it to be effective even though it seems quite obvious in hindsight. Secondly, the authors introduce the concept of the FID (consisting of a volume number, vnode number and uniquifier). This eliminates the implicit namei operation, as one can reach the inode by indexing into the vnode array. The FID concept, combined with the replicated volume location database, makes moving files easier (eliminating the location stubs that were earlier part of the file subtree). Lastly, the authors introduce LWPs (similar to threads) to handle client requests instead of individual processes, further reducing the CPU load on the server. In order to make AFS more operable, the authors introduce the notion of volumes. Through volumes, server load can be balanced, as there are mechanisms to move volumes and to replicate frequently read files. Additionally, disk quotas are also supported through volumes.
4. Evaluation
I think the authors have done an excellent job of identifying the problem and its causes; this part of the paper was the most interesting for me. Through the prototype, the authors identify that the CPU is the bottleneck (and identify the calls that cause this) and also identify server load imbalance. This evaluation is the basis of all their proposed changes. The authors go on to compare the performance of their solution with the initial prototype. The results show that the new AFS is 19% slower than a stand-alone workstation (as opposed to 70% for the AFS prototype). The authors evaluate the system for its scalability as well as its CPU and disk utilization; the results show that scalability has improved and the goal of 50 users per server is met. The authors also compare AFS with NFS and show that it scales better. However, I am not quite convinced of the need for this comparison. The goal of the authors was to improve the existing AFS, so a comparison with the existing AFS is called for; comparing with NFS, which the authors clearly claim not to be scalable, is something I do not quite understand. Experiments do show NFS achieving lower latency than AFS at small loads.
5. Confusion
The authors seem to be fine with the stale state of read-only replicated volumes. Why is this the case? Isn’t this something that could be an issue? Also, wouldn’t it make sense to have a centralized location store?

Summary
The paper presents a distributed file system that is scalable to large networks. The file system is made scalable by reducing network traffic, using a client cache to store file/directory contents and using callbacks. Other techniques used are simplified name resolution and volumes tracked in a replicated location database.
Problem
The distributed file systems existing in those days were not scalable to large networks. The paper presents the development of a prototype distributed file system and how it was refined into a final design that is scalable and outperforms NFS.
Contributions
The main contribution of the paper is a scalable distributed file system (DFS), achieved using the following features:
Cache management: Network traffic is reduced by avoiding calls to the server for every file/directory operation; the entire file contents or directory structure is cached at the client once it has been fetched. Calls to the server are made only when the requested file or directory is not present in the cache or the callback has been broken. The server notifies the client when the cached file/directory changes at the server.
Communication and server process structure: Client requests are handled at the server by lightweight processes (LWPs). An LWP is bound to a particular client only for the duration of a single server operation. Communication between client and server uses RPC, optimized for bulk data transfer.
Aggregation of files into volumes enhances the operability of the system. The volume location mapping is replicated on every server and is also cached at the client for files that have been opened before, so the client contacts a server for volume information only when it is not in the cache.
Name resolution: The name resolution overhead at the server is removed by using the Fid. The server is presented with a Fid instead of a file name; the Fid contains a volume number, a vnode number, and a uniquifier. The server gains a new set of system calls, and the vnode information in the Fid allows the file to be accessed directly, making data access at the server fast. The volume location database is replicated on every server, which helps in locating the requested volume.
Evaluation
The paper presents an excellent evaluation, starting from the evaluation of the prototype through to the final design, showing how the changes improved the system. A detailed breakdown of the benchmark run shows file system call times and CPU/disk utilization, with tables and graphs showing the improvement achieved.
They compare AFS with NFS, the leading distributed file system available at that time. They separate AFS into cold-cache and warm-cache cases, which helps clearly show how AFS with a warm cache can outperform NFS. Table XI shows how some benchmark phases such as MakeDir and ReadAll outperform NFS thanks to caching. Figures 3 and 4 show how AFS with cold and warm caches performs on the benchmark and in CPU utilization compared to NFS. Table XII gives a detailed breakdown of CPU and disk utilization and explains how some servers had higher utilization based on the contents they stored. Table XIII summarizes how network traffic is reduced with AFS. Overall, AFS outperforms NFS when the number of nodes in the system is large.
Missing evaluation: How consistent is the client's cached copy with respect to the server? Suppose the server sends a message to a client that its callback is broken and the packet is lost in the network; this situation is not evaluated. When a file is larger than the cache at the client, how does the system behave? Performance with a very large number of files is also not evaluated; I think the server would have difficulty handling a large number of callbacks. And if one of the servers goes down, how are clients informed of callbacks being broken, and how do the other servers learn about it?
Confusion
What is the current industry implementation of AFS? If multiple clients push their updated copies to the server, does the file end up with mixed content from both clients, or is there a locking mechanism? And how is backup of volumes handled?

1. Summary
This paper presents the implementation of the AFS prototype, along with the qualitative and quantitative experience that motivated improvements in performance and scaling through better cache management, name resolution, communication techniques, server process structure and storage representation, and in manageability through a new volume abstraction. The final system was compared to Sun's NFS and shown to scale considerably better.
2. Problem
Remote file access was slower than local access (though better than a timesharing system), much of which was attributed to stat system calls, since they had to be resolved by the server even for cached content. Server operation and administration were difficult, as the process-per-client structure created excessive context switching and paging. The fixed mapping of files to servers made migrating files difficult, server load was unbalanced (another reason for high CPU utilization), and quotas were lacking.
3. Contributions
Cache validity checks were reduced with the advent of callbacks for modifications. The status cache is kept in virtual memory for rapid servicing of stat system calls. The server process count was effectively lowered with the use of LWPs, whose context-switch cost is on the order of a few procedure calls. Servers were also relieved by moving pathname traversals to workstations and by accessing files directly via inodes. Load balancing was achieved by reassigning users. The introduction of a new level of naming, the fid, eliminated expensive namei operations from servers, and since fids carry no location information, moving files does not invalidate cached directory contents. Because the implementation is outside the kernel, it can support many clients per server, and it does not suffer from functional problems like the NFS errors caused by unreliable datagrams. It also became easy for a small operational staff to run and monitor the system with minimal inconvenience to users, by replacing the inflexible, offline mapping of Vice files to server disk storage with non-server-specific volumes that can be moved atomically without synchronization between servers. Volumes also enabled read-only replication of files to improve availability and load balancing, which was used in the orderly release process for system software.
4. Evaluation
Close/open consistency works well enough for approximating 4.2BSD file system semantics, though strict emulation of 4.2BSD is not provided. Since current operating systems allow sharing of virtual address spaces, the gains from the server process restructuring would need to be re-evaluated today. Due to the limited state information kept by the AFS server, Venus generates cache validation requests for all cached contents after a reboot, and if callback state is excessive, performance may decrease; another caveat is inconsistency if callback state gets out of sync. The performance penalty for remote access is reduced, since the ScanDir and ReadAll phases are barely affected by load once callbacks are in place. The server CPU bottleneck still exists, which the authors attribute to slower CPUs and ongoing maintenance activities, even though the scalability goal of 50 users per server is reached. Disk utilization is high particularly due to the storage of bulletin boards and other frequently accessed directories on some servers. Diskless operation is not possible, and files larger than the local disk cache cannot be accessed. High latency is another limitation of AFS, exacerbated when a file is not in the cache. Even with atomic volume propagation, certain replication sites may contain stale data for a while, whereas the backup mechanism is simple, efficient and seldom disrupts users.
5. Confusion
How does server build callbacks in the event of its crash?

1. Summary
AFS is a distributed file system that is location-transparent and scales gracefully. The protocol design of AFS minimizes server interactions through whole-file caching and callbacks, so each server can support many clients and the system is highly scalable. The work was carried out in stages, with the final design building on the shortcomings of the earlier one, and finally the authors compare against Sun Microsystems' NFS.
2. Problem
The challenge in designing a distributed file system is to handle many clients with low resource consumption (network, CPU, etc.) while keeping management and use simple. So, to maximize scalability, the server should be made as light as possible.
3. Contributions
The key design idea in AFS is that clients cache entire files on the local disk, unlike the NFS scheme of caching blocks in memory. Caching is central to achieving scalability: they maintain a cache for data (on disk) as well as for status (in memory), which saves network communication overhead and simplifies cache consistency. AFSv1 had a simple client/server protocol design suited to its distributed nature, sending full pathnames to identify files. They then evaluated this system and derived the issues it faced: polling overheads on both the CPU and the network, difficult maintenance due to a dedicated process per client, and path traversal costs. This helped them formulate the next big idea, a scalable protocol that requires the servers to send callbacks to the clients when a file is modified, instead of the other way round (clients probing). They also introduce file identifiers to identify files instead of full pathnames, independent of the underlying location; the first access generates several client-server messages, but subsequent read/write operations are entirely local and require no server interaction at all. Thus, in the common case AFSv2 behaves nearly identically to a local disk-based file system. Cache consistency becomes simpler due to callbacks and whole-file caching, with the rule that on simultaneous writes the last writer wins, assuming a cooperative environment. They also introduce read-only replication of files at multiple servers, providing higher availability.
Overall, the AFS design is simple and scalable. However, it is not convincingly general across all workloads, a server crash could prove costly, and the concurrent-writes rule seems crude. The work has a good flow of proposing a design, measuring it, identifying the shortcomings, re-thinking the design, and arriving at a strong final design.
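A compact C sketch of the open/read-write/close flow described above, as seen from the client; the names and single-file structure are invented, and the real Venus manages a whole cache of files and their callback state:

#include <stdbool.h>
#include <stdio.h>

/* Invented single-file view of the client-side flow: everything between open
   and close is local; the server sees at most one Fetch and one Store per
   open/close pair, and a later Store simply replaces an earlier one
   ("last writer wins"). */
struct afs_file { bool cached_and_valid; bool dirty; };

static void fetch(struct afs_file *f) { f->cached_and_valid = true; puts("Fetch whole file from server"); }
static void store(struct afs_file *f) { f->dirty = false;           puts("Store whole file to server"); }

void afs_open(struct afs_file *f)
{
    if (!f->cached_and_valid)
        fetch(f);                /* otherwise: no server interaction at all */
}

void afs_write(struct afs_file *f) { f->dirty = true; }   /* local cache copy only */

void afs_close(struct afs_file *f)
{
    if (f->dirty)
        store(f);                /* the moment other clients' callbacks are broken */
}

int main(void)
{
    struct afs_file f = { false, false };
    afs_open(&f); afs_write(&f); afs_close(&f);   /* one Fetch, one Store */
    afs_open(&f); afs_close(&f);                  /* fully local, nothing stored */
    return 0;
}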
4. Evaluation
The design versions are evaluated in great detail using scalability benchmarks to understand the degradation factors. Since the major design choice was caching whole files, they examine the hit ratio, and they also examine the potential factors affecting performance, such as client/server interactions and resource (CPU and disk) utilization, and are hence able to correctly determine the causes: context-switching overheads due to one server process per client, frequent cache checking, and pathname traversals. With all the modifications to the original design, the system is re-evaluated for scalability and performance and compared with NFS. It shows a significant improvement: it is much faster, callbacks reduce network traffic, and disk utilization is reduced, though the server CPU remains the limiting resource. They also test exhaustively with respect to time of day and workload locality behavior, as AFS was deployed at real-world production scale for over a year. The comparison against NFS was clearly on the basis of scalability rather than resource cost. Results show that NFS fails at high loads due to file system errors, so NFS does not scale, but it performs slightly better than AFS at low loads, so AFS does not compromise performance to gain scalability. AFS provides better resource utilization and generates about 3x less network traffic. I wonder why NFS is still more popular than AFS.
5. Comments/Confusion
A brief distinction chart on major distributed file systems would be helpful. How is fault tolerance in AFS handled now, does the single server have replicas? The consistency semantics for concurrent writes is simple but unfair.

Summary:
The paper describes the shortcomings of an initial prototype of the Andrew File System (AFS) in tackling scalability and ease of operability, and proposes and implements modifications. The modifications primarily involve cutting down server requests from the client side in order to boost performance and lower the burden on the servers.

Problem:
The initial prototype of AFS did not scale well for even moderate server loads. Server CPU utilization during peak periods of file system activity went so high that it often saturated the server CPUs. Performance of a synthetic benchmark that copies a source directory tree to a build tree, scans the files in the source tree without reading any of them and then reads each file for compiling is significantly worse in the prototype even with minimal server load compared to a standalone system. The paper discusses methods to make AFS more amenable to scaling and operability.

Contributions:
The authors evaluate the shortcomings of the initial prototype and point out key bottlenecks that affect performance. Two of these were frequent server lookups for file status information and pathname resolution done on the server side. To combat the first problem, the authors cache file status information in the virtual memory of the client workstations after registering a callback with the server, which will notify the workstation if the status is no longer valid; this way, stat system calls are satisfied instantly. For the second problem, the authors propose a mechanism where pathname resolution is done on the client side through fids. The mappings from directory entries of a cached subtree to the fids of the individual entries are maintained on the client side. The server receives these fids and maps them onto native inodes, which are then accessed directly from user space through new system calls. Another improvement concerns the process structure on the server side: the server uses user-level threads instead of processes to avoid context-switching overheads and to enable sharing of common data structures. From the operability point of view, the authors propose the concept of a volume, a collection of directories located on one server. Volumes are identified by volume id numbers that are part of the fid of every file, and their locations are tracked in the volume location database. The database, together with the coarser granularity of volumes, makes it easy to move volumes across servers through a clone operation. Clone maintains copy-on-write semantics, so volume movement can be done on the fly; this also enables maintaining read-only replicas. A similar approach is taken for backing up volumes, which was not straightforward in the initial prototype. Volumes also allow administrators to enforce quotas for disk usage.
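Since quotas fall out of the volume abstraction almost for free, here is a tiny C sketch of a per-volume quota check on a store; the structure, numbers and function names are invented for illustration:

#include <stdint.h>
#include <stdio.h>

/* Each user's files live in their own volume, and the volume carries a quota
   that the server can check when new data is stored. */
struct volume { uint32_t id; uint64_t quota_bytes; uint64_t used_bytes; };

int volume_store(struct volume *v, uint64_t new_bytes)
{
    if (v->used_bytes + new_bytes > v->quota_bytes) {
        fprintf(stderr, "volume %u: over quota, store rejected\n", v->id);
        return -1;
    }
    v->used_bytes += new_bytes;
    return 0;
}

int main(void)
{
    struct volume joe = { 7, 20u * 1024 * 1024, 19u * 1024 * 1024 };
    volume_store(&joe, 512 * 1024);        /* fits within the quota */
    volume_store(&joe, 2u * 1024 * 1024);  /* exceeds it, rejected  */
    return 0;
}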

Evaluation:
The authors evaluate the initial prototype with the benchmark and with results from actual usage to point out its shortcomings. They evaluate the improved AFS along similar lines and show the improvements in CPU utilization and benchmark times. The authors also compare the improved AFS with the industry-standard NFS, noting that NFS is targeted at a small number of workstations sharing a namespace but is used for comparison because of its popularity. Experiments show how benchmark performance in NFS and AFS scales with server load, and similar experiments are done for CPU and disk utilization. The authors stick to one synthetic benchmark plus real-world usage; they could have characterized the nature of such workloads for a better understanding of the shortcomings, and could also have run other standard benchmarks, such as database workloads, to demonstrate the scalability of AFS.

Confusion:
How has AFS evolved over the three decades where the nature of workloads, time for network access and performance of CPUs have all changed tremendously?

1. Summary
The paper talks about the design decisions that were incorporated while implementing the Andrew File System (AFS) to make it highly scalable and simpler to maintain day to day. It evaluates the design against existing distributed file systems like NFS.

2. Problems
Distributed file systems pose two major implementation problems: scalability and day-to-day operational maintenance. AFS is designed to overcome these two problems. The authors developed a rudimentary prototype of AFS to understand the overheads involved in its design. The prototype design increased server CPU utilization because of constant probe messages from Venus (the client-side file system manager), pathname resolution, and process context switching to service client requests. The initial prototype also did not scale well beyond a server load of about 5 clients due to these overheads. The prototype lacked load balancing among the servers and did not implement user quotas, which the authors believe are critical for a distributed file system.

3. Contribution
The major contributions of the paper aim at reducing the overheads identified from implementing the prototype.
1. Cache Management : To reduce the number of TestAuth calls to the server, which increase both network traffic and server load, the paper proposes a callback mechanism in which the server informs the client whenever a file it has cached changes. Even though this method complicates the server implementation, the authors believe it scales well enough to suit their needs.
2. Namespace Management: Using the full file path requires the server to call the namei() routine to resolve the path to the corresponding inode. This becomes a significant overhead when clients probe to update their cache status. Even though the callback mechanism reduces this traffic, resolving pathnames on the server is still inefficient. The paper instead proposes a unique 96-bit FID, a combination of a 32-bit volume ID, a 32-bit vnode number, and a 32-bit uniquifier (see the sketch after this list). Vice also maintains vnode-to-inode translations to avoid calling namei(). These techniques significantly improve performance.
3. Server Processes: Since the prototype implements the server functionality as one process per client, the CPU spends a lot of time on context switches, and 4.2BSD does not allow shared memory between processes. The proposed solution uses user-level threads, whose switching cost is only slightly higher than a procedure call, and also optimizes the RPC stack for performance. This considerably improves the server's performance over the prototype.
4. Operability : For ease of maintenance, the paper groups files into volumes and uses the vnode number to index into a volume. The volume abstraction makes it easier to relocate files and distribute load evenly across the servers. The updated AFS also provides per-user quotas.
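As referenced in item 2 above, here is a minimal sketch, with invented names and an invented table size, of the 96-bit FID layout and a server-side vnode-to-inode table that replaces the namei() path walk with a single indexed lookup; the uniquifier check shows how a stale fid could be detected if a vnode slot were reused.

/* Sketch (hypothetical names): a 96-bit fid and a server-side table that
 * maps a vnode number straight to an inode number, so no namei()-style
 * path traversal is needed. */
#include <stdio.h>
#include <stdint.h>

struct fid {
    uint32_t volume;      /* which volume the file lives in            */
    uint32_t vnode;       /* index into that volume's vnode table      */
    uint32_t uniquifier;  /* guards against reuse of a vnode slot      */
};

struct vnode_entry {
    uint32_t uniquifier;  /* must match the fid's uniquifier           */
    uint32_t inode;       /* underlying inode in the local file system */
    int      in_use;
};

/* Per-volume vnode table; a real server would keep one per volume on disk. */
static struct vnode_entry vnode_table[16] = {
    [3] = { .uniquifier = 7, .inode = 4711, .in_use = 1 },
};

static long fid_to_inode(const struct fid *f)
{
    const struct vnode_entry *e = &vnode_table[f->vnode % 16];
    if (!e->in_use || e->uniquifier != f->uniquifier)
        return -1;        /* stale fid: vnode slot was reused          */
    return e->inode;      /* one table lookup instead of a path walk   */
}

int main(void)
{
    struct fid f = { .volume = 12, .vnode = 3, .uniquifier = 7 };
    printf("fid <%u.%u.%u> -> inode %ld\n",
           f.volume, f.vnode, f.uniquifier, fid_to_inode(&f));
    return 0;
}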

4. Evaluation
The evaluation is presented in two phases. The first compares the updated AFS against the prototype initially proposed in the paper. The new version scales better than the prototype (up to 20 Load Units) and reduces average CPU utilization. It does not show exactly how much load (in Load Units) AFS can support before breaking down, which would have given a clearer picture of how well AFS actually scales.

The second phase of the evaluation compares AFS against NFS. The two systems are evaluated for performance (benchmark execution time), CPU utilization, disk utilization, and the network traffic they generate. In all of these cases AFS performs better than NFS, and the authors attribute the difference to the design of NFS. The evaluation considers two AFS configurations, warm cache and cold cache, where the cold-cache runs clear all cache contents before each trial. In both phases the programs appear CPU-bound, so the evaluation fails to characterize AFS behaviour for I/O-bound workloads.
5. Doubts
The results presented in the paper show that performance was CPU-bound. This would have changed over time as workloads became I/O-bound; what changes were made to AFS to adapt to this?
The usage of the 32-bit uniquifier was not explained clearly. What scenarios would lead to reuse of FIDs?

1.Summary:
This paper is about improving the scalability of the Andrew File System (AFS) without degrading the performance and operability of the system. Features such as callbacks, efficient name resolution, a new server process structure, and a new low-level storage representation are added to AFS, which is then evaluated against the Network File System (NFS) for scalability and performance.

2.Problem:
1) In AFS, the validity of the client cache is checked through stat calls to the server. This leads to a large number of interactions between client and server, hurting performance.
2) Having a dedicated server process for each client exceeded critical resource limits and also caused excessive context-switching overhead and virtual memory paging demands.
3) Operations such as recursive directory listings of a subtree stored on the server took a long time.
4) File location information was embedded in the stub directories of the Vice (server) name tree. This made it difficult to move users between servers for load balancing.
5) Full pathname traversals on the server to obtain an inode caused a significant number of kernel calls, which added CPU overhead and hurt scalability.

3.Contributions:
The authors evaluate the prototype AFS for its performance and identify the bottlenecks caused by the problems above.
The two major areas of contribution in this paper towards attaining scalability in AFS are changes for performance and changes for operability.
I) Performance:
1) Rather than the client checking cache validity frequently, Vice (the server) notifies Venus (the client) of file changes through callback state shared between the two. This reduces load on the server considerably.
2) Each Vice file is identified by a unique fixed-length fid with three components: a 32-bit volume number, a 32-bit vnode number, and a 32-bit uniquifier. Files are organized into volumes, and the vnode number is an index into the file's storage information within the volume. Mapping a pathname to a fid, and the fid directly to an inode, reduces the number of calls needed to obtain the inode.
3) User-level Lightweight Processes (LWPs) within a single process are used to handle requests from clients. Context switching between LWPs is inexpensive, which reduces overhead. Communication is through an optimized bulk-transfer protocol.
4) On the client side, files are accessed by their local inodes rather than by pathname; the local directory acts like a cache. This avoids extra kernel calls for inode information.
II) Operability:
1) Volumes - a collection of files residing within a single disk partition. The volume location database contains volume-to-server mappings. This seems to be an efficient way of decoupling file location information from its storage structure.
2) The above concept allows volumes to be moved across servers, which is useful for balancing disk space and utilization. Movement is done through a snapshotting (cloning) mechanism, which also facilitates backing up volumes and read-only replication (see the sketch after this list).
3) A quota on volume space is allotted to each user. This is useful in a system with a large number of users.
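As referenced in item 2 of the operability list, a rough sketch of the copy-on-write cloning idea is below. The data layout (a flat array of shared blocks with reference counts) is invented for illustration; the point is only that cloning copies the index while sharing data, so the live volume keeps serving writes while the frozen clone can be shipped to another server or kept as a read-only replica.

/* Sketch of a copy-on-write clone. Names and data layout are illustrative
 * only: data "blocks" are shared by reference count and copied only when
 * the live volume writes to them after the clone. */
#include <stdio.h>
#include <stdlib.h>

#define NBLK 4

struct block { int refcnt; char data[32]; };

struct volume {
    const char   *name;
    struct block *blk[NBLK];   /* vnode-level detail omitted for brevity */
};

static struct block *mkblock(const char *s)
{
    struct block *b = calloc(1, sizeof *b);
    b->refcnt = 1;
    snprintf(b->data, sizeof b->data, "%s", s);
    return b;
}

/* Clone: copy only the index; every data block is shared, not copied. */
static struct volume clone_volume(const struct volume *src, const char *name)
{
    struct volume c = { .name = name };
    for (int i = 0; i < NBLK; i++)
        if ((c.blk[i] = src->blk[i]))
            c.blk[i]->refcnt++;
    return c;
}

/* Write to the live volume after cloning: copy the block first (COW). */
static void volume_write(struct volume *v, int i, const char *s)
{
    if (v->blk[i] && v->blk[i]->refcnt > 1) {
        struct block *priv = mkblock(v->blk[i]->data);
        v->blk[i]->refcnt--;
        v->blk[i] = priv;
    } else if (!v->blk[i]) {
        v->blk[i] = mkblock("");
    }
    snprintf(v->blk[i]->data, sizeof v->blk[i]->data, "%s", s);
}

int main(void)
{
    struct volume live = { .name = "user.alice" };
    live.blk[0] = mkblock("original contents");

    /* Freeze a snapshot, then keep serving writes from the live volume. */
    struct volume snap = clone_volume(&live, "user.alice.readonly");
    volume_write(&live, 0, "updated after clone");

    /* The snapshot is unchanged and could be shipped to another server. */
    printf("live: %s\nsnap: %s\n", live.blk[0]->data, snap.blk[0]->data);
    return 0;
}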

4.Evaluations:
The controlled experiments with the synthetic benchmark were performed on a Sun2 server, with IBM RTs as the clients.
The authors answer the following questions in the system evaluation:
1) How is scalability affected by the changes made for performance?
The prototype is 70% slower than a stand-alone workstation, and even at a high load of 20 Load Units the revised AFS performs better than the prototype. Callbacks eliminate most of the interactions with the server. Even at a load of 20 the system is not saturated, and the scalability goal (50 users per server) is achieved. The most frequent call to the server is GetTime, used to synchronize clocks across workstations. RemoveCB, used for cache invalidation, is another frequent call; to optimize it, the authors propose batching cache-invalidation requests.
2) How well does AFS perform when compared to NFS?
Comparing the performance of NFS with AFS cold cache (cache cleared before each trial) and warm cache (cache not cleared), the authors find that NFS breaks down at high loads because RPC packets are lost by its unreliable transport. At lower loads NFS performs slightly better (the crossover point is at a load of 3-4). At a load of 18, server CPU utilization in NFS saturates at 100%, versus 38% for the cold cache and 42% for the warm cache, showing AFS's superior scalability.
Overall, the system is evaluated on benchmark times, CPU utilization, disk utilization, and a comparison to another distributed file system, making the evaluation extensive and well profiled. The authors also suggest various improvements to the system as a result of the evaluation.

5.Confusion:
Interested to know how fault handling is done in AFS.

1. Summary
The authors have come up with a scalable design for AFS which offloads the server and does as much work as possible at the client. It relies on whole-file caching: the client fetches the entire file on open, makes changes locally, and propagates them back when the file is closed, with callbacks keeping the cache valid. They then show that their design outperforms both NFS and the existing implementation of AFS.

2. Problem
The existing design of AFS cached the entire file on open but still made a significant number of stat calls, restricting the scalability of the system. It also had expensive path lookups and relied on one server process per client, which made sharing data difficult and led to a large number of context switches.

3. Contribution
The main contribution of the authors was to first study the existing system for possible scope of improvement and then profile the steps that could be bottlenecks. They introduced cache management in the form of callbacks, which notify Venus of changes to a file; this eliminated a substantial number of stat calls made just to check cache consistency. They then avoided the costly pathname lookups by introducing a FID, consisting of a volume number, a vnode number, and a uniquifier. Similarly, files cached locally are referenced by their local inode. Therefore, both at the server and at the client, the lookup is possible in a single step. They also introduced the concept of Lightweight Processes (LWPs), which enable efficient sharing.
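A minimal sketch of the server-side callback bookkeeping described above, with invented type and function names: the server records which client caches which file and breaks those callbacks when another client stores a new version, instead of clients polling with stat-style calls.

/* Sketch of server-side callback state (hypothetical types/names). */
#include <stdio.h>
#include <string.h>

#define MAXCB 64

struct callback { char client[16]; unsigned vnode; int valid; };
static struct callback cb_table[MAXCB];

static void register_callback(const char *client, unsigned vnode)
{
    for (int i = 0; i < MAXCB; i++)
        if (!cb_table[i].valid) {
            snprintf(cb_table[i].client, sizeof cb_table[i].client, "%s", client);
            cb_table[i].vnode = vnode;
            cb_table[i].valid = 1;
            return;
        }
}

/* Stand-in for the RPC that would tell a client its cached copy is stale. */
static void notify(const char *client, unsigned vnode)
{
    printf("break callback: client %s, vnode %u\n", client, vnode);
}

/* Called when some client stores a new version of the file. */
static void store_file(const char *writer, unsigned vnode)
{
    for (int i = 0; i < MAXCB; i++)
        if (cb_table[i].valid && cb_table[i].vnode == vnode &&
            strcmp(cb_table[i].client, writer) != 0) {
            notify(cb_table[i].client, vnode);
            cb_table[i].valid = 0;   /* callback is consumed by the break */
        }
}

int main(void)
{
    register_callback("ws1", 42);   /* ws1 fetched and cached vnode 42  */
    register_callback("ws2", 42);   /* so did ws2                       */
    store_file("ws2", 42);          /* ws2 writes back: ws1 is notified */
    return 0;
}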
The later sections discuss operability changes, such as volume movement via a two-pass clone mechanism. The authors also describe per-user quotas and how read-only replication can be made efficient: cloning copies the directory tree structure without having to block outstanding writes during the clone phase. They also discuss how backup becomes an efficient operation by letting it run in the background: the volume is marked read-only and then asynchronously copied to the target destination. Backups mostly help users recover from accidental deletion of files.

4. Evaluation
The authors lay the groundwork by listing baseline numbers: the time required to complete the benchmark on a stand-alone system, the number of calls, and the most frequent call types (TestAuth and GetFileStat), which helped identify bottlenecks in their prototype (which was 70% slower than stand-alone). Their new implementation was then tested for total time taken, which was only 19% worse than stand-alone (a large improvement); more importantly, it could support a much larger number of Load Units (20+) compared to the previous prototype (around 10). CPU and disk utilization were also traced, corroborating that many more users could be supported. Later, they compared their approach with NFS using the same metrics (time, CPU, and disk utilization), and AFS outdid NFS in all experiments. Only workloads with strict low-latency requirements on large files would be better served by NFS. The changes suggested for operability, however, were not quantified, only qualitatively discussed. It would have been better if they had measured the time required for volume movement and the latency during such operations.

5. Confusion
I don't understand what critical information initially needed to be shared but could not be (due to the process-based model). Also, I would like to discuss how files are cached in Venus and what the exact semantic changes are.

Summary
This paper talks about the Andrew File System (AFS), the successor of the ITC distributed file system, whose design is guided primarily by scalability and performance. AFS also improves the operability of the system.

Problem
The ITC distributed file system was designed as a distributed file system for a network of more than 5000 personal computer workstations. However, it did not scale as well as desired. Large scale affects performance and complicates system operation, so a redesign of the ITC file system was needed to make the distributed file system scale gracefully.

Contribution
Many features of the ITC file system are adapted in designing AFS. The major features that help achieve scale are whole-file caching on the local disk of the client machine accessing a file, in contrast to other file systems that cache blocks instead of entire files, and cache consistency via callbacks registered with the server rather than validation every time a file is opened, which decreases the cache-validation requests received by the server. The server notifies the client when a cached file is no longer valid. AFS provides weak consistency semantics, so applications are responsible for handling concurrent file access by synchronizing among themselves. Writes to an open file are visible only to the same workstation, and once a file is closed the changes become visible to new opens; already existing opens do not see the changes. Instead of a one-to-one mapping between server processes and clients, which is costly in terms of context switches and paging, AFS uses LWPs within one process, each bound to a client for a single server operation. The prototype evaluation showed that path traversal costs are high, so AFS introduces the notion of a file identifier (fid) instead of pathnames to specify which file a client is interested in: the client walks the pathname and the server is presented with a fid, which means the server never has to perform a namei operation when accessing data. To solve the operability problem at large scale, AFS introduces volumes, collections of files forming a partial subtree of the Vice namespace that reside within a single disk partition. For load balancing, a volume can be moved from one server to another using a frozen copy-on-write snapshot of the volume called a clone. Quotas are implemented on a per-volume basis. AFS also allows read-only replication to improve availability and load balancing.
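To make the whole-file caching and close-visibility semantics concrete, here is a small C sketch under stated assumptions (local files stand in for the server copy and the cache, and the helper names are invented): open fetches the entire file, all I/O is local, and close ships the whole file back, so the last close wins.

/* Sketch of whole-file caching semantics. Paths and helper names are made
 * up for illustration; a local file stands in for the server's copy. */
#include <stdio.h>

static void copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb"), *out = fopen(dst, "wb");
    int c;
    if (!in || !out) {
        if (in) fclose(in);
        if (out) fclose(out);
        perror("copy_file");
        return;
    }
    while ((c = fgetc(in)) != EOF)
        fputc(c, out);
    fclose(in);
    fclose(out);
}

/* "open": fetch the whole file from the (simulated) server into the cache. */
static FILE *afs_open(const char *server_path, const char *cache_path)
{
    copy_file(server_path, cache_path);      /* skipped if callback is valid */
    return fopen(cache_path, "r+");          /* all further I/O is local     */
}

/* "close": push the whole cached copy back; the last close overwrites. */
static void afs_close(FILE *f, const char *cache_path, const char *server_path)
{
    fclose(f);
    copy_file(cache_path, server_path);
}

int main(void)
{
    /* Pretend ./server_copy is the file as stored on Vice. */
    FILE *seed = fopen("server_copy", "w");
    fputs("hello from the server\n", seed);
    fclose(seed);

    FILE *f = afs_open("server_copy", "cached_copy");
    fputs("local update, invisible to others until close\n", f);
    afs_close(f, "cached_copy", "server_copy");
    return 0;
}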

Evaluation
This paper is one of the best examples of how measurement is key to understanding how systems work and how to improve them. The authors measured their prototype to understand their system better and used the insights to redesign and improve AFS; the AFS results themselves show how useful the prototype measurements were. The authors used a synthetic benchmark to understand the call distribution and resource utilization of the prototype, and the results indicated the need to reduce the frequency of cache-validity checks, reduce the number of server processes, stop the server from doing pathname traversals, and balance load more effectively. After the redesign, the measurements show that scalability has improved: at a load of 20 the system is not saturated, making the authors believe the goal of 50 users per server has been achieved. Most servers have CPU utilization between 15-25%, while disk utilization is not very high and network utilization is quite low.
To further boost their confidence, the authors compare AFS and NFS and observe that AFS's scaling characteristics are superior to those of NFS. NFS performs slightly better than AFS at low loads, but its performance degrades rapidly with increasing load; the crossover point is at a load of 3 and 4 for the warm-cache and cold-cache cases respectively. NFS's CPU utilization saturates at 100% at a load of 18, while AFS's is only between 38-42% depending on the cache case. NFS also generates roughly 3X as many packets as AFS at a load of one, showing higher network utilization for NFS. Overall, by relying on experimental evidence rather than gut instinct, the authors have turned system building into a more scientific endeavor.

Confusion
The paper does not talk about crash consistency. When the server fails, since the callbacks are maintained in memory, how does it inform the clients about changes? Does it broadcast a message to all clients on reboot? Could AFS achieve lower latency by eagerly caching the blocks in use and then lazily caching the other blocks of the file?

1. Summary: This paper presents the authors' analysis and motivation behind changing the then-existing implementation of AFS. They identify the scalability and performance problems with the existing prototype, present their design changes, and evaluate them to justify their decisions. They also compare the new AFS implementation with other DFSs like NFS.
2. Problem: One of the main motivations behind AFS was scalability. To achieve this, the central theme in AFS was to cache entire files. But the authors found that their existing implementation could only scale to 5-10 load units. Moreover, they identified two calls (TestAuth, GetFileStat) constituting the majority of their server traffic. They also identified performance problems due to name resolution overheads and context switches. They further realized their system was inflexible: they could not distribute load across servers, since full paths were used for file lookups. They took this opportunity to solve all of these problems with the then-current AFS implementation.
3. Contribution: Through their changes, the authors were certainly able to solve the performance and scalability issues. But their most important contribution was the use of "Volumes" in which files are stored. Not only did volumes provide flexibility, they also made it possible to distribute load across all servers: the location of a volume can simply be changed and the server updated, which the authors call operational transparency. To support this, they use a FID to uniquely identify a file and use it to access files, which further removed the name-resolution overhead from their servers. Another major contribution was the idea of callbacks, which removed the large number of TestAuth and GetFileStat calls that essentially used server resources for a basic yes/no question. They also stick with their idea of caching entire files, and, realizing this was significant, further propose caching directory structures and symbolic links to reduce directory traversals. This has the added advantage of easing the cache-consistency implementation: they use last-writer-wins semantics as their cache consistency model. Another neat trick to reduce context-switch overhead was to use threads instead of processes, making switches less expensive. They also provide administrative features like user quotas to make their FS customizable, something not really provided in other file systems at the time.
4. Evaluation: The authors not only provide sufficient evidence to show the problems with their existing prototype, they identify exactly the culprit calls and limitations, which provided them with a strong starting point. They show that server CPU usage limits performance the most; thus the only way to allow servers to handle more clients is either to increase CPU performance or to remove the bottlenecks. They identify TestAuth and GetFileStat calls as the majority of all calls and remove them by adding callbacks. They also show that some servers are inherently loaded more than others, and because of the inflexibility of their system they cannot do much about it. After their changes, they show that the benchmark takes only about 2x more time even at 20 load units. The CPU utilization at 20 load units is 70%, compared to disk utilization of 24%, showing that the CPU is still the scalability bottleneck, though there is a significant improvement in scalability. They also compare their performance with that of NFS: at 20 load units NFS performs almost 2x worse than the new AFS, at a much higher CPU utilization, and NFS also generates more disk traffic than AFS. Thus, they justify their design in comparison to other designs as well.
One key property of NFS is its stateless servers and hence easy recovery from server crashes. This discussion of crash recoverability (of the server or the client) is missing from their discussion of AFS. Since the AFS server keeps state information in callbacks, crash recovery will be much more complicated, and providing crash-recovery bookkeeping would in turn affect performance and scalability.
5. Confusion: Is GFS's idea of dividing files into chunks and caching just those chunks inspired by AFS? I am not sure files back then were on the order of GBs. Both NFS and GFS seem to have the notion that any machine in a distributed system can be a server. Why did AFS go with the idea of having central servers? Wasn't the crash of a server (a single node) a major problem?

1. summary
This paper examines the performance and design of the Andrew File System (AFS), focusing on scalability issues, and builds a revised version of AFS that enhances performance and day-to-day operability by improving cache management, name resolution, server process structure, and storage representation.
2. Problem
Firstly, a dedicated process per client on the server leads to high context-switch costs, critical resource depletion, and high virtual memory paging demands. Secondly, a large amount of time is spent traversing the full pathnames presented by workstations using the namei operation. Thirdly, it is difficult to move users' directories between servers. Lastly, extensive use of the stat primitive severely degrades performance.
3. Contributions
A callback mechanism is used to reduce the number of cache-validation requests: the server notifies a client before it allows a file to be changed, and both the server and Venus must store callback state information. To reduce the time spent mapping pathnames, a two-level naming scheme is introduced: each file is represented by a fid (volume number, vnode number, uniquifier). The fid does not store any location-specific information; the mapping of volumes to servers is kept in the volume location database, which is replicated on each server. Lightweight processes (LWPs), with lower context-switch costs, are used instead of one server process per client; an LWP is bound to a client for one operation, and the number of LWPs is determined at server start-up and remains fixed. To avoid the namei operation, files are accessed by inode instead of pathname; since inode information is not visible at user level, new system calls were added to access it. A volume is a collection of files in the Vice namespace. To balance disk space and utilization across servers, volumes can be redistributed by creating a frozen copy-on-write snapshot called a clone, constructing a machine-independent representation of the clone, and regenerating it at the remote site. Quotas can be assigned to users' volumes, enabling volume management at the user level. Read-only replicas of files can be made to improve availability and balance load, and cloning is also used to back up a volume to tape and restore it.
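A minimal sketch of the volume location database idea mentioned above; the entries and server names are invented. Because volumes are coarse-grained, the table is small enough to replicate on every server, and the volume field of a fid is all a client needs to find the right server.

/* Sketch of a volume location database: a small, fully replicated table
 * mapping volume numbers to server addresses. Entries are invented. */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct vldb_entry { uint32_t volume; const char *server; };

/* Replicated on every server; small because volumes are coarse-grained. */
static const struct vldb_entry vldb[] = {
    { 1, "vice1.example.edu" },
    { 7, "vice2.example.edu" },
    { 9, "vice2.example.edu" },
};

static const char *locate_volume(uint32_t volume)
{
    for (size_t i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
        if (vldb[i].volume == volume)
            return vldb[i].server;
    return NULL;   /* after a move, a stale entry would trigger a retry */
}

int main(void)
{
    uint32_t vol = 7;    /* taken from the volume field of some fid */
    printf("volume %u lives on %s\n", vol, locate_volume(vol));
    return 0;
}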
4. Evaluation
Using a synthetic benchmark on the original prototype, it was observed that the distribution of Vice calls is heavily skewed (towards TestAuth and GetFileStat) and that CPU utilization is not balanced across servers; this motivates simplifying user movement between servers in the new version of AFS. After using the revised AFS for one year, the following observations were made. First, the system is 19% slower than stand-alone (the prototype was 70% slower). Second, the servers with the most calls are those that store common system files or bulletin boards; the most frequent call is GetTime, used by clients to synchronize their clocks, followed by FetchStatus, used for listing directories, and RemoveCB, used to flush cache entries. The new version of AFS is compared to a remote-open file system (NFS) using cold- and warm-cache configurations. At low load, the performance of NFS and AFS is similar. At high load, NFS performance degrades because the server is contacted on each file open, and workstations sometimes terminate the benchmark prematurely due to the dependence on unreliable datagrams. AFS performance at high load is not badly impacted, thanks to caching and callbacks, while NFS performs better on low-latency operations. The evaluation is thorough: it measures the existing prototype and provides empirical evidence motivating the redesign, the new version of AFS is then observed for one year and the results analyzed, and the comparison between NFS and AFS provides insight into AFS's dramatically better performance.
5. Confusion
How is the fid uniquifier used? Read-only replication can lead to inconsistencies between replication sites when a file is updated; how is this handled?

1. summary
AFS was revised to overcome several drawbacks of the prototype. The main concept in AFS, caching entire files on local disk when workstations access them, stays the same, while the stat-check mechanism, the namespace policy, and the process structure for communication are changed. The evaluation shows that scalability and resource utilization improve over the prototype, and that AFS outperforms NFS in both performance and utilization.

2. Problem
The prototype has several drawbacks: frequent stat checks, a dedicated process per client, high CPU usage, and difficulty moving files between servers. Frequent stat checks result in high network traffic and hinder file transfers from servers to clients. Each Venus has its own dedicated process on the server, which causes frequent context switches and rapidly consumes server resources. In addition, Venus sends requests with file and directory names, forcing the servers to traverse their file systems to find the inodes of files, which consumes a lot of CPU. Furthermore, there is no way to move data to other servers even when those servers' workloads are not heavy.

3. Contributions
The main contribution here, inherited from the prototype, is caching an entire file on local disk when a workstation requests access to it. This makes AFS different from NFS: AFS stores the entire file on local disk and subsequent accesses are served locally, while NFS must contact the file server on each file access.
The problem section lists the issues the prototype has; to cope with them, AFS upgrades several features. First, callbacks are introduced to reduce the traffic caused by stat checks: a client assumes its files are up to date if it has not received a callback message from the server. Mapping pathnames to inodes takes place on Venus, not Vice, to reduce server CPU load. In addition, the fid generated by Venus does not encode the physical location of the data, so data can be moved to other servers. Finally, the dedicated per-client processes are replaced with lightweight processes within a single process.
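As a rough illustration of replacing per-client processes with a fixed pool of lightweight server processes, here is a sketch using POSIX threads as a stand-in for AFS's user-level LWPs (the real LWPs were non-preemptive and even cheaper; this only approximates the pooling idea). Compile with -lpthread.

/* Sketch: a small fixed pool of workers sharing one request queue and one
 * address space, instead of one dedicated process per client. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 3
#define NREQS    8

static int queue[NREQS];
static int head, tail;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

static void submit(int req)
{
    pthread_mutex_lock(&lock);
    queue[tail++ % NREQS] = req;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg)
{
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail)
            pthread_cond_wait(&nonempty, &lock);
        int req = queue[head++ % NREQS];
        pthread_mutex_unlock(&lock);
        if (req < 0)
            return NULL;                    /* shutdown marker */
        printf("worker %ld serving request %d\n", id, req);
    }
}

int main(void)
{
    pthread_t tid[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)     /* pool size fixed at startup */
        pthread_create(&tid[i], NULL, worker, (void *)i);

    for (int r = 1; r <= NREQS - NWORKERS; r++)
        submit(r);                          /* requests from many clients */
    for (int i = 0; i < NWORKERS; i++)
        submit(-1);                         /* one shutdown marker each   */

    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}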

4. Evaluation
There are two evaluation sections: one for the prototype and one for the revised version. The prototype was evaluated to see what worked well and what did not. The results show that the average cache hit ratio is about 80% for both the file and status caches, that the main interactions between servers and clients are TestAuth (stat) calls, and that the workload is not distributed evenly across servers.
The revised AFS was evaluated with the same benchmark. Its performance is about 20% worse than a stand-alone workstation, compared with 70% for the prototype, and as the load increases, the execution time and CPU/disk utilization grow slowly, unlike in the prototype. The calls most frequently issued by workstations are GetTime and FetchStatus.
The performance of the revised AFS was also compared with NFS. The authors found that NFS loses RPC reply packets during periods of high network activity. The results show that the performance and CPU utilization of AFS are better than those of NFS; disk utilization of AFS is comparable to or better than that of NFS, and NFS generates roughly three times as many packets as AFS.

5. Confusion
What exactly is a lightweight process here?
Why is it hard to run a distributed database on top of AFS?

1. Summary
This article examines the design decisions of the Andrew File System (AFS) that affected the scalability and operability of the system. It also discusses the motivations behind those decisions and their effects on performance.

2. Problem
The authors' goal was to develop a campus-wide distributed file system at CMU that successfully emulated the semantics of the 4.2BSD file system. The campus had a large number of workstations, and they wanted to build a highly scalable system that was easy to operate and maintain.

3. Contributions
The main contribution of the work behind this article is the development of a highly scalable and easy-to-maintain distributed file system, AFS, which provides open/close consistency on remote files, unlike most of the popular contemporary distributed file systems. These semantics are achieved by caching the whole file on the client when it is opened and sending the updates back to the server when it is closed. The article spends a lot of time examining the prototype AFS, which was deployed and used by around 400 users. From this long-running experiment the authors present the common-case operations in a distributed file system over time, and found that TestAuth and GetFileStat calls formed the bulk of the communication between clients and the server. They also concluded that a distributed server can spend a lot of time context switching if each client connection is handled by a different process, and that if the server has to do a full path traversal for every open the CPU saturates faster, so these operations needed to be optimized. To alleviate these problems, the authors introduced fixed-length fids (volume number, vnode number, and uniquifier) that the server can map directly to inodes without a path traversal, introduced Lightweight Processes, which are like user-level threads, to prevent excessive context switches, and added a callback mechanism with which the server invalidates client caches so that clients do not need to do a TestAuth or GetFileStat operation every time a file is opened.
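As a rough illustration of the last point, here is a small sketch (all names invented) of a client cache entry carrying a callback flag: status requests are satisfied locally while the callback is held, and the server is contacted only when the flag is missing, for example after a reboot marks every entry suspect.

/* Sketch (invented names) of client-side cache validity via callbacks. */
#include <stdio.h>

struct cache_entry {
    unsigned vnode;
    int      has_callback;   /* server promised to notify us of changes */
    long     size;           /* cached status information               */
};

static struct cache_entry cache[] = {
    { 42, 1, 1024 },
    { 43, 1,  512 },
};
#define NCACHE (sizeof cache / sizeof cache[0])

/* Stand-in for a FetchStatus-style RPC to the server. */
static long fetch_status_from_server(unsigned vnode)
{
    printf("RPC to server for vnode %u\n", vnode);
    return 2048;
}

static long cached_stat(struct cache_entry *e)
{
    if (e->has_callback)
        return e->size;                       /* no server interaction     */
    e->size = fetch_status_from_server(e->vnode);
    e->has_callback = 1;                      /* server re-registers one   */
    return e->size;
}

static void on_reboot(void)
{
    for (unsigned i = 0; i < NCACHE; i++)     /* callbacks may have been   */
        cache[i].has_callback = 0;            /* missed while we were down */
}

int main(void)
{
    printf("size = %ld (local)\n", cached_stat(&cache[0]));
    on_reboot();
    printf("size = %ld (revalidated)\n", cached_stat(&cache[0]));
    return 0;
}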

4. Evaluation
The intent of this paper is to examine the design decisions of AFS that affected scale and operability, and keeping true to that intent the authors present quite a thorough evaluation of AFS from its prototype to its completed version, along with a comparison to the widely adopted, industry-standard NFS. The authors present data from the prototype's deployment and analyze it to show that the distribution of communication between clients and servers is highly skewed, with roughly 90% of the traffic just validating caches and testing the validity of files in client caches. They also analyze the high CPU usage of AFS servers during working hours and conclude that it stems from the prototype using a separate process to service each client and from the server doing full path traversals, which may access the disk several times to get the inode of the file being opened. The authors identified and fixed these issues in the later version of AFS, which used callbacks and volume numbers. The results of these fixes were favorable: the micro-benchmarks showed that the system could scale to 20 Load Units with 80% CPU and 23% disk usage, where each Load Unit is equivalent to about 5 users per the authors' observations. When this new version of AFS was deployed at CMU, the collected numbers showed that the distribution of calls to AFS servers was no longer skewed and a server could handle about 128 active users in peak periods without overloading or crashing. To validate the scalability of their implementation, the authors ran a micro-benchmark comparing AFS to the de facto industry standard NFS; the experiments showed that NFS did not scale well beyond 18 Load Units because CPU usage was constantly at 100%, and beyond 10 Load Units NFS suffered a functionality breakdown, with some clients constantly getting disconnected under high traffic load, aggravated by the use of unreliable datagrams for RPC.

All in all, I think the authors have done a good job of evaluating the system and demonstrating its scalability and consistency guarantees to the readers.

5. Confusion
Why is AFS adoption lacking in industry as compared to NFS?

1. Summary
This paper mainly talks about the design decisions in the Andrew File System that achieve scalability while keeping good performance and operability, including cache management, name resolution, server process structure, and the concept of volumes.

2. Problem
The goal of AFS is to be a distributed file system that can scale to 5000 to 10000 workstations, but the initial prototype was not satisfactory in performance or operability. Cache validation generates a lot of traffic between clients and servers; dedicating a process per client induces excessive resource use on servers; servers spend a great amount of time traversing pathnames; and embedding file location in the directory hierarchy makes management jobs like rebalancing or enforcing quotas hard or impossible.

3. Contributions
Cache consistency is achieved by registering callbacks rather than validating every time a file is opened. AFS caches the entire file, so there is no need to communicate with the server before the file is closed. Directories are also cached, but modifications to them are propagated immediately to preserve integrity. AFS also provides clear file consistency semantics.
Resolution of pathnames is done on the client side, and the server uses the fid directly to find the inode and then calls iopen; no implicit namei is needed. The fid does not encode location information, allowing migration to be done transparently.
AFS restructures the servers to use a pre-threaded model, reducing resource consumption. The idea is common today, but this paper may be one of its origins.
Volumes are the units of rebalancing, replication, backup, and quota enforcement. They have the same semantics as partitions with regard to mounting.
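A minimal sketch, with invented fields and numbers, of the per-volume quota enforcement that volumes make possible:

/* Sketch of per-volume quota enforcement; fields and limits are invented. */
#include <stdio.h>

struct volume {
    const char *owner;
    long        quota_kb;   /* assigned by an administrator        */
    long        used_kb;    /* updated as files are created/grown  */
};

/* Allocation check a server might run before extending a file. */
static int volume_alloc(struct volume *v, long kb)
{
    if (v->used_kb + kb > v->quota_kb) {
        fprintf(stderr, "%s: over quota (%ld + %ld > %ld kb)\n",
                v->owner, v->used_kb, kb, v->quota_kb);
        return -1;
    }
    v->used_kb += kb;
    return 0;
}

int main(void)
{
    struct volume alice = { "alice", 20000, 19500 };
    volume_alloc(&alice, 400);    /* fits                          */
    volume_alloc(&alice, 400);    /* would exceed the quota: fails */
    return 0;
}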

4. Evaluation
The authors use a synthetic benchmark as well as collected statistics throughout the paper to evaluate the performance of their prototype, NFS, and AFS. The benchmark is designed to be representative of average usage. For each system, they measure elapsed time and its distribution, CPU utilization, and disk utilization with different numbers of clients. Results show that the modifications to the prototype are effective and that AFS achieves its goal of supporting 50 users per server. Though it may be an infrequent operation, communication between servers is not discussed.

5. Confusion
How does moving RPC out of the kernel help improve performance?

1. Summary
In this paper, the authors present the drawbacks of a prototype distributed file system, the Andrew File System (AFS1), designed with the aim of scaling to several thousand workstations. They present a new design, AFS2, which fixes many of the issues in AFS1 for much better scalability with high performance. They also compare AFS2 with the Network File System (NFS), the state-of-the-art distributed file system at the time.
2. Problem
The aim of the research group was to develop a distributed file system that could eventually scale to 5000 workstations. However, their first prototype, AFS1, did not have the desired scalability due to a number of inefficiencies in the design. First, there was excessive interaction between workstations (Venus) and servers (Vice), for example a cache validity check on every file open, which significantly increased server load. Second, AFS1 used the model of one server process per client, which led to a lot of context switching on the server in the presence of multiple clients, wasting server CPU. Third, Venus handed a complete file path to the Vice server, which then had to do the path traversal itself, using up a significant portion of CPU time. The server CPU therefore became overloaded while communicating with even a small number of workstations and was the performance bottleneck, leading to poor scalability.
3. Contributions
A major contribution of this work is benchmarking the prototype AFS1 to figure out the design choices that limited file system performance and thus scalability; the observations from that analysis are mentioned above. The authors modified the AFS design, though the basic architecture remained the same: AFS2 also used whole-file transfers and caching at workstations, so they have made a strong case for the benefits of these two techniques in designing a distributed file system. Cache management in AFS2 used the concept of callbacks, where the server tracks the files cached by each workstation and informs a workstation whenever any of its cached entries becomes invalid; this significantly reduces the frequency of cache validity checks done by the workstations. AFS2 offloaded the server by moving the responsibility for expensive variable-length pathname resolution to Venus and presenting the Vice server with a fixed-length fid instead. They also used "volumes", which are groups of files that reside on the same Vice server. Using fids, volumes, and the volume location database, file lookup on the Vice server was made much faster. The authors also built a new abstraction of Lightweight Processes (LWPs) within one process, which have minimal context-switch cost and are used to efficiently service requests from Venus. AFS2 also modified the low-level storage mechanisms, enabling fast file access on the Vice server with almost no lookup overhead. Caching on workstations also allowed AFS2 to offer clean and strong consistency guarantees that support concurrent file modifications using last-writer-wins semantics. AFS2 further focused on improving operability by using volumes that could be easily and transparently relocated between servers, and it supported other desired administrative features like user-specific quotas, user groups, and data backup for recovery.
4. Evaluation
The paper has done a very thorough job of evaluating the revised file system design. The initial benchmarking of AFS1 clearly exposes the opportunities for improvement that directly guide the design of AFS2. They compare AFS2 against AFS1 to show how the Vice servers are lightly loaded, though CPU performance is still the bottleneck. AFS2 scales well and thus meets the target of supporting 50 workstations per Vice server. They also present a detailed distribution of the calls handled by the server: GetTime, FetchStatus, and RemoveCB are the most frequent ones. The authors then compare AFS2 with NFS (a remote-open FS), which was considered the "standard" distributed FS at the time. They clearly demonstrate how AFS2 scales much better than NFS, since the latter is outperformed as soon as the load on the server exceeds a mere 3 Load Units (LUs). While the server saturates completely at a load of 18 LUs in the case of NFS, the AFS2 server is lightly loaded at 42%. AFS2 also uses significantly less disk (33%) compared to NFS (more than 95%). Further, even at smaller scales, AFS2 offers reasonable performance, though less than NFS. However, AFS2 has high file-open latency that degrades quickly as file size increases, due to its whole-file-transfer semantics; NFS has a lower file-open latency that is independent of file size.
One specific aspect whose study is missing from the paper is the increase in memory consumption at the server, since the Vice server maintains (a lot of?) per-client state. Some concrete numbers on this, along with how memory consumption grows with the number of clients, would be interesting.
5. Confusion
The paper does not explain why disk usage at the server is much higher in the case of NFS.
I am not very clear why it was necessary to add a new iopen system call.
