Scale and Performance in a Distributed File System
Scale and Performance in a Distributed File System. John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. ACM Trans. on Computer Systems 6(1), February 1988, pp. 51-81.
Reviews due Thursday, 4/6.
Comments
Summary:
This paper examines the design of and improvements to the architecture of AFS, a large-scale distributed file system with scalability and performance as the main considerations. Based on the analysis of and feedback on an initial prototype system, the authors built a more robust version of AFS by addressing scalability, performance, and operability issues.
Problem:
The initial prototype of the Andrew File System encountered several performance problems: high load on the server CPU due to frequent cache validity checks; the use of per-client processes, which led to high resource utilization and excessive time spent on context switches at the servers; the use of full pathnames to identify files, which imposed considerable CPU overhead; the inability to share address space between processes in BSD; and the inability to enforce disk storage quotas or to move users' directories between servers, since file location information was embedded in stub directories.
Contributions:
To address the problems encountered in the initial prototype, the authors introduced changes in the architecture of the AFS to provide better performance and scalability.
a.) Better cache management: To reduce the number of cache validity checks, they redesigned AFS to cache the entire file, along with its status, on the client's local disk, removing the need to keep track of individual pages of the file. Modifications to cached files are done locally.
They introduced the "callback" mechanism, in which the server notifies the client when a change has been made to a file it has cached (see the sketch after this list). This significantly improves performance by eliminating repeated cache validity checks and the server overhead they cause.
b.) Better name resolution: To avoid resolving long pathnames on the server, they introduced fixed-length file identifiers (fids), and all pathname translations were done on the client, thereby reducing the load on the server.
c.) Better server process structure: The one-process-per-client model on the server was replaced with user-level lightweight processes, whose context-switch cost is within an order of magnitude of a regular procedure call.
d.) To improve operability they introduced facilities such as volumes, which made assigning a specific quota to each user easy to achieve.
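To make the callback idea in (a) concrete, here is a minimal sketch in C; the structure and names are my own invention, not the actual AFS code:

    /* Hypothetical sketch of callback-based cache validation (not the real AFS code). */
    #include <stdio.h>
    #include <string.h>

    struct cache_entry {
        char path[256];
        int  has_callback;   /* server promised to notify us before the file changes */
        char data[1024];     /* whole-file contents, cached on local disk in real AFS */
    };

    /* Server side: when another client stores a new version, break the callback. */
    void server_break_callback(struct cache_entry *e) {
        e->has_callback = 0;             /* client must re-fetch on its next open */
    }

    /* Client side: open() consults the callback instead of asking the server. */
    const char *client_open(struct cache_entry *e) {
        if (e->has_callback) {
            return e->data;              /* no network traffic at all */
        }
        /* Otherwise fetch the whole file and re-establish a callback (elided). */
        strcpy(e->data, "fresh copy fetched from Vice");
        e->has_callback = 1;
        return e->data;
    }

    int main(void) {
        struct cache_entry e = { "/afs/foo", 0, "" };
        printf("%s\n", client_open(&e));  /* cold: fetch from server */
        printf("%s\n", client_open(&e));  /* warm: served locally */
        server_break_callback(&e);        /* someone else updated the file */
        printf("%s\n", client_open(&e));  /* fetch again */
        return 0;
    }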
Evaluation:
They ran a synthetic benchmark and compared their file system to NFS. They first report the performance of the prototype on this benchmark, then report metrics such as running time, CPU utilization, and disk utilization for each system (prototype, revised AFS, and NFS). The revised AFS uses significantly less disk bandwidth than NFS and outperforms it as the system grows larger. They also report the performance of AFS in the real world over a prolonged period of time. Overall they seem to have done a fairly thorough evaluation, except for network bandwidth usage.
Confusion:
1. Handling of concurrent changes to files in AFS was not very clear to me.
2. Can you go over Low level storage management in the revised version of AFS?
Posted by: Lokananda Dhage Munisamappa | April 6, 2017 08:20 AM
1. Summary
This paper describes Andrew, a distributed file system at Carnegie Mellon University. AFS was targeted to eventually scale to 5,000 to 10,000 nodes, and it primarily scaled by caching file data and reducing server load.
2. Problem
At the time the paper was written, there were few distributed file systems. The ones that did exist, such as Sun NFS, sent each client read and write to a remote server, limiting the scalability the system could achieve. The researchers at CMU wanted to develop a prototype rapidly to learn which design decisions mattered at large scale, and they wanted to incorporate this feedback into a new distributed file system.
3. Contributions
The AFS prototype limited network traffic and server load by caching files on each client. When the client wanted to access a file for the first time, the node communicated with a process on Vice, a set of trusted servers. Each client had a cache for file data, and a cache for file metadata. The clients needed to perform frequent checks to ensure that the local cache was up-to-date with the server.
The revised AFS system limited the number of cache validity checks and cached the contents of directories and symbolic links, not just the contents of files. Instead of having the client perform cache validity checks, the server itself sent callbacks to the appropriate clients whenever its copy of a file was changed. The server used a single user-level thread, rather than an entire process, to handle each request from a client.
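As a rough illustration of the whole-file open/close flow described above (the function names and cache path are invented, and the actual fetch/store protocol is stubbed out):

    /* Hypothetical sketch of the open/close interception flow (names invented). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Fetch the whole file from the server into a local cache file (stubbed out). */
    static void fetch_from_vice(const char *vice_path, const char *cache_path) {
        (void)vice_path;   /* a real client would contact the server holding this file */
        int fd = open(cache_path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
        write(fd, "contents fetched from Vice\n", 27);   /* stand-in for bulk transfer */
        close(fd);
    }

    /* Venus-style open: after the fetch, reads and writes go to the local copy. */
    int afs_open(const char *vice_path, const char *cache_path) {
        fetch_from_vice(vice_path, cache_path);          /* skipped if a callback is still valid */
        return open(cache_path, O_RDWR);                 /* ordinary local file descriptor */
    }

    /* Venus-style close: if the file was modified, ship the whole file back. */
    void afs_close(int fd, const char *cache_path, int dirty) {
        close(fd);
        if (dirty)
            printf("storing %s back to Vice\n", cache_path);
    }

    int main(void) {
        int fd = afs_open("/afs/user/notes.txt", "/tmp/afs_cache_0001");
        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf);           /* purely local I/O */
        printf("read %zd bytes locally\n", n);
        afs_close(fd, "/tmp/afs_cache_0001", 0);
        return 0;
    }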
4. Evaluation
The evaluation section is very thorough. It shows that AFS scales better than NFS to large loads, and that AFS maintains relatively low CPU and disk utilization as it scales. It would have been better if the evaluation section had been shorter - there was too much information in the form of large tables that could have been described in a sentence or two in the text.
5. Confusion
(1) What are the consistency guarantees of this system? The paper describes some "guarantees", but they are not the same as the strong or weak consistency guarantees of modern distributed systems.
(2) AFS is still used today. What are the similarities and differences between modern AFS and the version described in this paper?
Posted by: Varun Naik | April 6, 2017 08:03 AM
1. Summary
The paper presents the authors' experience in designing and evaluating the Andrew File System, a distributed file system with caching that scales well. The authors motivate their solution with the analysis of a prototype and then optimize and improve the architecture based on it. They also compare the developed system against other major vendor-supported distributed file systems and provide an objective analysis of the differences.
2. Problem
Developing a distributed file system that can scale well (supporting at least 50 users per server).
3. Contributions
The authors provide good insights on how a large scale distributed file system should behave. They also identified the main areas of concern while developing such a solution. They provide motivating studies based on their prototype, final implementation and conclude with a comparison against an industry supported solution. These studies show how to thoroughly evaluate a distributed file system.
They also introduce a caching mechanism and base their implementation around it. They show that this reduces network traffic and allows smoother scaling as the number of users per server increases.
Finally, the authors introduce us to the concept of volumes which helps them administer the system as well as provide a consistent name space for the files.
4. Evaluation
The authors provide in-depth results and analysis throughout the paper, giving good insights and motivating their solutions at every stage. I feel the paper is very well written and has a lot of take-aways for people who design distributed systems in general and file systems in particular.
Posted by: Akhil Guliani | April 6, 2017 07:44 AM
1. Summary
The paper talks about improvements in the performance and operability of the Andrew File System, a location-transparent distributed file system. The authors discuss the motivation behind changes in the areas of cache validation, server process architecture, and name translation, and also demonstrate the Andrew File System's ability to scale. The performance of AFS is also compared with that of NFS.
2. Problem
The authors primarily investigate the problems in Andrew's scalability. While caching files in their entirety, the AFS prototype only scaled to 5-10 load units. Moreover, two calls - TestAuth and GetFileStat - constituted the majority of the traffic. There were also performance problems associated with name translation. There was, thus, a need to fix all these in the AFS implementation.
3. Contributions
The authors make novel contributions for scalability in AFS which ultimately improves performance and operability.
On the performance side, to reduce load on the server, the server notifies the client about changes in files via the callback state it maintains. Also, each Vice file or directory is identified by a unique fixed-length id with three components - a 32-bit volume number, a 32-bit vnode number, and a 32-bit uniquifier. By virtue of this design, the actual processing of file data, given a fid, becomes efficient. Additionally, lightweight processes (LWPs) within a single process are used to handle requests on the server side. Each LWP is bound to a particular client only for the duration of a single server operation, and context switching across LWPs is inexpensive.
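A small sketch of the fid just described; the field names are mine, only the 32+32+32-bit layout comes from the paper:

    /* Sketch of the 96-bit fid (field names are my own). */
    #include <stdint.h>
    #include <stdio.h>

    struct fid {
        uint32_t volume;      /* which volume the file lives in */
        uint32_t vnode;       /* index of the file within that volume */
        uint32_t uniquifier;  /* allows vnode numbers to be reused safely */
    };

    int main(void) {
        struct fid f = { .volume = 7, .vnode = 42, .uniquifier = 1 };
        /* The server locates the file from the fid alone; no pathname is sent. */
        printf("fid = %u.%u.%u\n", f.volume, f.vnode, f.uniquifier);
        return 0;
    }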
Considering operability, file location information is decoupled from storage. This allows volumes to be moved across servers, which helps with load balancing and server utilization.
4. Evaluation
Firstly, the paper compares the modified AFS with the initial implementation. The modified version scales to 20 load units, compared to the initial implementation, which scaled to 5-10 load units. Secondly, AFS's performance is compared against NFS. AFS is shown to perform better than NFS, with the difference attributed to design decisions in AFS. AFS is also evaluated for two different scenarios - warm cache and cold cache.
5. Confusion
What design principles in today's distributed systems were inspired by this paper?
Posted by: Dastagiri Reddy Malikireddy | April 6, 2017 07:39 AM
1. Summary
This paper introduces the Andrew File System, a distributed file system that is designed to support a large number of clients. It discusses the main characteristics that make this file system scale, and uses a synthetic benchmark and observations of real use to test its performance.
2. Problems
The main problem this paper solves is how to build a scalable distributed file system. The problems with the previous prototype were: (1) One dedicated process per client led to large overhead. (2) All communication and data sharing between server processes took place via files. (3) Too much of the communication was timestamp verification between server and client. All of these limited the scalability of the file system, and this paper tries to solve them.
3. Contributions
This paper changes the prototype in the following ways: (1) Modifications to cached files are done locally and only written back to the server when the file is closed, and the server uses a callback to notify the client when its cache is outdated. This reduces the communication overhead between client and server. (2) It would be a burden if all name resolution were performed by the server, so the paper uses a two-level name design and lets the workstation translate pathnames to fids, which helps the server locate files quickly. (3) For the server process structure, a limited number of LWPs is created, like a thread pool, and each LWP is bound to a particular client only for the duration of a single server operation.
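A toy illustration of point (3), using POSIX threads as a stand-in for AFS's user-level LWPs (the request queue and worker logic are invented):

    /* Sketch of a fixed pool of server workers (pthreads stand in for LWPs). */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_WORKERS 4
    #define NUM_REQUESTS 10

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_request = 0;

    /* Each worker picks up the next pending client request, serves it, and moves on. */
    static void *worker(void *arg) {
        long id = (long)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            int req = next_request < NUM_REQUESTS ? next_request++ : -1;
            pthread_mutex_unlock(&lock);
            if (req < 0)
                break;
            printf("worker %ld serving request %d\n", id, req);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NUM_WORKERS];
        for (long i = 0; i < NUM_WORKERS; i++)
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (int i = 0; i < NUM_WORKERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }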
4. Evaluation
This paper first uses a benchmark to show that the prototype is not scalable. After making changes to AFS, the paper runs the benchmark again to show that the new system scales better. The benchmark varies the number of workstations (load units) and measures benchmark time, CPU utilization, and disk utilization. AFS keeps CPU and disk utilization manageable even with a large number of clients. The paper also compares AFS with Sun NFS, and the benchmark results show that AFS is more scalable, has better disk and CPU utilization, and greatly reduces network traffic.
5. Confusion
(1) In section 3.1, it says 'if the amount of callback state maintained by a server is excessive, it may break callbacks and reclaim storage'. What does it mean to break callbacks and reclaim storage?
(2) Will the whole-file cache policy lead to large latency for large files? Can AFS begin to process a file when only the first few blocks have arrived at the workstation, like a streaming system, or can AFS only begin processing once the whole file is cached on the workstation?
Posted by: Tianrun Li | April 6, 2017 07:28 AM
1. summary
The paper describes the design and implementation of a distributed file system, the Andrew File System, which targets 5,000 to 10,000 nodes (scalability). AFS consists of a set of server processes (Vice) and client processes (Venus). The paper shows how Vice and Venus collaborate with each other and are optimized for performance and scalability.
2. Problem
The original version of AFS had some performance problems, so the authors address those issues and redesign for scalability. They made a number of observations through benchmarks. First, clients were verifying too much while opening/closing files, which caused unnecessary CPU utilization on the server. Second, naming operations and pathname traversal were a bottleneck for the server CPU. Third, load could not be balanced across servers because file location information was embedded in stub directories.
3. Contributions
a) The optimizations in cache management, name resolution, communication/process structure, and low-level storage implementation.
b) For cache management, instead of caching only file data, the client caches directories and symbolic links as well. The client-side cache provides a significant performance improvement over remote accesses. They reduce cache validity checks by introducing a callback mechanism to keep the server and local copies in sync, which reduces the network traffic caused by clients checking file timestamps. However, the callback state can become large if many clients register callbacks on the server.
c) For name resolution, fids are used to separate the name of a resource from its address, and name resolution is moved from the server to the client. The client maps a pathname to a globally unique fid, and servers use this fid to find the data. Thus, a volume location database needs to exist on each server.
d) For process structure, the paper implemented Lightweight Processes (thread) to reduce the context switch time, thus reducing the processing burden of the server.
e) Manageability was also improved in a later version with volumes. Volumes are subdivisions of the shared file tree which can be handled individually; they can be cloned or moved to allow for better load balancing.
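To illustrate point e), a hypothetical volume location lookup; the table contents and server names are made up, but the key property from the paper is that moving a volume only changes this table, not the fids stored in directories:

    /* Sketch of a volume location database lookup (table contents are invented). */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct vldb_entry {
        uint32_t volume;
        char     server[32];   /* which Vice server currently holds the volume */
    };

    /* Replicated on every server; only this table changes when a volume moves. */
    static struct vldb_entry vldb[] = {
        { 7, "vice1.cs.cmu.edu" },
        { 8, "vice2.cs.cmu.edu" },
    };

    const char *locate_volume(uint32_t volume) {
        for (size_t i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
            if (vldb[i].volume == volume)
                return vldb[i].server;
        return NULL;
    }

    int main(void) {
        /* Moving volume 7 only requires updating this entry; fids embedded in
           directories stay valid, so clients are unaffected. */
        printf("volume 7 is on %s\n", locate_volume(7));
        strcpy(vldb[0].server, "vice3.cs.cmu.edu");
        printf("volume 7 is now on %s\n", locate_volume(7));
        return 0;
    }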
4. Evaluation
The paper provides a comparison of performance with Sun NFS, another distributed file system. NFS does not work well at high loads because its RPC protocol is based on UDP and depends simply on retrying. I'm wondering whether this kind of comparison is fair, given the different design goals of NFS and AFS?
5. Confusion
I do not fully understand some of the consistency semantics: "Multiple workstations can perform the same operation on a file concurrently", but "Application programs have to cooperate in performing the necessary synchronization if they care about the serialization of these operations". Will this break transparency for developers?
Posted by: Jing Liu | April 6, 2017 07:26 AM
Summary
The paper identifies the bottlenecks to improving the scalability and operability of networks running the Andrew File System, a distributed file system developed at Carnegie Mellon University. It also redesigns critical pieces of AFS to address these issues.
Problem
The authors first built a prototype of the Andrew File System whose design had many drawbacks that degraded performance, scalability, and operability. A synthetic benchmark was run on the cluster, and the poor performance was attributed to the following reasons:
1. Too many network calls from client workstations to the servers, mainly due to cache validity checks.
2. Having a server process per client was observed to cause too many context switches, exhaust critical resources and also cause virtual memory paging demands.
3. Servers spent a lot of CPU time on resolving pathnames to identify the actual file.
4. Uneven distribution of load across servers.
Contribution
The fundamental design of AFS includes a set of trusted servers called Vice presenting a homogeneous, location-transparent file name space to clients, both of which run on 4.2 BSD UNIX. Interaction between server and client is handled by a user-level process on the client called Venus, which caches entire files from Vice and communicates with Vice only on file operations such as open and close. In this paper, several changes are made to the system to deal with the performance problems the authors identified. One major development was callbacks, which ensure that in the steady state the server informs a client about changes to any of the files it is using; this eliminated the large amount of network traffic spent just revalidating cached copies. The paper also addressed operability by introducing volumes and tying them to each user rather than mapping many users to one partition. This allowed easy enforcement of quotas as well as easy migration and read-only replication using copy-on-write mechanisms. A new file-locating method is introduced using the notion of two-level names: instead of performing the CPU-heavy operation of mapping a pathname to an inode, servers now recognize only the unique fid, which Venus computes from the pathname. To avoid excessive server processes serving clients and the resulting context-switch overhead, non-preemptive lightweight processes share a single server process.
Evaluation
The paper first compares the improved design to the prototype on the basis of the synthetic benchmark which was earlier used to evaluate the prototype. Significant improvement in performance and scalability was observed.
The authors also compared the new design with NFS, a popular distributed file system used in production systems. They are compared on metrics like benchmark execution time, CPU utilization, network traffic, and disk utilization. AFS is seen to outperform NFS in all aspects. The improvements can be mainly attributed to the local caching done by AFS and the callback mechanism introduced in the improved design.
Confusion
The task of coordinating concurrent access to files is left to the application programmers to deal with. How practical is this in production systems?
Posted by: Mayur Cherukuri | April 6, 2017 04:36 AM
Summary
The paper talks about the second version of AFS, implemented to resolve issues with AFS version 1 (the ITC distributed FS) such as path traversal costs and too many TestAuth protocol messages. The main motivations behind the new version are scalability and easier cache consistency.
Problem
The ITC distributed FS has a few issues:
- There was costly path traversal on the server side whenever a client requested access to a file, since the client did not save any state.
- There were too many TestAuth calls from clients to the server whenever a client wanted to open a file, even though in most cases the file had not been updated on the server. This caused a CPU bottleneck on the server side, hindering scalability.
- There were load-balancing problems on the server side.
- Context-switch costs were high because of the use of a dedicated process for each client.
Contributions
AFS tries to solve the problems with the previous version:
- The path traversal problem is solved by introducing the FID (file identifier), which the client saves as state and passes in subsequent fetch requests, saving the server from path traversal.
- Callbacks are introduced to solve the CPU bottleneck: the server notifies clients that have registered callbacks whenever a file is updated on the server. This avoids polling and hence too many requests to the server, and helped make the system scalable, supporting more clients per server.
- Volumes are used to address the load-balancing problem,
- and lightweight processes (LWPs) replace per-client processes, which also made sharing critical data faster.
Cache Consistency :
- Files are flushed to the server when closed, but for processes on the same machine, writes are reflected immediately.
- The callback mechanism solved the problems of update visibility and cache staleness (which are present in NFS).
- "Last closer wins" policy (see the sketch below).
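A tiny sketch of the store-on-close / "last closer wins" behavior above (all names invented; real AFS ships back the whole cached file, not a string):

    /* Sketch of store-on-close semantics ("last closer wins"); names invented. */
    #include <stdio.h>
    #include <string.h>

    static char server_copy[64] = "original";

    /* Venus ships the whole cached file back when the application closes it. */
    void store_on_close(const char *client, const char *cached_contents) {
        strcpy(server_copy, cached_contents);
        printf("%s closed the file; server now holds \"%s\"\n", client, server_copy);
    }

    int main(void) {
        /* Both clients opened and modified their own whole-file copies locally. */
        store_on_close("client A", "A's version");
        store_on_close("client B", "B's version");   /* B closes last and wins */
        printf("final server copy: \"%s\"\n", server_copy);
        return 0;
    }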
Crash Recovery
- A client crash is handled by making the client treat all cached files as suspect; hence the client uses TestAuth-style checks after reboot before accessing the files.
- A server crash is made known to clients by a message from the server when it comes back up (or a heartbeat mechanism can be used between client and server), after which clients check whether their cached files have been modified on the server before accessing them.
Evaluations
The authors tested their new version against the prior version using a synthetic benchmark; the new version scales and performs much better. They also compared against NFS: AFS performs similarly to NFS at a small number of clients and scales much better than NFS for large numbers of clients.
AFS works best for workloads where most files are not frequently shared and accessed sequentially in their entirety.
But it doesn't work well in a few scenarios, for example appending records periodically to a log (which is a large file), or making random updates to a transaction database.
NFS is better when a file is simply overwritten, since it avoids the useless read of the old contents that AFS can't avoid; large files make this worse. Another scenario is when a client accesses a small subset of the data in a large file; NFS works better because it operates on blocks of files.
Confusions
Need more clarification on the use of volumes.
How does AFS handle non-idempotent operations in case of failure?
The "last closer wins" policy seems very unfair to clients that lose their entire files; how is this handled in current distributed file systems?
Posted by: Om Jadhav | April 6, 2017 04:07 AM
Summary
This paper talks about the Andrew File System, which was mainly designed to provide scalability and operability. The authors first implemented a prototype and figured out the bottlenecks. Based on this, they made various improvements in the design to achieve the desired scalability. A comprehensive evaluation of both the prototype and the redesigned implementation of AFS is provided. The redesigned implementation was compared with NFS, and the results showed that AFS achieves much better scalability without substantially affecting small-scale performance.
Problem
Distributed file systems at the time were not able to support a large number of clients. AFS was based on the idea of whole-file caching, and clients had to make sure they were using the most recent version of a file. As a result, almost all file operations were preceded by status messages between client and server, which consumed most of the server's CPU time. Name resolution, a separate server process for each client, and frequent context switching also consumed much of the server's CPU time. This undermined the main goal of AFS, which was scalability.
Contribution
The redesigned version of AFS was based on the same idea of whole-file caching. The following modifications were made to achieve the desired scalability:
>>Callback: The server established callbacks for files or directories that clients were caching and used them to inform clients whenever the file was updated. This eliminated the need for frequent status messages and reduced network traffic considerably.
>>Name Resolution: Fids were introduced, and clients map pathnames to fids. The fid is presented directly to the server, so the server no longer needs to perform pathname-to-inode mapping.
>>Lightweight Processes (LWPs) reduced context-switching time considerably on the server. The client also used LWPs to handle multiple concurrent requests from users.
>>Volumes: Volumes were used to provide better operability. They could easily be moved between servers to provide load balancing, and they also allowed quotas to be assigned per user (see the sketch below). Read-only replication was used for files that were not to be changed by clients.
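A hypothetical sketch of how a per-volume quota might be checked on a store; the paper only says quotas are associated with volumes, so the enforcement logic here is my guess:

    /* Sketch of per-volume quota enforcement on a store (logic invented). */
    #include <stdio.h>

    struct volume {
        const char *name;
        long quota_kb;   /* limit assigned by the administrator */
        long used_kb;    /* space currently consumed by the volume */
    };

    /* Reject a store that would push the user's volume over its quota. */
    int store_file(struct volume *v, long file_kb) {
        if (v->used_kb + file_kb > v->quota_kb) {
            printf("store rejected: %s would exceed its %ld KB quota\n", v->name, v->quota_kb);
            return -1;
        }
        v->used_kb += file_kb;
        return 0;
    }

    int main(void) {
        struct volume home = { "user.gaurav", 20000, 19500 };
        store_file(&home, 200);   /* fits */
        store_file(&home, 800);   /* would exceed the quota */
        return 0;
    }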
Evaluation
The authors provide an evaluation of both the prototype and the redesigned implementation. The prototype was evaluated to identify the bottlenecks. The redesigned version was compared against NFS: NFS performed slightly better at small load units, but its performance degraded rapidly as the load increased, whereas AFS performed much better at large loads. NFS open latency is largely independent of file size, since operations are performed block-by-block at the server, whereas AFS incurs more latency on open as the file size increases (when the file is not present in the cache).
Confusion
>>Could you please explain the concept of volumes in detail?
>>Was there any mechanism to access files that were larger than the disk cache of the client?
Posted by: Gaurav Mishra | April 6, 2017 03:45 AM
AFS – Andrew File System
1. Summary
This paper is about designing a scalable distributed file system spanning 5000+ workstations and supporting more than 50 active clients per server. Available solutions like NFS, Locus, RFS, etc. do not scale to large numbers of workstations and active clients, as they perform a network lookup on almost every file operation. The paper addresses these scalability challenges and proposes a new distributed file system.
2. Problem
The prevailing distributed file systems suffered severe scalability issues. These issues include the lack of a transparent file location facility, the lack of a file status cache, possible corruption due to concurrent writes, flooding of network packets to servers, high CPU usage per client, etc. The authors address these issues by proposing a new distributed file system.
3. Contributions
The paper describes the prototype of the proposed distributed file system. The prototype contains Venus (a service running on the client) and Vice (the servers hosting data). The prototype could not scale beyond about 10 load units on the benchmark. The authors improved the prototype in four major areas: cache management, name resolution, server process structure, and low-level storage representation. AFS's Venus keeps two types of cache - status and data. AFS also made an important change to the file update policy: updates are written to the server only on close and become visible to other clients only on open. This largely avoids interleaving of file updates in concurrent-update scenarios. Also, since the entire file is fetched on open (unlike NFS), the data cache achieves a better hit ratio. To reduce file-status network calls, AFS provides callbacks: the server notifies clients of updates to cached files, so needless status checks by clients are avoided.
AFS file operations identify a file on Vice via a fid, which has three components - volume number, vnode number, and uniquifier. Since clients identify files by fid, frequent pathname lookups on the server are greatly reduced. Vice moved from per-client processes to lightweight processes (threads) for managing active clients; as a result, data sharing and synchronization became cheaper and easier.
4. Evaluation
The authors performed an extensive performance evaluation by designing a custom benchmark and comparing AFS with NFS. They also compared the optimized AFS with its prototype; the optimized AFS is almost 3.68 times faster than the prototype. In the scalability tests, AFS outperformed NFS under moderate and high load; NFS did slightly better in low-load cases. The authors claim this can be further improved by moving AFS services from user space into the kernel.
5. Confusion
How does AFS perform for write-heavy workloads? In that case, a lot of callback breaks will be generated.
The volume implementation is not clear to me.
Posted by: Rohit Damkondwar | April 6, 2017 03:36 AM
Summary:
This paper presents the design and implementation of a location-transparent file system called the Andrew File System (AFS), which was developed as part of the Andrew project (a distributed computing environment project). The primary motivations behind AFS were scalability and performance. The authors built a prototype (version 1) of AFS, evaluated it, explored various issues in v1, and then solved those issues in version 2. The paper also evaluates AFS v2 and compares it with Sun's NFS.
Problem:
Problems with other systems: Not scalable.
Problems with AFS v1:
1. Long pathnames: In version 1, clients sent the full pathname of the file to read or write. This forced the server to traverse the entire path, which consumed an enormous amount of server resources (such as CPU).
2. Inefficient cache management: Clients sent a large number of messages (very frequently) to check whether a file had been modified, in order to keep a consistent copy of the data locally. Servers spent a lot of resources (time, CPU) answering these requests, which could have been handled better (as done in v2).
3. Large number of processes: AFS v1 had one process per client per server. This is not an efficient or scalable strategy, since it gets harder for the server to manage such a large number of processes as the number of clients grows beyond a threshold.
4. Uneven Load Balancing: Server loads were not evenly balanced.
Contributions:
Quoting Wikipedia on the Andrew Project (not AFS): "It was an ambitious project for its time and resulted in an unprecedentedly vast and accessible university computing infrastructure." I believe AFS was one of the major contributions behind the success of the Andrew project.
1. FIDs: To solve the problems caused by long pathnames, AFS v2 introduced the fid (file ID), which consists of a volume number, a vnode number, and a uniquifier. With this mechanism, the client traverses the path (thus reducing the load on the server) and caches the results.
2. Callbacks: To manage cache consistency efficiently, AFS v2 introduced callbacks, wherein the server notifies clients if a change is made to a file they have cached. This way the client didn't need to send, and the server didn't have to answer, numerous TestAuth messages to validate the cache.
3. Same-host file reads: A process does not need to wait for a file being written to be closed if the process modifying the file is on the same host. This improves performance by avoiding server overhead.
4. Server process structure: The concept of one process per client per server was removed in favor of non-preemptive lightweight processes at user level. An LWP is associated with a client only for the duration of a single server operation.
5. Low-level storage representation: A table maps each fid to vnode information, which identifies the inode of the local file storing the data of a Vice file.
6. Introduced facilities such as volumes (a data structuring primitive), per-user quotas, and read-only replication to make the system more operable.
7. Crash recovery: The client treats all cached entries as suspect when it is restarted (whether planned or after a crash). The client probes servers periodically to detect server crashes, and after a server restart clients must revalidate their cached entries, since the server's callback state is lost.
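A rough sketch of the crash-recovery idea in point 7 (the helper names and version check are invented; they stand in for a TestAuth-style validation):

    /* Sketch of cache revalidation after a client reboot (helper names invented). */
    #include <stdio.h>

    struct cache_entry {
        const char *path;
        int callback_valid;   /* promise from the server that the entry is current */
        int version;          /* version of the cached copy */
    };

    /* Stand-in for a TestAuth-style check against the server's current version. */
    int server_current_version(const char *path) { (void)path; return 3; }

    /* After a reboot (or a server crash message), all callbacks are suspect. */
    void invalidate_all(struct cache_entry *c, int n) {
        for (int i = 0; i < n; i++)
            c[i].callback_valid = 0;
    }

    /* On the next open, each suspect entry is validated before it is used. */
    void validate(struct cache_entry *e) {
        if (!e->callback_valid) {
            if (server_current_version(e->path) != e->version)
                printf("%s is stale, re-fetching\n", e->path);
            else
                printf("%s is still current\n", e->path);
            e->callback_valid = 1;
        }
    }

    int main(void) {
        struct cache_entry cache[] = { { "/afs/a", 1, 3 }, { "/afs/b", 1, 2 } };
        invalidate_all(cache, 2);       /* client just rebooted */
        validate(&cache[0]);
        validate(&cache[1]);
        return 0;
    }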
Evaluation:
The authors adopt a productive approach to evaluating and designing the system. They first implemented a prototype (version 1), measured useful metrics and benchmarks, identified the issues, and then built another version (v2) to solve them. AFS v2 reduced the workstation performance penalty, relative to a stand-alone workstation, from about 70% (in v1) to 19% (in v2). The paper compares AFS against Sun's NFS and finds that AFS performs better in terms of benchmark time, CPU usage, and disk usage.
Confusion:
AFS adopts a "last writer wins" policy when two different clients write the same file at the same time. Albeit simple, this is not a very satisfying strategy, as the first writer loses all of her changes. Has this policy been changed since the paper was published? How is it managed currently?
Posted by: Rahul Singh | April 6, 2017 03:33 AM
Summary
The paper introduces the Andrew File System (AFS), a location-transparent distributed file system that, at the time, spanned a large number of workstations at Carnegie Mellon University. AFS is now widely used across organizations; the CS department at UW-Madison uses it as well.
The paper explores how large scale affects performance. The authors first describe a prototype they implemented and present observations based on it as motivation for changes in the following areas of AFS:
1. Cache management
2. name translation
3. communication and server process structure
4. low-level storage representation
The paper then quantitatively analyzes AFS's ability to scale and demonstrates the importance of whole-file transfer by comparing AFS to NFS from Sun Microsystems.
Even though the main sections of the paper focus on AFS's performance, the paper later discusses using volumes to improve the operability of AFS.
Problem
Scalability was the key consideration when the authors designed AFS. As the paper points out, large scale affects a distributed system in two ways: it degrades performance, and it complicates administration and day-to-day operation. The paper describes design decisions and implementations to address both issues.
The evaluation of the prototype also provides many insights into the problems that the improvements in this paper address: specifically, reducing the number of server processes, requiring workstations rather than servers to do pathname traversals, and balancing server usage by reassigning users.
Contribution
In the prototype description, the authors describe several key mechanisms, which helps in understanding both the prototype itself and the improvements made later. They first introduce the concepts of Vice and Venus, where Vice is a set of trusted servers and Venus is a user-level process on each workstation that caches files from Vice and stores modified copies back to the servers they came from.
The mechanisms highlighted in the prototype were the use of a dedicated process for each client workstation to handle communication, server-side pathname resolution, Venus intercepting the relevant file system calls, and Venus verifying the timestamp of a cached file before using it.
The paper then evaluated the prototype and based on the evaluation, proposed and described the improvements for performance and operability.
There are four main improvements for performance:
1. Cache management: use the callback mechanism. Venus now assumes that cache entries are valid unless otherwise notified. When a workstation caches a file or directory, the server promises to notify it before allowing a modification by any other workstation.
2. Name resolution: introducing the fid, a 96-bit identifier (32-bit volume number, 32-bit vnode number, 32-bit uniquifier). Each entry in a directory maps a component of a pathname to a fid. Venus performs the logical equivalent of a namei operation and maps Vice pathnames to fids; servers are presented with fids and are unaware of pathnames (a sketch of this client-side lookup follows this list).
3. Communication and server process structure: all clients of a server are served by a single process, which has many lightweight processes (LWPs) within it (LWPs are essentially user-level threads).
4. Low-level storage representation.
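To illustrate point 2, a toy version of the namei-equivalent walk Venus performs over its cached directories (the directory contents and fid strings are made up):

    /* Sketch of client-side pathname-to-fid resolution over cached directories. */
    #include <stdio.h>
    #include <string.h>

    struct dirent_map { const char *dir_fid; const char *name; const char *fid; };

    /* Cached directory contents: each entry maps a name component to a fid. */
    static struct dirent_map entries[] = {
        { "root",       "usr",       "fid-7.10.1" },
        { "fid-7.10.1", "notes.txt", "fid-7.42.1" },
    };

    static const char *lookup(const char *dir_fid, const char *name) {
        for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
            if (!strcmp(entries[i].dir_fid, dir_fid) && !strcmp(entries[i].name, name))
                return entries[i].fid;
        return NULL;
    }

    /* Walk "usr/notes.txt" component by component, entirely on the workstation. */
    int main(void) {
        char path[] = "usr/notes.txt";
        const char *cur = "root";
        for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/")) {
            cur = lookup(cur, comp);
            if (!cur) { printf("not found\n"); return 1; }
        }
        printf("resolved to %s; only the fid is sent to the server\n", cur);
        return 0;
    }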
When it comes to operability, the paper introduces the volume, a collection of files forming a partial subtree of the Vice name space. As mentioned in the paper, volumes give AFS the ability to associate disk usage quotas with them, and they can easily be moved between servers.
Evaluation
Prototype
The paper introduces the term load unit, referring to the load placed on a server by a single client workstation running their synthetic benchmark.
Using the synthetic benchmark, the authors were able to observe valuable insights, e.g., that performance was bounded by the server CPU, which motivated the changes to the system.
Improved version
The authors benchmarked the improved AFS and saw significant improvement in relative benchmark time and server utilization as the system size grows.
The improved system was then compared to NFS with respect to benchmark time, CPU and disk utilization, etc. NFS and AFS have relatively similar performance when the number of load units is small, but AFS (with both cold and warm caches) outperforms NFS as the system grows larger.
Confusion
1. Since workstations have computational power as well, why do we need servers? Can we just have a system of many workstations?
2. Can you talk more about whole file caching and its benefits?
Posted by: Yunhe Liu | April 6, 2017 02:46 AM
1. summary
This paper describes several changes and measurements to better understand the scaling of the Andrew file system.
2. Problem
As the system became more widely used at CMU, scaling issues were noticed with the increase in users. It was found that the large numbers of TestAuth and GetFileStat calls were the primary detractors from AFS performance, with a somewhat excessive number of checks for file changes.
3. Contributions
This paper improves several aspects of AFS to allow it to scale better with increased usage. The new version better tracks the validity of file cache contents, as it can assume contents are valid for longer without checking validity on every file open. Callbacks are established so that a client can know a file hasn't been changed unless the server has notified it of such. Name resolution was also improved to eliminate full path lookups on the server, requiring only an identifier that uniquely identifies a file without path traversal. The server structure was also changed to allow for better scaling, moving from a separate process for each client to a pool of server threads that service requests as they arrive without exhausting the resources of the system.
4. Evaluation
A large amount of evaluation was done to determine whether the changes would allow better scaling of the system. The revised system was compared to both the previous prototype as well as NFS, as a point of comparison with a well-established remote file system. They were able to show, over a variety of usage cases, that the changes offered greater improvements with increased usage as well as far better scaling than NFS could offer. While NFS could offer mildly better performance at the low end, AFS was shown to scale far better, a trade-off the designers were unconcerned with, since the file system was intended for large-scale use.
5. Confusion
It's been 5 years since I took 537, and this paper seems to describe about what I remember of AFS from that course. How different is this version from modern AFS implementations?
Posted by: Taylor Johnston | April 6, 2017 02:37 AM
Summary:
This paper presents the design and implementation of a distributed file system called the Andrew File System, with a main focus on achieving scalability. The development process involved initial prototyping to validate the basic architecture, followed by a redesign and reimplementation to solve some of the issues discovered in the prototype. AFS also addresses consistency, security, and operability.
Problem:
Previously existing distributed file systems were built to operate efficiently for a small number of trusted workstations. AFS mainly tries to achieve scalability, and there are challenges to be addressed in large-scale systems: large scale degrades performance and complicates administration. Further, the initial prototype of AFS showed increased CPU utilization at the server, which required changes in the way cache validation, pathname traversal, and load balancing were done.
Contributions:
> The main goal of this distributed file system is to provide scalability and to overcome the performance degradation problem associated with it.
> Files on the servers are collectively called Vice files. A file access request is intercepted by the kernel and forwarded to a user-level process on the workstation called Venus, which is responsible for getting the file from the server.
> Workstations cache the entire file once it is fetched on open, and subsequent read and write operations use the local cached copy. This substantially reduces network traffic by taking advantage of locality. On close, the changes made to the file are written back to the server by Venus. To reduce the calls made to check the validity of the cached copy, callbacks are used: the server notifies workstations holding cached copies before a file is updated on the server.
> The load on the server is reduced by having the workstation resolve pathnames. Storage is divided into volumes, and a file is represented by a fid, which is a combination of volume id, vnode number (an index within the volume), and uniquifier. A volume location database replicated on every server maps a volume to its server. This supports load balancing, as volumes can now easily be moved between servers.
> The server uses a single process with multiple lightweight processes to handle client requests. This increases concurrency and sharing, and the overhead to create an LWP is low.
Evaluation:
The paper does a pretty good job of evaluation. First the prototype is evaluated, which brings out the main pain point in the design: server CPU utilization. After the redesign to resolve the prototype's issues, the implementation was tested on the same benchmark and evaluated for scalability and normal operation; a server could take the load of 20 workstations without saturation. Next, AFS was compared with NFS, which showed serious problems at high loads.
Confusion:
I would like to understand how the movement of a volume using a clone is performed.
Posted by: Pallavi Maheshwara Kakunje | April 6, 2017 02:29 AM
1. Summary
The Andrew File System promises to provide a robust distributed file system which is location transparent and can scale up to 5,000-10,000 workstations. The paper describes the problems pertaining to performance and operability that are encountered when trying to achieve scalability and how the Andrew File System tries to solve them.
2. Problem
The paper sets out to implement a large distributed file system, named the Andrew File System, for the Andrew distributed computing environment, which might scale up to 10,000 nodes. In providing such a large, scalable file system, one encounters many performance problems; even simple, short programs might end up taking more time on a distributed file system. The authors first implement a rudimentary prototype and explain the problems associated with it: high network overhead due to repeated file consistency checks, high CPU overhead due to namei operations and context switching, and slow operations because of the inability to share data between processes.
3. Contributions
The paper uses a feedback-loop approach before materializing the final Andrew File System: the authors developed a prototype of the Andrew system, evaluated its performance over an experimental period, analysed the problems, and then tried to solve them optimally in the final system.
The paper introduces the concepts of Vice, a set of trusted servers, and Venus, a user-level process which serves file system calls at the client workstation. After analysing the prototype, the authors make certain important changes to the system in the areas of performance enhancement and operability improvement. A new concept called the callback is introduced, which is a promise by the server to notify Venus whenever there is a change in a cached file. To reduce the intensive CPU overhead caused by the namei operation in name resolution, AFS maintains a fixed-length unique identifier called the fid for each file or directory. AFS also uses multiple non-preemptive lightweight processes (LWPs) within one process to serve all the clients, instead of creating an individual process for each client, which had resulted in high memory overhead and complexity in data sharing.
To ease operability and support simple relocation of partitions, consistency guarantees, and backup, the paper proposes a new data structuring primitive called the volume. Balancing of disk space and utilization is simplified by moving volumes using the concept of a clone. Allotting a specific quota to a user is also easily achievable with volumes. For files which are rarely or never modified, AFS introduces read-only replication to provide high availability and load balancing. Backing up data is also made easy by the concepts of clone and volume.
4. Evaluation
The authors create their own synthetic benchmark, the Andrew benchmark, which encompasses the file system operations of regular usage. From the outset, the paper provides a detailed study (sometimes with unnecessary detail) of performance against this benchmark. First the prototype is evaluated and potential problem areas are identified from the resulting metrics; the load unit is used as the basic metric for load throughout the paper. After mitigating the problems in the prototype, the Andrew system is evaluated again on various performance measures. The paper also provides a comparison with NFS, showing how the two distributed file systems fare against each other and against stand-alone systems.
5. Confusion
1. Can you please explain the concept of mount points and volumes in detail?
2. It would be helpful if you could elaborate on how AFS solves the problem of internal fragmentation.
Posted by: Sharath Hiremath | April 6, 2017 02:04 AM
1. Summary
This paper describes the improvements to a distributed file system used at Carnegie Mellon University to improve its scalability and performance. The authors build an initial prototype system and do a comprehensive measurement study to identify scalability and performance bottlenecks. These issues are then remedied to create the final Andrew file system.
2. Problem
The initial version of the file system was designed for scalability, but it did not deliver on that promise. Clients communicated with the servers quite frequently to validate their cached copies, which resulted in unnecessarily high load on the servers. Files were identified by their full pathname, whose resolution was CPU intensive; this high CPU utilization introduced a scalability bottleneck. The server also created a new process for every active client, and the context-switch and paging overheads hurt the performance of the file system.
3. Contributions
The paper presents a well-structured methodology for identifying problems with the earlier file system and evaluating solutions. A prototype system is built and studied comprehensively with real and synthetic workloads to identify problems and measure their impact. Once the authors have a clear idea of the main bottlenecks in the design, they implement solutions and evaluate them using the earlier methods, thus providing a common baseline for comparison. To reduce communication between clients and servers, AFS caches entire files from the server on the client workstation, along with their status, as well as directories and symbolic links. All accesses to the same file on the same machine are satisfied locally. The notion of a callback is introduced to let servers invalidate cached entries on clients whenever the server copy is updated. Unique identifiers called fids mitigate the high CPU utilization due to name resolution. Instead of creating new processes to service every new client connection, a fixed set of user-level threads is used, which performs better due to lower context-switch latency. A set of system calls was added to allow accessing files by their inodes rather than traversing the tree through the pathname. An LRU algorithm is run periodically to reclaim cache space on the client disk.
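Since the review mentions the periodic LRU reclamation of client cache space, here is a minimal sketch of what such a sweep might look like (the threshold and bookkeeping are invented):

    /* Sketch of a periodic LRU sweep that reclaims client cache space. */
    #include <stdio.h>

    #define CACHE_LIMIT_KB 4000

    struct cached_file { const char *path; long size_kb; long last_used; };

    /* Evict least-recently-used files until the cache is back under its limit. */
    void reclaim(struct cached_file *files, int n, long *used_kb) {
        while (*used_kb > CACHE_LIMIT_KB) {
            int victim = -1;
            for (int i = 0; i < n; i++)
                if (files[i].size_kb > 0 &&
                    (victim < 0 || files[i].last_used < files[victim].last_used))
                    victim = i;
            if (victim < 0) break;
            printf("evicting %s (%ld KB)\n", files[victim].path, files[victim].size_kb);
            *used_kb -= files[victim].size_kb;
            files[victim].size_kb = 0;     /* mark as removed from the cache */
        }
    }

    int main(void) {
        struct cached_file files[] = {
            { "/afs/a.o", 2000, 10 }, { "/afs/b.c", 1500, 50 }, { "/afs/c.h", 1200, 90 },
        };
        long used = 4700;
        reclaim(files, 3, &used);
        printf("cache now uses %ld KB\n", used);
        return 0;
    }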
4. Evaluation
The improved implementation was measured again with real users as well as the microbenchmarks used earlier. The improvements helped improve scalability and the designers achieved their goal of being able to serve 50 clients with a single server. AFS was also compared to Sun's NFS which was a mature and commercial distributed file system. AFS performed much better than NFS at high loads in terms of both performance and scalability, irrespective of whether it had files cached locally to begin with or not. The evaluation section does not address the mechanism that reclaims the cache space on client disks.
5. Confusion
a. I did not quite understand what stubs are and how vnodes work.
b. Since the last writer to the server wins, it is possible that all completed modifications to a file by other clients just before this are completely lost. Why wasn't this an important consideration in the design of AFS?
Posted by: Suhas Pai | April 6, 2017 01:07 AM
1. Summary
In this paper, Howard and others from CMU present the movement of their distributed Andrew File System (AFS) from prototype towards a system with 5000-10000 nodes. Early observations from a prototype are explained to motivate changes in areas such as cache validation, server process organization, and name translation. After these areas are improved upon, the authors demonstrate how AFS' approach of whole-file transfer and caching compares favorably to NFS at scale.
2. Problem
The CMU project known as Andrew was developed to create a distributed computing environment over thousands of individual workstations. AFS was intended to provide a location-transparent file name space to all these client stations; consequently, the main problem AFS tried to solve was scalability. The prototype architecture was designed with scale in mind, but had several performance bottlenecks. Using a dedicated process for each client quickly ran into resource limits, as well as causing excessive context switching on the server. Servers also wasted CPU time traversing full pathnames provided by clients. Furthermore, file status checks dominated client/server interactions, with only 6% of calls actually involving file transfer.
3. Contributions
The first change the authors made was to introduce Callbacks. A Callback was a promise to a client that a certain cache entry would remain valid unless otherwise notified - a drastic reduction in validation requests was seen, even at the cost of the increased complexity of tracking callback state info. This system is quite similar to coherence protocols today that minimize bus traffic when an entry is being reused by a single core.
To eliminate the server CPU wasting time (therefore limiting scalability) doing a namei operation, two-level names were introduced. (All problems can be solved via a level of indirection, right?) Venus does the logical equivalent of namei to map pathnames to fids. However, both Vice and Venus were augmented with system calls to allow access to files via their inodes, meaning that pathname lookups were nearly eliminated on both client and servers.
To address the overheads of a process per client, the authors turned to a user-level threading solution, referred to as Lightweight Processes (LWPs). Context switching is extremely fast, and an LWP is bound to a client only for the duration of a single server operation. The server can also now maintain long-term shared state within a single process without worrying about shared memory between processes.
4. Questions
The authors focused on scalability in terms of workstations/users, but what about file sizes? Choosing to transfer and cache each file entirely fit their purposes, but with their current approach a massive file that exceeds the local disk cache couldn't be accessed.
Additionally, the authors use a single benchmark in many of their analyses and comparisons, as well as comparing AFS directly to NFS, despite NFS not necessarily being designed for scalability. Certainly their design is well-informed, but it should out-perform file systems not built with the same purpose.
Posted by: Ari | April 6, 2017 12:34 AM
1. Summary
This paper presents AFS, a distributed file system, and the changes made over the original prototype to improve scalability.
2. Problem
The original prototype of AFS generally performs well but shows some inefficiencies.
1) Each client workstation has a dedicated process on the server for communication, introducing large management and context-switch overheads.
2) Servers are responsible for traversing full pathnames presented by workstations, causing high CPU utilization.
3) The consistency mechanism leads to frequent cache validity checks.
4) Server usage is unbalanced, and with the original prototype it is hard to move data between servers online to rebalance it.
3. Contributions
To solve the problems in the original prototype, the authors make the following changes to AFS:
1) For cache management, Venus now caches directories and symbolic links in addition to file data. Status and data for files are cached independently; modifications to data are done locally (and reflected globally on file close), while modifications to directories are made on the servers directly. Most importantly, callbacks are used to indicate that cached data is up to date, drastically reducing the frequency of cache validity checks.
2) For name resolution, each Vice file or directory is identified by a unique fixed-length fid, and servers are presented with fids for data access while remaining unaware of pathnames. At the low level, the fid is used to get the inode information of Vice files, and Vice files are accessed directly by inode, reducing expensive namei operations. A similar technique is also applied to the client-side Venus process.
3) For communication between servers and clients, the new version uses lightweight processes instead of full processes, and an LWP is bound to a client only for the duration of a single server operation, reducing the number of active processes and thus context-switch overheads.
4. Evaluation
The evaluation is done with a synthetic benchmark. The first part compares the revised AFS to the original version and shows that the new design outperforms the original. The second part compares AFS and NFS and shows that AFS is more scalable.
5. Confusion
How is a fid generated?
Posted by: Anonymous | April 6, 2017 12:32 AM
Summary
This paper describes the Andrew File System, a distributed FS focusing on scale and operability. The authors build a prototype of AFS, let regular users use it, collect feedback, understand the bottlenecks and drawbacks, and then redesign AFS to be more efficient and scalable.
Problem
Existing distributed file systems didn't scale well commensurate with increases in load, and the existing AFS prototype had similar issues. Performance degradation in the prototype was not uniform across all operations; the difference between CPU-bound and disk-bound applications was large. Certain applications used stat to obtain information about files, and each of these stat calls triggered a cache validity check, so the total number of client interactions with the server increased significantly. The use of a separate process per client caused resource limitations and increased context-switch times. Since BSD didn't allow sharing of memory between processes, information sharing had to go through files on local disk. Embedding the file location database in stub directories made it difficult to move directories between servers. The authors provide a redesigned AFS to overcome these drawbacks.
Contribution
The main contributions of the paper are to 1) reduce the number of cache validity checks with the server, 2) reduce the number of processes spawned to serve clients, 3) reduce CPU hogging by delegating traversal of full pathnames to the workstations, and 4) balance workload by reassigning users to different servers.
Cache Management: Caching is the key to scaling in AFS; AFS caches the entire file on the local disk, which has the following advantages: i) files are generally read in their entirety, and locality of file reference makes caching attractive, reducing load on the server and network traffic; ii) because files are cached, reads and writes cause no network traffic; iii) caching files in their entirety simplifies cache management, since there is no need to keep track of individual pages of a file. Venus (the client-side process on the workstation that talks to the server) intercepts only open and close calls, so it cannot handle concurrent cross-workstation reads and writes. Instead of workstations (clients) checking with the server for cache validity, the server now sends a callback to every client caching the file to notify it of an update.
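To make the open/close interception concrete, here is a deliberately simplified sketch (my own construction, assuming `fetch_whole_file` and `store_whole_file` transfer routines that the real Venus would implement over RPC):

```python
import os

CACHE_DIR = "/tmp/venus_cache"                    # hypothetical local cache directory

def afs_open(path, fetch_whole_file):
    # On a cache miss, one bulk transfer brings over the entire file.
    local = os.path.join(CACHE_DIR, path.replace("/", "_"))
    if not os.path.exists(local):
        os.makedirs(CACHE_DIR, exist_ok=True)
        fetch_whole_file(path, local)
    return open(local, "r+b")                     # all reads and writes stay local

def afs_close(f, path, store_whole_file):
    # Changes become visible to the server (and other clients) only on close.
    f.close()
    local = os.path.join(CACHE_DIR, path.replace("/", "_"))
    store_whole_file(local, path)                 # a real client would skip this if the file is clean
```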
Name Resolution: To avoid full pathname traversal on the server, AFS uses two-level naming: pathnames on the client and fids on the server. Fids do not say where a file lives (location transparency); that information is kept in the volume location database. This removes the CPU overhead of traversing paths on the server.
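A small sketch of the split (the fid fields follow the paper; the dictionary-based map is just my stand-in for the replicated volume location database):

```python
from collections import namedtuple

# A fid names a file without saying where it lives.
Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])

# Stand-in for the replicated volume location database: volume number -> server.
volume_location_db = {7: "server-a"}

def server_for(fid):
    return volume_location_db[fid.volume]   # no pathname traversal needed

print(server_for(Fid(volume=7, vnode=42, uniquifier=1)))   # -> server-a
```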
Communication and Server Process Structure: Per-client server processes are replaced with lightweight threads. Thread context switches are within an order of magnitude of a regular procedure call, and thus much cheaper than process context switches. Client-server communication happens through an RPC mechanism implemented outside the kernel. The RPC package provides exactly-once semantics in the absence of failures, supports optimized whole-file transfers, and provides secure communication between workstations and servers.
Evaluation
The authors focus on evaluating scalability and system characteristics under normal operation. Compared to the prototype, AFS's remote access time has been reduced (due to the elimination of cache-validation traffic for stat calls). An AFS workstation is 19% slower than a stand-alone workstation, a great improvement over the prototype, which was 70% slower. The revised AFS's disk utilization is at 20% while the CPU is at 70% for 20 load units (meaning that the server CPU limits performance). The authors then compare AFS with Sun Microsystems' NFS for scalability. At high loads (>= 10), NFS causes workstations to terminate the benchmark due to system errors. In overall performance, NFS's time grows quickly with load (from about 500 seconds at a load of 1 to ~1300 s at 18), whereas AFS goes from about 500 s at a load of 1 to ~870 s at 18. The authors compare NFS and AFS on CPU utilization, disk utilization, and benchmark time, and AFS emerges as the winner in all of them.
Confusion
Why were Venus and Vice not implemented in the kernel in the first place?
We still see AFS being used in our CS labs. Is it that good, or do we just not want to move? Is AFS used in industry?
Posted by: Pradeep Kashyap Ramaswamy | April 6, 2017 12:31 AM
1. Summary
This paper looks at improvements to the Andrew File System that would allow it to scale well, accommodating a maximum of 10,000 nodes. The paper begins by summarizing several problems noted in a prototype. This allows the authors to address issues that relate directly to real-world problems and then show, at the end, the benefit of their design on those problems.
2. Problem
The problem of scalability in distributed file systems is addressed in this paper, with benchmarking of a prototype system used to show where improvements could be made. A major scalability problem shown in the prototype was that almost 90 percent of Vice traffic came from just two call types, while only 6 percent of Vice calls actually involved file transfers. The authors also noticed that CPU utilization was the performance bottleneck, due to the frequency of context switches between the many server processes and the time spent traversing full pathnames.
3. Contributions
The major portion of this paper deals with addressing the issues that arose in the prototype testing. To address a large portion of the 90 percent of Vice calls, the improved Andrew File System moves the status cache into virtual memory to speed up stat system calls. To reduce the effect of cache-validation stress under load, the improved system introduces a callback mechanism: Venus waits for notifications from the server when a cache entry is modified instead of checking with the server on each open. To cut down the CPU overhead of pathname operations, the improved system uses unique fids in place of Vice pathnames. The improved system also drops the one-server-process-per-client model in favor of a single process containing lightweight processes with very fast context switching.
4. Evaluation
To show that the improvements actually worked, the paper compares the improved system to the prototype. There were large improvements, especially in server interactions under load, which benefited greatly from the callback system. Unfortunately, the improved system did not reduce CPU utilization by much, and the CPU remains the main bottleneck. The paper also compares the improved Andrew File System to a standard remote-open file system, Sun Microsystems' NFS. NFS suffers greatly under high load and is quickly overtaken by the Andrew File System.
5. Confusion
I was unclear as to how Venus removed callbacks from groups of files or how it determined when to do this. I would also be interested in where distributed file systems have gone in the decades since this paper was published.
Posted by: Brian Guttag | April 5, 2017 11:50 PM
1.Summary
The Andrew File System is a distributed file system built to scale well. The paper describes the initial prototype of AFS and its shortcomings, and compares it with Sun's NFS. Solutions such as introducing callbacks to reduce server and network overhead, having LWPs instead of per-client processes, introducing volumes, and accessing files via their inodes are proposed to overcome the shortcomings of the initial prototype; finally, the performance of AFS is evaluated and compared with NFS.
2. Problem
Every open of a Vice file on a workstation required the cache to be validated, which increased the number of calls made to the server and the server workload. Since each client was bound to a separate server process, the number of clients that could be served was limited. Directory lookups took a lot of time because the server was doing this work rather than the individual workstations. Quotas could not be assigned to individual users to limit resource usage, and relocating users from one server to another was a tedious process.
3. Contribution
Contributions were made to improve performance as the system scales and to improve operability. Changes made to improve performance include caching the contents of directories and symbolic links, and servicing stat calls on the workstation itself rather than going to the server. The concept of a callback was introduced for each file: the server notifies all clients holding a copy of a file when that file is modified by any client, greatly reducing network traffic and server overhead. To overcome the overhead of namei calls, they introduced two-level naming: every Vice pathname is mapped to a fid, which consists of a volume number, a vnode number, and a uniquifier.
A single server process handles all client requests. Each client request is bound to an LWP until the request is completed. Files are accessed directly using their inodes rather than their pathnames.
To improve operability, AFS was redesigned to use volumes. A volume is a collection of files that can be assigned to a user and mounted. Each volume can have a quota defined by the system administrator. Read-only replication was introduced to share system software that is normally not altered by clients. A read-only backup clone of each user's files is kept in their home directory so that users can reclaim files within a 24-hour period.
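A toy sketch of how per-volume quotas might be enforced (the names and numbers are made up for illustration; this is not the paper's mechanism, just the idea):

```python
class QuotaExceeded(Exception):
    pass

# Hypothetical administrator-assigned limits: volume name -> byte limit.
quota = {"user.alice": 20 * 1024 * 1024}
usage = {"user.alice": 0}

def store(volume, nbytes):
    # Reject a store that would push the volume past its quota.
    if usage[volume] + nbytes > quota[volume]:
        raise QuotaExceeded(f"volume {volume} is over quota")
    usage[volume] += nbytes
```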
4. Evaluation
AFS is evaluated against Sun's NFS. NFS was built for a small number of workstations networking with each other and hence fails to scale well. AFS, on the other hand, performs quite well under higher load, showing that it scales much better than NFS.
5. Confusion
Please talk about how fault tolerance is achieved in AFS.
Posted by: Sowrabha Horatti Gopal | April 5, 2017 11:34 PM
1) Summary
The authors discuss the design choices of their distributed filesystem, AFS, which has the primary goal of scalability. They identify and fix several problems in their prototype, including excessive communication and having the servers do too much work or keep too much state on behalf of clients.
2) Problem
With the popularization of individual workstations, a common paradigm became a single central filesystem shared by a large number of relatively weak workstations. The quest to support ever-increasing numbers of workstations led to the need for these filesystems to be highly robust even under load from very large numbers of users.
Previous solutions such as NFS did not handle high loads gracefully. For example, NFS did not handle dropped packets gracefully, leading to correctness issues during times of high traffic.
3) Contributions
One contribution of the authors is the AFS itself, which is used by many universities today. However, it seems like an even more important contribution is the design principles that the authors discovered (I don't know if they were the first to find these, though).
First, the authors show that in order for systems to be scalable, the clients have to do as much work as possible. They find that increasing the amount of caching on the client side significantly decreases the amount of needless work the servers do, increasing the possible peak capacity.
Second, keep as little client state as possible in the servers. If the servers' state includes client state, then the capacity of the system cannot scale linearly with the number of clients. In their prototype, the authors create a new server process for each client. They find, however, that this wastes server resources and switch to a threading model which is more similar to what modern systems use. In fact, one of the driving paradigms on the Internet today (REST) follows this principle.
Third, limit the amount of communication between clients and servers by making the system asynchronous. The authors find that even despite their caching scheme, their prototype induces a lot of extra traffic and computation because the clients keep contacting the servers to validate their cache contents. By making the servers instead notify the clients of invalidations, they increase the capacity of the system by eliminating needless chatter. The same technique is often used in modern concurrent systems.
Finally, don't tie critical system state to particular locations. The authors find that moving information from one server to another is difficult because that information is expected to live there by the whole system. This makes tasks like load balancing and failure handling difficult, thus affecting the scalability of the system.
4) Evaluation
The paper does a good job of showing how its design choices affect the main design goal of scalability. Moreover, the authors do a very thorough analysis of the bottlenecks in their system. This makes their later design decisions easier and more intuitive to understand. It also helps to highlight the design principles that the authors learn. Finally, the authors do a comparison with another well-known system, NFS, which places their work in the context of the broader work in the field.
However, the paper itself is rather difficult to read through. There are far too many tables, many of which are redundant with graphs of the same data. This bloats the paper and makes it easy to lose your place while flipping through pages or trying to find the correct figure. All of the raw data is not particularly useful for understanding the paper; it would have been better as an appendix. Moreover, the paper is full of implementation details that obscure the design principles the authors wish to highlight. Between the graphs and the implementation details, it's pretty easy to be overwhelmed by a flood of information that hides the important systems design principles. In my opinion, the authors should have focused on the most important lessons they learned, which are relevant to the community at large.
Finally, the paper does a pretty poor job of discussing what consistency guarantees the system gives. I believe that it is simply "Last-write-wins", but the paper does not give a precise definition.
5) Confusion
What is the consistency model here? What does the system guarantee? Along the same lines, does the system guarantee anything about handling of file server failures? Or was all of this before the time when consistency and fault-tolerance started to be studied?
Posted by: Mark Mansi | April 5, 2017 11:29 PM
1. summary
This paper presents a new version of Andrew File System. It first studies the drawbacks of the old version by experiments, identifies the bottlenecks and modifies the cache management, pathname resolution and server process to improve the scalability of AFS.
2. Problem
The problem this paper tackles is the scalability of the Andrew File System. Before this paper there was already a first edition of AFS, but it was hard to scale. Measurements on the old AFS show that TestAuth/GetFileStat dominate client/server interactions (90%), that only 6% of calls to Vice actually involve file transfer, and that CPU utilization is high because of the frequency of context switches between many server processes and the traversal of full pathnames presented by workstations.
3. Contributions
This paper builds a revised version of the Andrew File System with modifications in the following four areas:
- Cache management (callback)
The new version of AFS uses a callback mechanism. The client registers a callback at the server for a file it opens, and the server notifies the client when the file is updated by other clients.
- Name resolution
AFS maps pathnames to fids and fetches files using fids.
- Communication and server process structure
A single process services all clients of a server, using multiple nonpreemptive lightweight processes (LWPs) within that one process.
- Low-level storage representation
The authors add an appropriate set of system calls. The vnode information for a Vice file identifies the inode of the file storing its data. For efficiency, a local directory on the workstation is used as the cache.
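As a hypothetical illustration of that last point (the real system added new kernel calls; `iopen` below is a made-up stand-in for an inode-based open, not an actual syscall):

```python
# vnode table on the server: vnode number -> storage information.
vnode_table = {42: {"inode": 91234, "length": 4096}}

def read_vice_file(vnode, iopen):
    # Open the data file by inode number, skipping namei entirely.
    info = vnode_table[vnode]
    f = iopen(info["inode"])        # assumed inode-based open primitive
    return f.read(info["length"])
```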
4. Evaluation
- Scalability. The authors repeat the experiments performed on the prototype. The results show that the degradation in performance in the new AFS is alleviated compared with the old one. At a load of 20, the relative benchmark time is less than twice as long as at a load of 1. Server utilization is around 20%, so even at a load of 20 the system is still not saturated.
- General observations. The authors measure CPU and disk utilization, calculate the distribution of calls to Andrew servers, and count active users on Andrew servers.
5. Confusion
Why does Moving files from one server to another not invalidate the contents of directories cached on workstations? How does the Volume Location Database work exactly?
I do not know what conclusion the authors want to draw from Section 4.2.
Posted by: Huayu Zhang | April 5, 2017 10:19 PM
1. Summary
Howard et al. detail their exploration of distributed file systems through their research prototype, Andrew. They discuss the shortcomings of the initial prototype and, using a benchmark, evaluate the system for its bottlenecks. Having identified the CPU as the bottleneck, they re-architect parts of Andrew to enable sharing of important structures and to limit the performance impact of namei. The implementation of callbacks is a significant contribution, showing it is a viable approach. Ultimately, the semantics of AFS lent themselves to scaling reasonably well with the performance improvements.
2. Problem
The initial prototype of the Andrew File System failed to do several things well. The first major problem was the CPU bottleneck due to each client having its own process. Second, a lot of time was spent in the kernel due to the inflexibility of the initial file system interface. This paper discusses the reason the file system did not scale well and evaluates the usability and operability trade-offs with certain improvements.
3. Contribution
The main contribution of this paper is the systematic identification of bottlenecks and then the improvement of said bottlenecks. They study the workload patterns the system was designed for by evaluating the cluster at CMU, and build a benchmark to roughly model the intended use cases.
To improve scalability, the main thing they eliminated was the need to go to the server. By changing the way files are identified and using callbacks instead of reactive checks, they moved many interactions out of the critical path. Operability was improved through the design as well: the new approach allows easy migration of files because the naming scheme is independent of server location, quotas can easily be implemented, and backups are easy to do with copy-on-write. This was all enabled by the volume primitive.
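A toy illustration of why clones make backup cheap (copy-on-write at file granularity; this is my own simplification, not the paper's on-disk mechanism):

```python
class Volume:
    def __init__(self, files=None):
        self.files = dict(files or {})     # file name -> (immutable) contents

    def clone(self):
        # Copy only the index; contents are shared with the clone until overwritten.
        return Volume(self.files)

    def write(self, name, data):
        self.files[name] = data            # the clone still sees the old contents

# Nightly backup: cheap to take, and the user can read yesterday's files from it.
home = Volume({"notes.txt": b"v1"})
backup = home.clone()
home.write("notes.txt", b"v2")
assert backup.files["notes.txt"] == b"v1"
```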
I think it’s interesting how they point out the network as a future bottleneck and how it shows they paid attention to it. Another thing is the read-only replication reminds me of the leases paper where they offer longer term leases for that class of files.
4. Evaluation
Overall, the paper does a good job of explaining the problem using data and then using the data to direct attention to alleviating the bottlenecks. They compare the initial prototype to both the new implementation and NFS, and significant improvement is shown over the old implementation. While AFS fares well in the given benchmark, I appreciate that they recognize that workloads play a major role and that AFS performs better in this particular use case. If files are large and only small amounts are accessed, then there is significant overhead in bringing the whole file over the network, as well as the subsequent write-back cost. With NFS, there is no such guarantee that the file remains in a coherent state.
5. Discussion
While they mention this, Table XIV reports the AFS numbers after the close system call but before it has become stable on the server.
Posted by: Dennis Zhou | April 5, 2017 10:15 PM
1. Summary
This paper describes the design and implementation of the second version of the Andrew File System. It benchmarks the poor scaling and performance of the state of the art and introduces techniques such as callbacks, client-side name resolution, and a multithreaded rather than multiprocess server design to mitigate these issues.
2. Problem
The existing version of AFS and other distributed file systems did not scale very well, primarily because of a very heavyweight server and too much interaction between the clients and the server. There was too much traffic between client and server in NFS, while in the initial version of AFS every open operation went through the server and involved highly expensive directory tree traversals. Moreover, since the number of processes at the server increased linearly with the number of connected clients, there were too many context switches, the cost of which became prohibitive.
3. Contributions
The authors introduce various techniques to overcome the limitations discussed above. They introduce server callbacks to notify clients holding an open file when some other client closes (and thus updates) that file. The callback avoids the need for frequent polling of the server by the client to check for file modifications; this not only reduces network traffic but also decreases the load on the server. In addition, AFS identifies each file by a globally unique identifier, offloading directory tree traversal to the clients. The server just uses that identifier to locate the inode, obviating frequent directory tree traversals and substantially reducing server load. The server also replaces the one-process-per-client model with a pool of lightweight threads, reducing the number of context switches at the server.
4. Evaluation
The overall evaluation of this paper is quite comprehensive. They compare the new version of AFS with the original version using a benchmark and find significant performance improvement. They also compare the performance of AFS with NFS and find that at high loads AFS performs much better than NFS due to replacement of frequent polling by callbacks.
5. Confusions
What was so good about AFS that it is still being used at UW after 35 years? How different is the current version? Is it used elsewhere too?
A bit of an overview of distributed file systems would be nice.
Posted by: Hasnain Ali Pirzada | April 5, 2017 10:09 PM
Summary:
In this paper, the authors discuss the implementation of AFS prototype, its shortcomings and the implementation changes made in prototype to improve upon the performance and scalability bottlenecks.
Problem:
The paper provides a review of the performance and scalability issues involved in building distributed file systems. The AFS prototype performed and scaled poorly for several reasons: context switching between many server processes, too many validity checks made by clients on file open/close, and full-pathname traversal on the server all bottlenecked the server CPU. Load balancing between servers was expensive and difficult due to the stub-directory architecture, since moving files or directories between servers required updating stub directories on all the servers.
Contributions:
To improve upon the prototype, the authors made optimizations in four basic areas: cache management, name resolution, communication and server process structure and low-level storage representation:
1. A callback mechanism is added to reduce cache validity checks from client to server. The client registers a callback at the server, which the server uses to notify the client when the file is modified.
2. Name resolution is moved from server to client. The client maps the file pathname to a globally unique identifier (fid), and the server uses this fid to find the data location. A volume location database replicated on each server contains this location information.
3. Per-client server processes are replaced by lightweight threads in order to reduce context-switching overhead.
4. Instead of pathnames, inodes are used to access files on server.
5. Volume data structure is introduced to improve operability.
Evaluation:
The authors did a thorough evaluation of the prototype and the improved AFS implementation by running a synthetic benchmark. The initial prototype results exposed the performance and scalability bottlenecks, and the results for the revised AFS demonstrate the effectiveness of the implementation changes. They also compared AFS's performance with Sun's NFS to establish the importance of whole-file transfer and caching. NFS caches only inodes and individual pages of a file in memory instead of caching the whole file; it shows much higher CPU utilization at all server loads and performs slower than AFS beyond a load of 4.
Confusion:
Why does NFS use both disks on the server whereas AFS uses only one disk?
Posted by: Neha Mittal | April 5, 2017 08:36 PM
1. Summary
The paper identifies the issues with the initial version of the Andrew File System (AFS) which result in poor scalability and discusses the design of AFSv2 which solves these issues. Some of the techniques used include callbacks to reduce validation traffic, making clients do directory walking and a multithreaded server process design.
2. Problems
The initial version of AFS (called AFSv1 from here on) was designed as a distributed file system intended to run at scale, but it did not scale as well as expected. The authors performed detailed profiling and benchmarking to identify the root causes of the poor scalability. They found that server CPU load was a big contributor and that a lot of time was being spent walking directories on the servers. They also found that the majority of client-server traffic consisted of cache validation requests, and that the context-switching costs of the multi-process server design were non-trivial.
3. Contributions
After the detailed profiling and benchmarking tests, the authors identified the key issues with the design of AFSv1 and went about redesigning their system. Some of the salient features of the new design were:
1. Callbacks for reducing client-server traffic - Vice would asynchronously notify Venuses running on clients when there were changes and thus the client cache had been invalidated.
2. Name resolution on clients - Servers identify files simply through a fid, and the expensive directory walks are now done on the client side. Clients also maintain the directory structure, not just the files as in AFSv1.
3. New server process structure - A single process instead of multiple processes (one per client), with 5 threads whose number remains fixed. This eliminates the context-switching costs associated with AFSv1.
4. Volumes & Quotas - Volumes were used as a data structuring method, and they also enabled per-user disk usage quotas for fair sharing.
5. Other optimisations - Read-only replication for executables ensured that the most frequently requested system software was distributed across multiple servers and did not create hotspots (a rough sketch of replica selection follows below).
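A minimal sketch of replica selection for a read-only volume (my assumption of the mechanism, not code from the paper; the names are made up):

```python
import random

# Hypothetical replica map: read-only volume -> servers holding a copy.
readonly_replicas = {"root.sys": ["server-a", "server-b", "server-c"]}

def pick_server(volume):
    # Any replica will do, so hot system software no longer hammers one server.
    return random.choice(readonly_replicas[volume])
```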
4. Evaluation
The authors first evaluated the scalability of AFSv2 through a custom benchmark which measures the performance for various file system operations. They find that an AFSv2 workstation is only 19% slower than a standalone workstation, compared to AFSv1 which was 70% slower. They also find that AFSv2 only takes 36% longer at a load of 10 compared to a load of 1. AFSv1 takes 4 times as long. These results sufficiently demonstrate the effectiveness of the design changes in improving scalability.
They also compare the performance of AFSv2 against NFS under two different settings — warm cache and cold cache. They find that NFS performs better than AFS at low loads, but its performance degrades rapidly with increasing loads. The crossover point is a load of 3 for warm cache and 4 for cold cache. This seems to match our expectations that asynchronous callback-based cache invalidation would scale better than the NFS approach of synchronous validation.
5. Confusion
1. Could you explain how volumes work in AFS?
2. How different is the version of AFS that runs here at UWM from AFSv2? Also, what ideas from AFSv2 here were borrowed by other distributed file systems?
Posted by: Karan Bavishi | April 5, 2017 07:47 PM
1. Summary
This paper introduces Andrew file system (AFS), which is a distributed file system with the aim of scalability.
2. Problem
The main problem is that existing distributed file systems didn't scale well. File systems like NFS and the first prototype of AFS scaled poorly because of too much interaction between clients and the server. An NFS client needs to contact the server for almost all file system operations (alleviated by a client-side cache), and a client of the first AFS prototype needs to contact the server for every open() operation. As the number of clients increases rapidly, more and more computation happens on the server side, which easily makes the server(s) the bottleneck of scaling.
3. Contributions
The most important contributions are (1) the use of callbacks (leases) from server to client, and (2) whole-file transfer. On the client side, file system metadata and data are cached in memory and on disk. Only when a file or its metadata is not found on the client, or its callback has been broken by the server, does the client contact the server. At a high level, a callback is a way for the server to push information to the client; previously, all communication was initiated by the client. Callbacks should be inherently more efficient than periodic cache validation because, in the general case, interrupts beat polling. However, callbacks mean more state must be maintained on client and server, which adds considerable complexity for handling failures. Whole-file transfer, in turn, is a trade-off made for scalability: contacting the server only on open() and close() makes scalability easier to achieve, but there is no time bound on when new file contents become visible to other clients. If client A changes file a and doesn't close it for a long time, other clients cannot see a's new contents for that whole time (maybe AFS used callbacks/leases with timeout and renewal, but that is not mentioned in this paper).
In addition, each file/directory in AFS is named by an id (fid), which is a tuple of (volume number, vnode number, uniquifier). Previously, the AFS client contacted the server with a file's full pathname, which kept the server busy with pathname-to-inode mapping. This mapping really should happen on the client side, and because an inode by itself cannot identify a file in a file system with multiple mount points and servers, the fid was introduced. Other features, including a thread (LWP, in the paper's terms) pool on the server side and mapping vnode numbers directly to inodes on local disk, also help make AFS more efficient.
4. Evaluation
The authors used one synthetic benchmark (with MakeDir, Copy, ScanDir, ReadAll, Make) to identify the problems of the first AFS prototype. The revised AFS showed much better scalability than the first prototype in terms of load units (Fig. 1), along with better utilization of CPU and disk (Table VII). The authors then used the same benchmark to compare AFS with NFS. AFS performed slightly worse than NFS under light load, but much better once the load exceeded about 10 units (Fig. 3). In general, AFS also had much better CPU, disk, and network utilization than NFS under heavy load, while NFS had much better latency than AFS for opening a file, reading the first byte, and closing the file (Table XIV).
5. Confusion
1. Is AFS's fid similar to NFS's file handle? According to my understanding, NFS's file handle for each file is a tuple of (file system volume, inode, generation).
2.What's the development history of AFS from 1980s to now?
Posted by: Cheng Su | April 5, 2017 06:03 PM
1. Summary
The Andrew File system is a distributed file system that is designed for performance and scalability. Its main features are whole-file caching, server callbacks for consistency, and a virtual volume-based management scheme.
2. Problem
Contemporary distributed file systems such as NFS do not scale well. This is due to their remote-open model which transfers single pages of files to clients as they are accessed. This model leads to a lot of communication overhead during file read and write operations. NFS also uses an unreliable RPC mechanism that fails under high load. It also has a confusing heuristic for cache consistency, where cached data pages are checked for validity against the server every 30 seconds.
3. Contributions
The authors contribute the implementation of AFS, which is still in use today (including by UW CS.)
They demonstrate why whole-file caching is good for distributed file systems, as it allows many local operations to avoid communication with the server.
They show that although callbacks require more state at the server, they are worthwhile due to the large decrease in network traffic they provide by eliminating a large number of stat calls.
They comment that at saturation, their system is still CPU and not disk bandwidth bound, so their file system still incurs significant overheads and has room for optimization.
They describe an indirect, volume-based method of resource management that allows users' volumes to be migrated between servers in a lightweight fashion.
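A sketch of why such a migration is lightweight (assuming a `copy_volume` transfer routine; the location map is my stand-in for the volume location database, not the paper's code):

```python
# Volume name -> server currently holding it (stand-in for the location database).
volume_location = {"user.mitchell": "server-a"}

def migrate(volume, dest, copy_volume):
    # Bulk-copy the volume (the real system would briefly freeze it to catch
    # final updates), then update the location map; fids never change, so
    # clients keep working with the same names throughout.
    copy_volume(volume, volume_location[volume], dest)
    volume_location[volume] = dest
```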
4. Evaluation
They first implement a simplified prototype version of AFS and run it on a smaller cluster of 100 real machines with a synthetic FS benchmark. They use their discoveries from this prototype to guide the changes and optimizations they made in the final version of AFS, many of which are listed in Contributions. One of their main observations was that 1 process per client at the server was too heavyweight to maintain at scale.
They implemented many changes and improvements and retested their new version against the prototype, again using a synthetic benchmark. The new version scales and performs much better than the prototype. They also compare against NFS. They perform similarly to NFS at a small number of clients and scale much better than NFS for large numbers of clients. They attribute this to their emphasis on reducing the communications between the client and server.
They also present some aggregate data from stat counters running on real machines in actual use.
5. Confusion
I would like to know more about the “lightweight processes” they use for each client. How are they managed?
Posted by: Mitchell Manar | April 5, 2017 11:06 AM