UW-Madison
Computer Sciences Dept.

Paper Write-ups for Survey

Student

The paper surveys distributed systems as of the 1980s and tries to predict future lines of research. The main goal of the distributed systems of the 80s was to provide a price/performance advantage over mainframes (as workstations started getting cheaper). In doing so, these early systems also focused on providing a higher degree of availability and fault tolerance than a centralized system.

These systems focused on distributing every OS functionality transparently. Systems like Locus (I think it was from around the same time) were tightly coupled, as opposed to the systems of today. The focus was on having a distributed operating system rather than building distributed services on top of a vanilla OS. These systems mainly aimed at parallelizing users' computation tasks and were typically small scale. Even though data was distributed, it was not emphasized as it is today. Further, these systems didn't decouple storage and computation.

These systems focused on communication primitives and resource management. They were also not very resilient to faults, and fault tolerance is cited as a future research area. As I said before, the focus was on small-scale, tightly coupled systems aimed at computation-intensive tasks. Systems today (like clusters and p2p) are loosely coupled and carefully choose which services to distribute. Systems today are also highly scalable and focus a lot on fault tolerance. Data and computation are decoupled, and the focus of most systems today is on distributing data. Data, as a resource, distributes better than computation, which is probably why distributed file systems stayed on as opposed to distributed OSes. I think most of today's paradigm came mainly from the WWW.

Student

The paper is a good discussion of the goals, key design issues and examples of distributed systems as of 1985. The goals of these systems as described in the paper are: 1. To have a global, systemwide operating system instead of each computer having its own OS. 2. Dynamic allocation of processes to various CPUs which is transparent to the user, instead of the user having to do a remote login to use another machine. 3. Transparent placement of files by the system, instead of having the user worry about the specific machine on which they are located. 4. A high degree of reliability (not losing data) and availability (the crash of one process or processor still allowing the system to proceed normally).

However, there seem to be a few assumptions (in terms of workload and environment) that bear upon the design and implementation aspects detailed in the paper. In particular, the systems seem to assume that processors will only ever leave the system (in the event of a crash). The design does not seem to accommodate dynamic addition of resources (CPU, disk space etc.) as is common in today's mobile environment of laptops and other mobile devices. Also, the system design focuses only on the various aspects of allocating CPU to workloads, thus implicitly assuming that they would be mainly CPU-intensive. I/O-intensive and memory-intensive workloads thus seem to have been deemed unimportant or infrequent in the design of these systems.

The state of the art for distributed systems at this time was to use the client-server model with some form of RPC as the communication base. Thus the systems were not robust for applications demanding multicast or broadcast semantics (like video conferencing). The name servers were straightforward (e.g., the Cambridge system had a single name server), thus limiting the scalability of the system. Resource allocation was either not handled by the system (e.g., in Eden it was done by the underlying existing operating systems) or focused only on the processor banks (e.g., Cambridge, Amoeba, V). Fault tolerance was minimal and was compromised in favour of performance (Cambridge, Amoeba, V) or very inefficient to use (e.g., checkpointing in Eden was very slow).
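The request/reply pattern underlying this client-server RPC base can be sketched as follows. This is a minimal illustrative sketch, not the protocol of any of the surveyed systems: the names (`rpc_call`, `dispatch`, `HANDLERS`) are invented, and the "network" is a direct function call standing in for a send followed by a blocking receive.

```python
# Minimal sketch of the blocking request/reply (RPC) pattern these
# systems used as their communication base. All names are illustrative;
# a real kernel would marshal the request over the network.

import json

# "Server": a table mapping operation names to handlers.
HANDLERS = {
    "read_file": lambda args: {"status": "ok",
                               "data": "contents of " + args["name"]},
    "ping":      lambda args: {"status": "ok"},
}

def dispatch(request_bytes):
    """Server side: unmarshal the request, run the handler, marshal a reply."""
    req = json.loads(request_bytes)
    handler = HANDLERS.get(req["op"])
    if handler is None:
        return json.dumps({"status": "error", "reason": "unknown op"}).encode()
    return json.dumps(handler(req["args"])).encode()

def rpc_call(op, args):
    """Client side: send the request and block until the reply arrives
    (here, a direct call stands in for the network round trip)."""
    request = json.dumps({"op": op, "args": args}).encode()
    reply = dispatch(request)
    return json.loads(reply)

reply = rpc_call("read_file", {"name": "paper.txt"})
print(reply["status"])   # ok
```

The blocking call is exactly why the reviewer's point about multicast holds: one client waits on one server, so there is no natural place for one-to-many delivery in this model.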

Student

According to Tanenbaum, the primary goal of distributed systems in 1985 was to exploit the "price/performance advantage" of microprocessors. In other words, for a given price, the aggregate computing power of many smaller systems was greater than that of a few larger systems -- so if new operating systems could enable easier (or even transparent) use of such multiple distributed resources, in theory users could enjoy more bang for the buck (or maybe just the same bang for less buck).

Secondary goals were incremental resource growth (the smaller the units, the more gradually you can scale), software simplicity through the assignment of discrete services to discrete processors, and improved fault-tolerance by allowing distributed OS components to fail independently of one another and potentially restart on functioning resources elsewhere.

One implicit assumption of the four reviewed systems, according to Tanenbaum, is that given a tradeoff between performance and reliability, users value the former somewhat more than the latter. This is reflected in design decisions, e.g., the lack of atomic file operations in the V system, and in actual user practice, e.g., the infrequent use of Eden's expensive object-checkpointing mechanism.

In terms of workload and environment, the systems are designed to fit one (or more) of three broad scenarios: a small group of federated but independent multi-user minicomputers; a collection of single-user workstations sharing common services (e.g., a fileserver); or a large pool of processors, more than one of which may be dedicated to a given user at one time.

In terms of design, the state of the art at the time, as evidenced by Amoeba, included a shift from monolithic kernels and towards microkernels, where key OS services operated either in user space, or were at least abstracted into clear objects or modules within kernel space. The use of capabilities to achieve protection was also present in two of the four systems surveyed.

Interestingly, although Tanenbaum doesn't go into great detail on the actual users of the four implemented systems he surveys, the least-advanced system in terms of design (Cambridge) appears to have usefully served the largest actual community, consisting of over 90 machines across three sites. And this without objects, capabilities, etc.

Student

One goal of distributed systems at that time was to provide a way to share computation among many processors. While distributing data among the entire system was certainly important, the distribution of computation was arguably the more interesting technical problem. This is in contrast to current times, where we see somewhat of a shift, with many large distributed systems such as P2P networks more focused on distributing data than computation. Another goal was to provide unified services and protection. For example, it was important to be able to have a system-wide service that provided files and also provided user-level protection of those files. Lastly, it was important for the distributed systems to have some amount of fault tolerance. While none of the four specific systems mentioned in the paper went very far toward providing complete fault tolerance, it was a goal nonetheless.

The most basic assumption was that the nodes that comprise the distributed system were all *near* one another. That is, they would all likely be connected by a local area network as opposed to a wide-area network where they would be physically separated by hundreds of miles. Also, in distributed systems meant to share processing, the jobs submitted were expected to be batch-like jobs. While some terminals connected to the distributed environment allowed more interactive computing, most of the distributed computation systems expected batch jobs.

At the time, a state-of-the-art system had tens of nodes, and very rarely hundreds of nodes. Since fault tolerance was a young area of research at the time, any fault tolerance added to a system would have been considered state of the art. Finally, the use of cryptography to provide protection within a distributed system, as in the Amoeba system, was also an advanced concept.

Student

The goals of these distributed systems were, on one side, to speed up big tasks for users, since they could use multiple processors, and on the other side to allow them to log in anywhere and still be able to access their own work. I think the assumption was that the workloads could be pretty high, since there was a relatively small number of computers. Also, the emphasis seems to be on fileservers.

The state of the art for that time seemed to be a 10 Mbit network and maybe a couple dozen 12 MHz computers. They were talking about 90 machines for the Cambridge system. Also, if a computer had a hard disk it was already quite something.

Student

The goals of the distributed systems at this time were many-fold. Incremental growth was one main objective of building these systems. The motivation was that by adding more computing resources in the network, the computing power could be increased proportionately. Another important consideration was making services available even if a few machines in the system were down. The goal was to achieve such reliability and availability without increasing the communication protocol overhead. One important goal was also to make services accessible transparently. For example, being able to access files transparently by name irrespective of their location in the network, while addressing issues of protection/access rights.

A few assumptions have been made regarding the type of workloads and the kinds of systems connected together by the distributed system. Most of the systems studied operate on a network spread over a small area, typically with fewer than 100 machines connected together. Network overheads, hence, are not considered as elaborately as they are in today's widely distributed systems (e.g., a peer-to-peer system like Kazaa). For example, the V system broadcasts a query to all kernels whenever a client wants to access a service by name. By maintaining centralized services (e.g., a centralized file server), the utility of these distributed systems is also targeted primarily towards low-end diskless systems (again, as in the V system). For these centralized services to work well, the assumption that the number of computers is small again comes into the picture. These systems, hence, seem to be targeted towards small amounts of data movement/transactions.
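The broadcast name lookup mentioned for V can be sketched as below. This is a hedged simulation under invented names (`kernels`, `resolve`): the list iteration stands in for an actual network broadcast, and the point is only to show why the cost of each lookup grows with the number of machines, making the scheme workable only on small networks.

```python
# Illustrative sketch of V-style broadcast name resolution: the client
# broadcasts a query to every kernel, and a kernel that implements the
# named service replies. Kernel/table names here are invented.

kernels = [
    {"host": "node1", "services": {"printer"}},
    {"host": "node2", "services": {"fileserver"}},
    {"host": "node3", "services": {"timeserver"}},
]

def resolve(name):
    """Ask every kernel in turn (a stand-in for a network broadcast);
    the first positive reply wins. Every lookup touches every machine,
    which is why this only suits small networks."""
    for kernel in kernels:
        if name in kernel["services"]:
            return kernel["host"]
    return None     # no kernel implements the service

print(resolve("fileserver"))   # node2
print(resolve("webserver"))    # None
```

A single name server (as in Cambridge) trades this per-lookup broadcast cost for a central point of failure and a scalability bottleneck, which is the tradeoff the reviews above point out.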

The state of the art in these distributed systems revolved mainly around the aspects of reliability, security and resource management. Amoeba, for example, cryptographically protected the rights part of its capabilities to prevent user programs from manipulating them. Dynamic allocation of processors from processor pools was also new. Services are also kept continually running as multiple copies of the same service / multiple processes, enabling other processes to run even if some are blocked waiting for replies. Though the systems were built for performance rather than reliability, some systems do address issues like bringing up crashed servers (still, crashes are assumed to be infrequent). Also, the Cambridge distributed computing system provides the notion of special files, writes to which are atomic. Most systems also emphasize statelessness of the various servers, with mechanisms to account for crashes, etc. Amoeba also managed to deliver signals/interrupts to related processes in the system.
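The idea behind cryptographically protecting the rights field of a capability can be sketched as follows. This is a sketch of the general technique, not Amoeba's actual construction: the server seals the rights bits with a one-way function over a server-held secret, so a client that flips rights bits produces a capability whose check field no longer verifies. The function used here (SHA-256 over secret plus rights) and all names are assumptions for illustration.

```python
# Hedged sketch of one-way-function protection of a capability's rights
# field, in the spirit of Amoeba's scheme (the exact construction is
# illustrative, not Amoeba's). The secret is known only to the server.

import hashlib

SECRET = b"per-object-random-secret"

def make_capability(rights):
    """Server: issue a capability whose check field seals the rights bits."""
    check = hashlib.sha256(SECRET + bytes([rights])).hexdigest()
    return {"rights": rights, "check": check}

def verify(cap):
    """Server: recompute the check from the presented rights; a client
    cannot forge the check for different rights without the secret."""
    expected = hashlib.sha256(SECRET + bytes([cap["rights"]])).hexdigest()
    return cap["check"] == expected

cap = make_capability(rights=0b01)        # e.g. read-only
print(verify(cap))                        # True

forged = dict(cap, rights=0b11)           # client tries to add write rights
print(verify(forged))                     # False: check no longer matches
```

The appeal of the scheme is that the capability itself can live in user space and travel in messages, yet the kernel/server never has to trust the client's copy of the rights bits.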

Student

This paper surveys distributed systems as of 1985. What were the goals of these distributed systems? What were the assumptions (in terms of workload and environment) of these systems? What was the state of the art for distributed systems at this time?

The authors present the different approaches employed in distributed systems as of 1985. The focus is on 'distributed operating systems' that hide from the user the existence of multiple independent processors. The primary goal of these systems is to manage the multiple users and processes that may simultaneously access the system and to dynamically allocate processes to the different processors available in the system. These systems also manage file placement on multiple processors and allow transparent access to remote resources by employing communication primitives. Additionally, some of the other goals of these systems are reliability, fault tolerance and scalability.

The distributed systems presented in the paper have been typically designed for use as a computing resource in the university campus. The user population, although substantial, is not expected to be very large compared to the available workstations such that each user can have a dedicated workstation during his/her session period, which would also handle the short interactive user jobs. The systems were designed to work for heterogeneous machines with varying hardware, data type formats and network capacities.

Most distributed systems described in the paper are designed around the typical client/server model, with various specialized servers providing services. These systems could typically scale up to a few hundred machines; for instance, the Cambridge distributed computing system operated on 90 machines and Amoeba was configured for a collection of 24 computers. Scalability of these distributed systems across the wide-area network was still being investigated. Also, due to the low network bandwidths and high communication overheads, the data rate between different machines was on the order of a few Mbps.

Student

This paper provides an overview of distributed computing mechanisms and describes four distributed computing systems as of 1985. Upon reading the paper, it seems to me that the overall goal for distributed systems in 1985 was to effectively take advantage of increasingly available and capable microprocessor-based systems to provide an acceptable alternative to a time-shared minicomputer system. While high-level distributed system goals such as transparency, incremental growth, and reliability were considered, it seems that much work was focused on models that would allow a traditional operating system kernel (VMS, Unix) to operate in a distributed manner. Other common goals included simplifying experimentation with new mechanisms (for instance, making file system support in V and Amoeba user-level processes). Because of the idea that one needs to build a "truly distributed" system from scratch, much effort went into communication and alternative kernel models, instead of into higher-level services. State-of-the-art systems focused their efforts on alternative kernel models that could be better distributed, such as the work in Amoeba and Eden on capability-based objects.

The four systems presented in the paper had similar assumptions about environment. They assumed groups of dozens of microprocessor workstations connected via fairly high-speed custom networks, such as Cambridge's 10/4 Mb ring. Of the four systems, only Eden was built on top of "commodity" network (Ethernet) and kernel (SunOS Unix) technology; all the others were built directly on top of the hardware. All systems used their own form of connectionless client-server RPC instead of layered protocol suites such as the ISO seven-layer model or TCP/IP, presumably for performance reasons.

 