
Disco: running commodity operating systems on scalable multiprocessors

Disco: running commodity operating systems on scalable multiprocessors. E. Bugnion, S. Devine, K. Govil, and M. Rosenblum. ACM Trans. Comput. Syst., 15(4):412-447, 1997.

Reviews due 2/2 at 8 am.

Comments

1. Summary
Disco is a virtual machine monitor that allows multiple operating systems to run on a shared-memory multiprocessor machine. The authors show that developing a virtual machine monitor with multiple commodity operating systems allows systems to expose hardware innovations at lower development cost and with fewer bugs than developing a large monolithic operating system.

2. Problem
Hardware innovations are produced frequently, but operating systems do not adapt quickly enough to expose this new hardware. In particular, in the late 1980's, cache-coherent non-uniform memory access (CC-NUMA) architectures enabled the development of scalable shared-memory multiprocessors. Operating systems for such machines required significant changes from traditional OS design, and even after release, the new systems contained instabilities. Due to these difficulties, many companies did not accommodate the new hardware in their systems at all.

3. Contributions
The authors implement Disco in roughly 13,000 lines of code. Disco runs on FLASH, a shared-memory machine in which each node contains a processor, main memory, and I/O devices. Disco is a virtual machine monitor, and it virtualizes the underlying machine into a more standard interface that can be used by many operating systems. This interface includes virtual CPUs, virtual physical memory, and virtual I/O devices. In addition, the operating systems can communicate through a virtual network interface. Disco can run both commodity operating systems and specialized operating systems.

4. Evaluation
The authors evaluate Disco with several instances of the IRIX operating system, and they compare this to IRIX running without Disco. Since the FLASH hardware was not available at the time of writing, the authors simulate it using SimOS. They show that Disco adds only a small overhead over IRIX for various workloads. The only issue with the experiments is that they are all very short, each executing in a few seconds. The authors do not explain clearly why longer workloads should exhibit similar behavior.

5. Confusion
(1) I was confused about hardware and software TLBs. What operations does an operating system perform on a TLB? Why does Disco need a software TLB in addition to the hardware one?
(2) To what extent did the ideas from virtual machines influence the development of containers?

1. Summary
The article revisits the idea of virtual machine monitors and the problems of developing such system software in the new context of modern large-scale shared-memory multiprocessors. VMMs are an additional layer of indirection between the hardware and the OS. Multiple copies of commodity OSes can be run on a single multiprocessor system. Disco, an implementation of this idea on a CC-NUMA machine, is also discussed.

2. Problem

System software for scalable multiprocessor hardware has often trailed the hardware in functionality, flexibility, and reliability, because extensive changes to complex operating systems are required. Hence, instead of modifying existing operating systems, the idea is to reduce the gap between hardware innovations and the adaptation of system software by adding a virtual machine monitor layer that virtualizes all the resources of the machine, exposing a more conventional interface to the OS. This approach provides a simple solution to the scalability, reliability, and NUMA management problems otherwise faced by the system software of large-scale machines. Multiple OSes can run in parallel on top of the VMM with fewer risks of incompatibilities and instabilities.

3. Contributions
> Primary Contribution - The idea of applying the concept of VMMs to solve the challenges facing system software for scalable multiprocessor systems.
> Disco allows operating systems running on different virtual machines to be coupled using standard distributed-systems protocols such as TCP/IP and NFS. It also allows for efficient sharing of memory and disk resources between virtual machines.
> Fine-grained resource management of the hardware is exploited.
> A relatively simple change to the commodity operating system allows applications to explicitly share memory regions across virtual machine boundaries. The monitor provides a simple interface to set up these shared regions, and the operating system is extended with a special virtual memory segment driver to allow processes running on multiple virtual machines to share memory. vCPUs are scheduled on physical CPUs analogously to how processes are scheduled on traditional systems.

> It hides the fact that the underlying system is NUMA by introducing page migration and replication, and the monitor can dynamically schedule virtual CPUs on physical processors for load balancing.

> Implementation details of Disco include dynamic page migration and replication, code segment replication, a second-level software TLB, direct execution, etc.

4. Evaluation
> Execution overheads: the virtualization overheads are empirically evaluated and explained using four representative workloads.
> Memory overheads arise from replication of each OS image, the file system buffer cache, multiple file systems, etc.
> Page migration and replication techniques effectively hide the non-uniform memory access architecture and improve performance by 33-38% over a commodity OS running on the native hardware.
> Virtualization overhead for uniprocessor workloads ranges from 3% to 16%.
> Some workloads run 1.7x faster on an 8-VM system than on a commercial symmetric multiprocessor operating system.
> NUMA-ness can be largely hidden, reducing execution time by up to 37%.

5. Confusion
> What optimizations does a NUMA-aware system provide?
> The use of Disco's data structures, including the memmap and pmap, during TLB miss handling was not very clear.
> How does efficient access to some processor functions through load and store instructions work?

Summary:
The paper presents Disco, a virtual machine monitor that provides a layer of abstraction between the OS and hardware. This enables running many copies of a commodity operating system on a shared-memory multiprocessor architecture with minimal changes to the existing operating system. Disco achieves scalability along with efficient resource sharing and low overhead.

Problem:
The inherent complexity of operating system code is the biggest obstacle to adopting new hardware quickly. With advances in hardware providing shared-memory multiprocessor architectures, there was a need for operating systems to evolve and utilize them efficiently. Disco uses the idea of virtual machine monitors to support this architecture efficiently and tries to reduce the overheads seen in earlier virtual machine monitors.

Contributions:
Disco runs multiple virtual machines on the same hardware by virtualizing all the resources of the machine. Each operating system runs on a virtual machine.
Processors: Disco emulates the privileged instructions, memory management unit, and trap architecture of the processor so that operating systems and applications can run without modification. Disco provides efficient access to some frequent kernel operations; for example, enabling and disabling interrupts can be done using load and store instructions on special addresses.
Physical Memory: Disco provides an abstraction of main memory as a contiguous physical address space starting at address zero.
I/O devices: Each virtual machine has its own set of I/O devices and exclusive access to them. Disco virtualizes the disk by providing a set of virtual disks that can be mounted by any virtual machine.

Evaluation:
Disco is evaluated on a simulator of the FLASH machine. The basic overhead of virtualization ranges from 3% to 16%. The authors also show that running multiple virtual machines under Disco is faster than running on a commercial symmetric multiprocessor operating system. Page placement and dynamic page migration improve the execution time by hiding the NUMA-ness of the system.

Confusion:
Could you please elaborate on address translation and on how Disco emulates the processor.

Summary :
Disco is a prototype virtual machine monitor intended to provide scalability with minimal changes to traditional operating systems. It provides a layer of abstraction between operating system and hardware. This allows multiple copies of commodity operating systems to run on modern hardware with many processors by cooperating and efficiently sharing resources, thus providing the illusion of a single custom operating system designed specifically for scalable multiprocessors.

Problem :
A lot of new and innovative hardware designs are available, but current operating systems are not designed to use that hardware efficiently. Operating systems are very large and complex and hence cannot easily adopt new changes. To support scalability, that is, to run operating systems on hardware with hundreds of processors, a lot of changes must be made, which is not only time consuming but also prone to bugs and errors. The cost of a new design may even outweigh the benefits of the hardware innovations for many application areas. Disco also overcomes some of the overheads that existed in traditional virtual machine monitors by enhancing communication and resource sharing.

Contributions :
Disco provides a hardware abstraction layer on top of a scalable shared-memory multiprocessor architecture which virtualizes all the resources so that multiple copies of the operating system can run using their own set of resources with little to no complex changes in the OS.
Building on a previously existing idea of virtual machine monitors, Disco eliminates or reduces some of the overheads identified by improved communication and resource sharing.
Disco’s interface is as follows :
> Virtual CPUs provide the abstraction of a MIPS R10000 processor.
> Disco uses dynamic page migration and replication to emulate a nearly uniform memory access time memory architecture to the software.
> Disco virtualizes all the I/O devices such as disks, network interfaces, periodic interrupt timers, clock, and console, providing each VM with its own set of I/O devices.
Disco supports efficient communication via a special network interface that can handle large transfer sizes without fragmentation.
Failure is local to the virtual machine and does not affect other VMs.

Evaluation
Experiments were carried out on the FLASH machine simulator, SimOS, using various realistic workloads. The virtualization overheads ranged from 3% to 16%, mainly because of trap emulation of TLB reload misses. A system with eight virtual machines could run some workloads 1.7 times faster than a commercial symmetric multiprocessor operating system without an increased memory footprint. Page placement and dynamic page migration and replication bring down the execution time by up to 37%. Overall the system looks promising, as the overheads are not very large compared to the benefits provided by the virtual machine monitor.

Confusion :
Not clear about the communication mechanism using NFS.

1. summary
The paper proposes adding a layer of indirection between the hardware and commodity OSes, a virtual machine monitor, to avoid programming a complex new system to exploit new hardware (like shared-memory multiprocessors). The authors also implemented Disco, a prototype that demonstrates the approach's effectiveness and modest overhead.

2. Problem
Operating systems are complex, and with new hardware, developing system software requires much work and may introduce unreliability. Adding a layer of indirection like a VMM is a natural idea, but some problems remain to be solved: How to reduce overhead? How to manage resources in an indirection layer? How to share data?

3. Contributions
a) Bringing VMMs back in a new era, to meet needs for scalability, isolation, and resource sharing.
b) Specific techniques incorporated, such as transparent memory sharing, page sharing, page migration, and communication through the NFS protocol.
c) In the paper's own view, supporting commodity OSes in the face of new hardware.

4. Evaluation
They used SimOS, a machine simulator, to evaluate Disco. Four representative workloads were introduced, combining different kinds of resource usage. Through these workloads, they demonstrated that the overhead of virtualization ranges from 3% to 16%, and that virtualization can also reduce the overhead of other layers.
They also showed the benefits of their specific techniques for a more efficient VMM, transparent memory sharing and page migration, and evaluated the scalability improvement provided by the VMM.
Although the evaluation was mainly based on SimOS, they also ported Disco to real hardware and conducted some experiments with similar workloads.

5. Confusion
a) Details of copy-on-write disks.
b) It may be interesting to compare KVM with Disco.

Summary: Modifying system software to adapt to rapidly changing hardware is a cumbersome task. To enable system software to interact with new hardware with minimal modification, the authors build an abstraction layer between hardware and system software. They propose a prototype, Disco, that runs multiple commodity OSes on scalable shared-memory multiprocessors, using a virtual machine monitor (VMM) as the abstraction layer. The resulting design has low monitor overhead and facilitates a scalable system.

Problem: More often than not, system software does not evolve in pace with hardware. Additionally, modern OS software is complex, tightly coupled with hardware, and hard to change. Modifying these systems is a slow and tedious task, has constraints such as maintaining backward compatibility, and has high potential to introduce new bugs. These reasons often force companies to release a system with new hardware and new software together if the hardware changes are vast and require significant changes in the OS. However, this strategy is not sustainable in the long run, considering the cost of releasing system software and hardware together. To solve this problem, the authors build a prototype virtual machine monitor that can scale with rapid changes in large-scale shared-memory multiprocessors, with minimal implementation effort.

Contributions: 1. Introduces the virtual machine monitor as a thin software abstraction layer between hardware and system software, with fewer issues than the VMMs of the 1970s.
2. Enables multiple commodity or specialized OSs to run on same hardware, and same commodity or specialized OS to run on slightly different hardware.
3. Transparent sharing to reduce memory overhead while running multiple OSs.
4. Uses copy-on-write (COW) to maintain consistency across replicated pages.
5. Implements software TLB to offset the increased TLB misses.
6. Fault Containment: Crashing of one VM doesn’t necessitate failure of other VMs.

Evaluation: The system was tested on a simulator in four workload configurations that represent commonplace applications of scalable servers: software development, hardware development, scientific computing, and commercial databases. The execution overhead and memory overhead of introducing a new layer were small, generally less than 16 percent. Included in this is also the performance gained by Disco from reducing kernel work and handling it in the VMM layer. Performance statistics would have been more reliable had the system run on actual hardware instead of a simulator. Also, short-running applications do not truly represent commonplace applications of scalable compute servers.

Confusion: How does TLB miss work in Disco?

Summary

Disco addresses the problem of the lack of efficient system software to keep up with the rapid advancements in hardware, specifically large scale shared memory multiprocessors. The idea presented in Disco to solve this is to use multiple commodity operating systems coordinated by a Virtual Machine Monitor instead of building a tailor made OS for new hardware advancements.

Problem

It is often very difficult to keep up with innovations in hardware if the aim is to develop system software solely tailored for the new hardware. This has various causes: development delays given the size of system software, and the difficulty of convincing large software companies to make the shift, to name a few. To solve this, Disco introduces Virtual Machine Monitors, which virtualize the hardware resources and present a traditional hardware model to the operating systems. Doing this, it faces the challenges of overhead due to virtualizing hardware resources, resource management, sharing, and communication, which come from additional instructions, additional memory requirements, and the difficulty of synchronizing between virtual machines.

The main challenge faced by Disco is to reduce the impact of these overheads as much as possible.

Contributions

The main contribution of this paper is the popularization of virtual machine monitors and showing how Disco overcame some of the overheads associated with traditional virtual machine monitors by enhancing resource sharing among the virtual machines. Other main contributions are:

1. The multiprocessor is abstracted by Disco as a set of MIPS processors; I/O devices are abstracted as virtual disks, and access to network devices is virtualized.
2. Takes into consideration the NUMA architecture and replicates the code of Disco on each node so that it can always be accessed locally.
3. Uses appropriate data structures to store and retrieve the state of a virtual CPU when scheduling between virtual CPUs (a sketch of such a structure follows below). Provides a supervisor mode for the virtual machine operating systems so that they can access certain segments of the address space.
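A minimal, hypothetical sketch of the per-vCPU state such scheduling has to save and restore; the structure layout and helper names below are assumptions for illustration, not Disco's actual code.

```c
#include <stdint.h>

#define NUM_GPRS 32
#define NUM_TLB  64

/* Assumed layout of the state the monitor saves when it deschedules a
 * virtual CPU and restores when the vCPU is placed back on a real CPU. */
struct vcpu_state {
    uint64_t gpr[NUM_GPRS];     /* general-purpose registers               */
    uint64_t pc;                /* program counter                         */
    uint64_t status;            /* virtualized status / privilege register */
    uint64_t cause;             /* virtualized trap-cause register         */
    uint64_t tlb[NUM_TLB][2];   /* contents of the virtualized TLB         */
    int      guest_mode;        /* 0 = user, 1 = guest kernel (supervisor) */
};

void save_hw_registers(struct vcpu_state *s);     /* assumed low-level helpers, */
void restore_hw_registers(struct vcpu_state *s);  /* e.g. written in assembly   */

/* Switching vCPUs is analogous to a process context switch in a normal OS. */
void vcpu_switch(struct vcpu_state *from, struct vcpu_state *to)
{
    save_hw_registers(from);
    restore_hw_registers(to);
}
```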

Evaluation

A machine simulator called SimOS was used to evaluate the performance of Disco. This was done under different workloads. The overhead of virtualization came out to be 3% to 16% in a uniprocessor case. However, in a system with 8 virtual machines, the execution speedup is shown to be as high as 1.7 times. Hence, this is used to show that Disco achieves its goal of scaling favorably.

Confusion

The way the TLB is updated using the pmap and memmap is not clear to me.

Summary
This paper talks about a prototype called Disco that acts as a hardware abstraction layer to run multiple commodity operating systems on a scalable multiprocessor. The authors call this layer a virtual machine monitor, which provides the abstractions on which guest OSes run.

Problem
System software was not capable of utilizing advancements in hardware that was scalable and had hundreds of processors. As operating systems were tailored for particular hardware, shipping innovative hardware with capable system software is extremely hard, as it would require significant changes in the OS. There were two solutions to this problem: either build a new scalable operating system, which is time consuming, or do something less time consuming that adapts to the hardware changes. This led the authors to use a VMM sandwiched between the hardware and guest OSes.

Contribution
The main contribution of the authors is a "thin" software layer called the virtual machine monitor (hypervisor), which abstracts the hardware for guest OSes by emulating the processor, memory, and I/O. The authors extended existing operating systems to shared-memory multiprocessors with little change to the OS. This approach has the following advantages:
--> VMMs provide a flexible way to run different OSes on the same hardware.
--> Same OSes can run on different hardware with the help of VMM.
--> Resource sharing: pages of code and data are shared across OSes to reduce the memory footprint.
--> Fault containment: issues inside a guest OS are limited to that guest OS itself.
--> Abstraction of NUMA-ness using page placement, replication, and migration.
--> Copy-on-write for replicated pages.

One of the other big contributions was the extra level of indirection in virtual address translation. Each OS runs in its own "physical address" space, and each process within it uses the virtual addressing scheme provided by the OS. When there is a translation miss in the TLB, the VMM has the guest OS produce the physical mapping (in the guest OS's sense), and that address is then translated by the VMM to the actual machine address. This lets each process run directly on the CPU with correct translations.

Evaluation
Evaluating Disco on a simulator wasn't as convincing as it would have been on real hardware. The authors' claim that simpler workloads were run due to the slowness of emulation isn't well justified; each of the workloads ran for only 2 to 13 seconds. I feel it would have been a lot better if they had run for at least a few minutes. Nonetheless, the authors explain the four workloads they chose: software development, hardware development, scientific computing, and a commercial database. The authors explain, in detail, the overheads and performance of these workloads.

Confusion
Did the authors actually solve the problem of utilizing newer hardware, or just that of running multiple OSes on it?

Resource allocation and management in a "thin" software layer doesn't seem convincing with only 3% to 16% overhead.

How portable was the VMM? With every new piece of hardware, wouldn't the VMM need to change every time?

1. Summary
This paper talks about the virtual machine monitor, a thin layer of software lying between hardware and operating system. The prototype, called Disco, combines commodity operating systems to run on a single scalable computer and thus avoids the extensive changes in current operating systems that would otherwise be required to support scalability.

2. Problem
Computer hardware is advancing very rapidly. Scalable computers with hundreds of processors are available, but unfortunately operating systems have not evolved at the same rate. Conventional operating systems are inadequate to support modern hardware. Extensive changes are required in the design of operating systems to take full advantage of modern hardware. This leads to significant development cost, and it is also likely to introduce instabilities and bugs.

3. Contributions
To tackle the above-mentioned problems, the authors came up with the idea of combining various commodity operating systems to provide scalability instead of making huge changes in current operating systems. Disco virtualizes all the resources of the machine, exporting a more conventional hardware interface to the operating system. Multiple virtual machines can be configured, each running a commodity operating system with the processor and memory resources that the operating system can handle effectively.
Disco also provides an abstraction of main memory residing in a contiguous physical address space. It uses dynamic page migration and replication to provide nearly uniform memory access times and thus allows non-NUMA-aware operating systems to perform well.

4. Evaluation
Four different workloads that were representative of typical uses of scalable compute servers were used to evaluate the system. Due to the unavailability of the FLASH machine at the time of the experiments, SimOS was used to evaluate Disco. The overhead of virtualization was found to be in the range of 3% to 16% and was mainly due to Disco's trap emulation of TLB reload misses. Memory overheads, the benefits of virtual machines for scalability, and the benefits of dynamic page migration and replication were also presented. The results gave a positive view of the system, as the overheads incurred were not very high.

5. Confusion
Could you please talk about I/O device virtualization.

Summary
The paper presented Disco, a virtual machine monitor designed for the FLASH multiprocessor, a scalable cache-coherent multiprocessor. Disco is an effort to extend modern operating systems (in the sense of 1997) to run efficiently on large-scale shared-memory multiprocessors without a large implementation effort.

Problem
Back when the paper was written, scalable machines were moving into the marketplace, and the size of scalable systems was increasing to configurations of tens or even hundreds of processors. At the same time, extensive modification, which can be resource-intensive, was required for operating systems to run efficiently on these scalable machines.

Contribution
The paper brought back the idea of inserting a software layer between the hardware and multiple virtual machines to virtualize and manage all the resources, so that multiple virtual machines can co-exist on the same multiprocessor instead of the operating systems being modified extensively.
The authors also implemented the Disco prototype, tested it on real hardware, and quantified Disco's performance.
The Disco prototype also carefully designs virtual physical memory by adding a level of address translation and related translation mechanisms.

Evaluation
The paper addressed a crucial problem, increasing the efficiency of scalable machines, and proposed convincing solutions. The prototype was developed and evaluated using SimOS, as FLASH was not available at the time, and comprehensive experiments were performed to evaluate Disco.
From the paper, it is also interesting to observe how some good ideas in computer design, for example scalable design, evolve over time and eventually make a difference in the real world and/or reach the marketplace.

Confusion
1. How does the mechanism that Disco virtualize I/O devices work?
2. How does the B-Tree structure used by Disco help memory and disk sharing?

Summary

Disco's main idea is to utilize the NUMA shared-memory architecture without building a separate operating system from scratch. This is achieved by adding an abstraction layer, the VMM, that sits on top of the hardware and gives the OS the appearance of traditional, supported hardware.

Problem

Hardware innovation brings more and more processors to the system. It requires an excessive amount of modification to current operating systems to cope with this trend, and the modifications to current operating systems also introduce compatibility issues. The authors try to close this gap between hardware and system software.

Contributions

Disco's main goal is to abstract processors, physical memory, and network I/O using a virtual machine monitor, through which specialized operating systems running in their own virtual machines access the underlying hardware resources and communicate with each other using distributed-systems principles.

The second contribution is two levels of address translation: virtual address to physical address, and physical address to machine address. Disco uses additional structures such as a per-virtual-machine pmap table, a second-level software TLB, and a global memmap table. Virtual-to-machine translations are cached in the TLB to increase efficiency. This allows Disco to provide an almost-UMA view to the operating systems running on it.
Disco also implements dynamic page migration and page replication to deal with non-uniform memory access. Techniques such as copy-on-write are used to ensure consistency across replicated pages.
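As a rough illustration of the migration/replication decision described above (the thresholds and helper names below are assumptions, not Disco's published policy), a monitor could use per-page remote-miss counters to decide whether a hot page should be moved to the node that misses on it or replicated there read-only:

```c
#define MAX_NODES     16
#define HOT_THRESHOLD 64     /* assumed: remote misses before we consider acting */

typedef unsigned long maddr_t;   /* machine page number */

struct page_stats {
    unsigned long remote_miss[MAX_NODES];  /* cache misses per NUMA node     */
    int           writable_sharers;        /* how many nodes write this page */
};

/* Assumed helpers: these would update mappings and shoot down stale TLB entries. */
int  hottest_node(const struct page_stats *st);
int  accessed_mostly_by_one_node(const struct page_stats *st);
void migrate_page(maddr_t mpage, int to_node);
void replicate_page(maddr_t mpage, int to_node);

/* Run periodically, or when a page's hardware miss counter overflows. */
void maybe_migrate_or_replicate(maddr_t mpage, struct page_stats *st, int home)
{
    int hot = hottest_node(st);

    if (hot == home || st->remote_miss[hot] < HOT_THRESHOLD)
        return;                        /* not hot enough to justify moving anything */
    if (st->writable_sharers > 1)
        return;                        /* write-shared: too costly to move or copy  */

    if (accessed_mostly_by_one_node(st))
        migrate_page(mpage, hot);      /* one heavy user: move the page to its node   */
    else
        replicate_page(mpage, hot);    /* read-shared: give the hot node its own copy */
}
```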

Evaluations

The authors evaluate Disco running on an emulated MIPS (RISC) multiprocessor system using the SimOS simulator, comparing against the IRIX OS on a variety of memory- and I/O-intensive workloads. They compare the virtualization overheads in execution time and memory. They also evaluate workload scalability and the NUMA behavior of Disco. They claim that it performs reasonably compared to a traditional commodity system.

Confusions
The difference between software TLB and software reloaded TLB isn’t very clear.
I would like to know more about pmap and memmap structures.

1. Summary
The paper introduces the concept of a virtual machine monitor as a method to mitigate system software development effort and hence improve the utility of new hardware platforms with exotic features.
2. Problem
Availability of new hardware, and lack of current OS support for newly developed hardware features.
3. Contributions
A deep look into the challenges faced while developing hardware abstractions which sit below an operating system.
Bringing the idea of VMMs back into the mainstream, by providing an ideal use case with a working solution evaluated on both simulated and actual hardware.
4. Evaluation
Adequate demonstration of the system's ability to run a commodity OS on custom hardware without any modifications.

1. Summary
CCNUMA machines were becoming a common occurrence among companies. Unfortunately, software lagged hardware with additional problems such as compatibility. Disco tackles the problem of efficient time to market as many operating systems require significant changes to adapt to new hardware. It aims to use virtual machines as the unit of isolation as reliability is a major factor for many users.

2. Problem
Multiprocessor operating systems that ran on CCNUMA machines are heavyweight and are generally not concerned with portability. New hardware generally required the porting of multiprocessor OSs, which was a significant undertaking. Disco aims to minimize the time to market and promote virtual machines as the new unit of isolation.

3. Contribution
The primary contribution of Disco is revisiting the notion that multiprocessor machines require complex operating systems. They break down each component in an OS and identify advancements since the initial notion of virtual machine monitors. They use direct execution on the CPUs and address the challenge of privileged instructions by trapping into the monitor and emulating those instructions. Memory is handled by adding a new mapping from physical to machine addresses and a second-level TLB. NUMA is managed by a dynamic page migration and page replication system. I'd also highlight the management of the virtualized subnet, as it provides significant improvements to memory utilization and the latency of requests among machines by using the MMU to map memory into both virtual machines.

4. Evaluation
Evaluation was done using SimOS as the hardware of FLASH was not yet available. They observed 3%-16% overhead with uniprocessor workloads on IRIX. Additionally, impressive memory conservation was shown using the sharing techniques across virtual machines. Figure 7 shows a savings of up to 50% on 8 instances of pmake.

5. Confusion
I’m not sure how Disco answers the resource allocation challenge for the virtual CPU scheduling and the possibility of allocating pages not actively being used.

1. Summary
With the introduction of innovative hardware such as scalable shared-memory multiprocessors, there arises a need for significant changes in system software in order to fully utilize the power of the hardware innovation. The Disco prototype uses the idea of virtual machine monitors to run multiple copies of the IRIX operating system, providing low monitor overhead and scalability.
2. Problem
Due to the impediments faced in the software development cycle, there has been a gap between hardware innovations and the adaptation of system software. The hardware innovation considered by the paper is scalable shared-memory multiprocessors. These require major changes in the system software, including partitioning the system into scalable units, building a single system image across these units, and CC-NUMA management. The changes required to achieve this might have a cascading effect and unexpected effects on standard OS modules such as virtual memory management and the scheduler. It is necessary to create a generic operating system that requires only minimal changes but provides performance comparable to a specialized OS.
3. Contributions
The Disco prototype revives the old idea of virtual machine monitors and adds a twist to it. A layer of software called the virtual machine monitor is introduced between the hardware and the operating system. Disco also introduces simple changes to transparently share major data structures such as program code, to reduce the memory overhead caused by running multiple operating systems.
Disco abstracts away the NUMA-ness of the memory system by using page placement and dynamic page migration and replication. This achieves a huge reduction in execution time. It also uses copy-on-write to maintain consistency across replicated pages. Disco avoids data structures with poor cache behavior, instead using structures such as the pmap and memmap to handle TLB misses quickly. The VMM virtualizes the processors by providing secure direct execution on the underlying processors. Disco also provides a virtual networking interface to allow virtual machines to communicate with each other while avoiding replicated data wherever possible.
4. Evaluation
It was a bit ironic that the motivating problem was keeping up with hardware innovations, yet the authors had to simulate the FLASH machine since it was not yet available. They used SimOS to simulate the machine and evaluate the performance of Disco on it. The paper presents the results of running four typical workloads representative of applications of scalable servers: software development, hardware development, scientific computing, and a commercial database. These helped study the workloads' behavior in great detail. The paper also compares execution overheads, memory overheads, and scalability for the four applications. All these tests were performed with Disco running IRIX in a single virtual machine, which again contradicts the objective of Disco, which was to run multiple commodity OSes on a scalable multiprocessor system.
5. Confusion
1. Can you please explain what software/hardware reloaded TLB is?
2. Application of NFS protocols in achieving Virtual Network Interface wasn’t clear.

1. summary
This paper describes a virtual machine monitor system designed to simplify system software design for scalable non-uniform computer systems.
2. Problem
The problem that motivates the Disco system is that designing system software for large non-uniform machines can be very difficult. This leads to unreliable systems being released, as developers cannot fully deliver. It also points out that NUMA memory allocation can be difficult to handle properly.
3. Contributions
To solve these issues, they present a virtual machine monitor system, Disco. The system is designed to present a virtual processor with a standard architecture to each virtual machine so that little to no changes are required to be able to run standard operating systems. Their system design allows for a standard OS to be used on top of Disco without concern for the underlying “NUMA-ness” of the system. All memory management issues associated with that are handled by Disco which can move pages around as needed to be locally accessible whenever possible. This allows designers to create specialized OSes to use on top of Disco to maximize performance for a specific task without much concern for the underlying system. Through the use of many of these layered systems the full computer system can be reliably utilized.
4. Evaluation
The system that Disco was designed for was not ready when testing was done, so they used a simulation-based method for the majority of their testing. Their analysis showed small to modest performance overhead from running on top of the Disco system, but also shows that their system is able to reduce the number of remote memory accesses that would take place on the NUMA architecture. Their analysis gives a fairly balanced picture of the system despite not being able to run on the intended hardware.
5. Confusion
I don't have a good idea of the timeline of VMM system design, so I was unsure at times which parts of this system were actually novel. Many of the methods they described aligned with my understanding of how VMMs generally work, but I wasn't sure if those methods were arising at the time of this paper or had already been around for a while at the time of writing.

1. Summary
This paper introduces a method to run multiple operating systems on a multiprocessor, i.e., to use a virtual machine monitor (VMM). The paper gives its solution for how to virtualize the CPU, memory, and I/O devices among multiple OSes efficiently. Though the VMM was not a new idea at the time, this paper brought it back and implemented a prototype to demonstrate the VMM's efficiency. Later on, the authors founded VMware and put the VMM idea into production.

2. Problem
The problem arises from the demand for an appropriate operating system when a new hardware technology becomes reality. Specifically, shared-memory multiprocessors are shipping, but operating systems have not been developed to accommodate the new hardware. How can current commodity operating systems run on scalable multiprocessors with little implementation effort? Rewriting current OSes would be time consuming. The authors decided to use a VMM running between the guest OSes and the hardware, which requires little change to the guest OSes.

The second problem is how to run OSes on multiprocessors in an efficient way. Traditional VMMs have non-negligible overhead in both CPU time and memory space. The authors introduce several ideas, such as a second-level TLB, page migration and replication (targeted at NUMA memory), copy-on-write disks, and a virtual network interface, to tackle the efficiency problem.

3. Contributions
The main contribution of this paper is giving a design of virtual machine monitor in a thorough way, including CPU, memory and I/O devices.

For the CPU, applications still run via limited direct execution without access to privileged instructions (user mode). The VMM runs in kernel mode with access to all privileged instructions and memory. However, the guest OS sits between the VMM and applications: it cannot be allowed to run privileged instructions, but it needs some protected memory for its code and data structures. The authors exploit the MIPS processor's intermediate privilege level and put the guest OS in supervisor mode. Putting the guest OS and VMM at different hardware privilege levels is an elegant way to solve the protection problem, because little software implementation is needed.
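As a hedged illustration of the trap-and-emulate mechanism this privilege split enables (the structures and opcode names below are simplified assumptions, not Disco's source), a privileged instruction issued by the guest kernel traps into the monitor, which applies its effect to the vCPU's virtualized registers rather than the real ones:

```c
#include <stdint.h>

#define NUM_CP0 32

/* Simplified per-vCPU privileged state kept by the monitor (an assumption,
 * not Disco's real layout). */
struct vcpu {
    uint64_t cp0[NUM_CP0];     /* virtualized coprocessor-0 registers */
    int      interrupts_on;    /* virtualized interrupt-enable flag   */
};

enum priv_opcode { PRIV_MTC0, PRIV_TLBWR, PRIV_DISABLE_INTR };

/* Assumed helper: remember a guest TLB write so later hardware-TLB fills can
 * translate the guest's "physical" addresses to machine addresses. */
void record_guest_tlb_write(struct vcpu *v, uint64_t entry_hi, uint64_t entry_lo);

/* Called from the monitor's trap handler after decoding the privileged
 * instruction the guest kernel tried to execute in supervisor mode. */
void emulate_privileged(struct vcpu *v, enum priv_opcode op, int reg,
                        uint64_t entry_hi, uint64_t entry_lo)
{
    switch (op) {
    case PRIV_MTC0:                  /* guest writes a privileged register */
        v->cp0[reg] = entry_lo;      /* update the virtual copy, never the real one */
        break;
    case PRIV_TLBWR:                 /* guest inserts a TLB entry */
        record_guest_tlb_write(v, entry_hi, entry_lo);
        break;
    case PRIV_DISABLE_INTR:          /* guest masks its interrupts */
        v->interrupts_on = 0;
        break;
    }
}
```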

For memory, applications still run with virtual addresses, and the guest OS still works with physical addresses (in its own sense); the VMM works with machine addresses (the real physical addresses). The VMM maintains the TLB with translations from virtual addresses to machine addresses. When there is a TLB miss during application execution, the VMM first lets the OS produce the virtual-to-physical mapping, then looks up the physical-to-machine mapping, and updates the TLB with the combined virtual-to-machine mapping. This design lets the guest OS retain its own memory management strategy. In addition, the authors introduce a second-level software TLB to speed up TLB miss handling. This one level of physical-to-machine indirection frees the VMM from changing any guest OS memory management code, and it also decreases the likelihood of bugs in the VMM.
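A minimal sketch of this miss-handling path, assuming simplified structures: the pmap, memmap, and second-level-TLB names follow the paper, but the function signatures here are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t vaddr_t;   /* guest virtual page number    */
typedef uint64_t paddr_t;   /* guest "physical" page number */
typedef uint64_t maddr_t;   /* real machine page number     */

/* Assumed helpers around the structures named in the paper. */
bool    l2tlb_lookup(int vm, vaddr_t vpage, maddr_t *mpage);    /* 2nd-level software TLB      */
void    l2tlb_insert(int vm, vaddr_t vpage, maddr_t mpage);
bool    guest_tlb_lookup(int vm, vaddr_t vpage, paddr_t *ppage);
void    forward_tlb_miss_to_guest(int vm, vaddr_t vpage);
maddr_t pmap_lookup(int vm, paddr_t ppage);                     /* per-VM physical -> machine  */
void    memmap_record_use(maddr_t mpage, int vm, paddr_t ppage);/* machine -> users, needed for
                                                                   invalidation on page moves  */
void    hw_tlb_insert(vaddr_t vpage, maddr_t mpage);

/* Invoked by the monitor on a hardware TLB miss taken while a guest runs. */
void handle_tlb_miss(int vm, vaddr_t vpage)
{
    maddr_t mpage;
    paddr_t ppage;

    /* Fast path: the second-level software TLB already caches a
     * virtual-to-machine translation for this virtual machine. */
    if (l2tlb_lookup(vm, vpage, &mpage)) {
        hw_tlb_insert(vpage, mpage);
        return;
    }

    /* Otherwise ask the guest OS for its virtual-to-physical mapping;
     * if its virtualized TLB has no entry, forward the miss so the
     * guest's own handler fills it in and the access is retried. */
    if (!guest_tlb_lookup(vm, vpage, &ppage)) {
        forward_tlb_miss_to_guest(vm, vpage);
        return;
    }

    /* Translate physical to machine with the per-VM pmap, then install
     * a direct virtual-to-machine entry in the hardware TLB. */
    mpage = pmap_lookup(vm, ppage);
    memmap_record_use(mpage, vm, ppage);
    hw_tlb_insert(vpage, mpage);
    l2tlb_insert(vm, vpage, mpage);
}
```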

For I/O devices such as disks and the network, data can be shared between virtual machines (guest OSes) with copy-on-write mappings. For read-only data, the VMM maps the same machine memory into the guest OSes' address spaces when servicing their DMA requests. In this way, OSes can share the same data, such as code and the buffer cache, reducing the memory overhead.
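A small sketch of the copy-on-write disk sharing idea, under the assumption of invented helper names: when a VM issues a DMA read for a disk block that some VM already holds in machine memory, the monitor maps that page read-only instead of reading the disk again.

```c
#include <stdint.h>

typedef uint64_t paddr_t;   /* guest "physical" page number */
typedef uint64_t maddr_t;   /* machine page number; 0 means "not present" in this sketch */

/* Assumed helpers. */
maddr_t find_block_in_memory(long block_no);   /* global cache: disk block -> machine page */
maddr_t do_real_disk_read(long block_no);      /* read from the device into a fresh page   */
void    map_readonly_into_vm(int vm, paddr_t ppage, maddr_t mpage);

/* Intercepted DMA read from a shared (copy-on-write) virtual disk. */
void virtual_disk_read(int vm, long block_no, paddr_t dest_ppage)
{
    maddr_t mpage = find_block_in_memory(block_no);
    if (mpage == 0)
        mpage = do_real_disk_read(block_no);   /* first reader pays for the I/O */

    /* Share the existing copy: map it read-only into the requesting VM.
     * A later write by any VM traps and gets a private copy (COW), so the
     * guests never observe the sharing. */
    map_readonly_into_vm(vm, dest_ppage, mpage);
}
```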

4. Evaluation
The paper evaluates the Disco prototype on SimOS. The authors chose four kinds of workloads for scalable compute servers: software development, hardware development, scientific computing, and a commercial database. They measured the execution overheads, memory overheads, scalability, and the effectiveness of page migration and replication. The results are promising, as execution overheads are no more than 16%, and memory overheads are also small.

5. Confusion
1. Could you talk a bit more about how copy-on-write disks and virtual network interface (4.2.5 and 4.2.6) actually work?
2. What are current research or industry problems for virtual machine monitor?

1. Summary
The paper describes a virtual machine monitor that can be implemented with small effort to run multiple existing operating systems on large-scale shared-memory multiprocessors without having to modify them significantly. The monitor provides low overheads while dealing with NUMA behavior transparently and also reduces memory overheads by avoiding replication of major operating system data structures such as the file system buffer cache.

2. Problem
Developers of innovative hardware such as large-scale shared memory multiprocessors face challenges in obtaining operating system support for their hardware. This is because system software is usually a large and complicated codebase, and hence modifying it involves enormous development cost. Such an effort can also introduce incompatibilities and instabilities which may undermine their success.

3. Contributions
The paper proposes solving the problem by adding a level of indirection between existing operating systems and the innovative hardware. The Disco virtual machine monitor which is a light-weight and easy to modify software, is developed for this purpose. The VMM uses several techniques such as executing non-privileged instructions directly on the CPU, using software-reloaded TLBs to translate memory accesses with no overhead, using non-trapping load and store instructions to special pages for privileged register accesses, and page migration and page replication to hide the NUMA-ness of the underlying hardware to keep overheads low. This approach does not require the operating systems to be tweaked much apart from changes to their hardware abstraction layer (HAL).

4. Evaluation
Disco is evaluated using workloads with different needs and the performance overhead is shown to be less than 16%. The benefits of NUMA-awareness and scalability of Disco are shown to even outweigh the virtualization overheads in some cases. But the workloads used for evaluation were very short. It is not immediately clear if these benefits will persist when the system runs continuously for a long time.

5. Confusion
I did not clearly understand how Disco virtualizes I/O devices.

1. Summary
This paper introduces Disco, a NUMA aware virtual machine monitor designed to run conventional OS which were not scalable, on scalable shared memory multiprocessors while minimizing the overhead of virtualization through transparent resource sharing (especially memory and disk) among the virtual machines.

2. Problem
Scalable shared memory multiprocessors require OS to be NUMA aware to deliver good performance and fault tolerance. Conventional OS were not NUMA aware and required significant engineering to be scalable. Virtual machine monitor could be a solution but were not efficient in terms of resource management, communication and sharing.

3. Contributions
In processor virtualization, Disco emulates all MIPS privileged instructions and trap mechanisms, enabling existing OSes to run on Disco without modification, and extends the architecture to let frequent kernel operations be performed with loads and stores to special addresses to reduce trap overheads.
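A hypothetical illustration of this load/store convention (the page layout and the special address below are made up for the example; the real interface is defined by Disco and the patched HAL):

```c
#include <stdint.h>

/* Assumed layout of a per-vCPU page the monitor maps into the guest kernel's
 * (supervisor-mode) address space. */
struct vcpu_shadow {
    volatile uint32_t interrupts_enabled;   /* virtualized interrupt-enable bit */
    volatile uint32_t pending_interrupts;   /* set by the monitor               */
};

/* Made-up special address for the example. */
#define VCPU_SHADOW ((struct vcpu_shadow *)(uintptr_t)0xFFFFA000u)

void trap_to_monitor(void);                 /* assumed helper: explicit trap */

/* What a patched guest HAL routine might look like: a plain store replaces a
 * privileged instruction, so the common case takes no trap at all. */
static inline void guest_disable_interrupts(void)
{
    VCPU_SHADOW->interrupts_enabled = 0;
}

static inline void guest_enable_interrupts(void)
{
    VCPU_SHADOW->interrupts_enabled = 1;
    /* Only if the monitor queued interrupts while they were masked is a trap
     * still needed, to deliver them now. */
    if (VCPU_SHADOW->pending_interrupts)
        trap_to_monitor();
}
```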
In memory virtualization, Disco implements two levels of address translation: virtual address to physical address, and physical address to machine address. Disco implements the translation system with additional structures: a per-virtual-machine pmap table and second-level software TLB, and a global memmap table. For efficiency, virtual-to-machine translations are cached directly in the hardware TLB. On top of this efficient address translation, Disco uses dynamic page migration and replication to provide an almost-UMA view to the OSes running on it.
In device virtualization for disks, Disco intercepts every DMA request and uses a copy-on-write strategy for non-persistent disks to share data among virtual machines, while disk writes are kept private for isolation. Virtual machines communicate with each other through a virtual network interface, and Disco's virtual subnet and networking interface improve data sharing among the virtual machines by remapping pages instead of copying them.
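A hedged sketch of this zero-copy virtual subnet idea (helper names are assumptions): when one VM sends data to another over the monitor's virtual network, the monitor remaps the machine pages holding the payload into the receiver read-only rather than copying them.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;   /* guest "physical" page number */
typedef uint64_t maddr_t;   /* machine page number          */

#define VNET_IRQ 3          /* made-up virtual interrupt number */

/* Assumed helpers. */
paddr_t alloc_receive_page(int vm);
void    map_readonly_into_vm(int vm, paddr_t ppage, maddr_t mpage);
void    post_virtual_interrupt(int vm, int irq, paddr_t ppage, size_t len);

/* Send one message across the monitor's virtual subnet without copying. */
void virtual_net_send(int src_vm, int dst_vm, maddr_t payload_mpage, size_t len)
{
    (void)src_vm;   /* the sender's pages stay mapped; both sides now share them */

    /* Pick a guest-physical page in the receiver for the incoming data. */
    paddr_t dst_ppage = alloc_receive_page(dst_vm);

    /* No data copy: the receiver's physical page points at the sender's
     * machine page, read-only, so a later write by either side is caught
     * and handled copy-on-write. */
    map_readonly_into_vm(dst_vm, dst_ppage, payload_mpage);

    /* Raise a virtual network interrupt so the receiver's driver runs. */
    post_virtual_interrupt(dst_vm, VNET_IRQ, dst_ppage, len);
}
```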

4. Evaluation
In this paper, authors evaluated Disco with simulators, and used four kinds of representative workloads. Simulation showed that the execution overhead of Disco ranged from 3% to 16% and was mainly due to trap emulation of privileged instructions. In terms of memory overheads, Disco’s memory sharing design significantly reduced memory footprint especially for many partitions. In terms of scalability, Disco significantly reduces the execution time with many virtual machines working simultaneously, compared with conventional OS.

5. Confusion
- I'm not quite clear about the two-step procedure by which two virtual machines communicate with each other using NFS in Figure 5.
- What is HAL? Do we have HAL in today’s OS?

Summary:
This paper presents Disco, a virtual machine monitor that provides a layer of abstraction between hardware and operating system. Disco enables running commodity operating systems on scalable shared-memory multiprocessors without many changes to the OS and applications.

Problem:
Modern operating systems require substantial changes to efficiently support scalable machines. But considering the size and complexity of OS code, modifying the OS is a resource-intensive task. This leads to software trailing behind hardware development. Also, frequent modifications to OS code lead to buggy system software, resulting in overall system failures.

Contributions:
Disco virtualizes all the resources of the scalable machine and exports a more conventional hardware interface to the operating system. Disco enables multiple operating systems to run on the same multiprocessor. Legacy applications can run on top of these operating systems without any modification. Some of the features employed in Disco are:
Some of the frequent kernel operations can be performed using load and store instructions on special addresses, thus reducing the trap emulation overheads.
Uses dynamic page migration and replication to maintain locality and limit cache misses.
Maintains a second level software TLB to reduce the overhead of TLB misses.
A virtual subnet to allow virtual machines to communicate while avoiding data replication whenever possible.
Sharing of read-only pages between virtual machines, enabled by the combination of copy-on-write disks and access to persistent data through the specialized network device.

Evaluation:
Since the FLASH machine was not available, they used the SimOS simulator to evaluate Disco. The overhead of virtualization ranges from 3% to 16% and is mainly due to trap emulation of TLB reload misses. Memory overheads are limited. Disco provides scalability benefits by partitioning the problem into multiple virtual machines. Dynamic page migration and replication policies are shown to bring workloads to within 40% of the execution time on an optimistic UMA machine. They also ported Disco to real hardware and showed an overall slowdown of 8%.

Confusion:
I did not quite understand I/O devices virtualization in Disco implementation.

1. Summary
The paper presents Disco, virtual machine monitor-like software that is placed between the operating system and hardware so that operating systems do not have to change to work with new NUMA multiprocessor machines.

2. Problem
With the advent of large scale NUMA multiprocessors operating systems must adapt to work with these new machines to gain the most out of the hardware. However, operating systems are large and complex pieces of software that are not easily changeable and changes could also introduce reliability issues.

3. Contributions
Disco, which is incredibly lightweight compared to a full-scale operating system, is used to virtualize the hardware and acts as an intermediary to the OS. This means that Disco can change to fit the new hardware while the OS can continue as it was. The paper also claims that using Disco they can hide the NUMA side of memory accesses using page placement and dynamic page migration. An interesting byproduct of separating the OS from the hardware is that, according to the paper, you can run multiple heterogeneous operating systems according to the needs of the programs. I was disappointed that this part of the design was not further addressed, as it seems to go along with some of the research into heterogeneous cores on the hardware side.

4. Evaluation
Unfortunately the hardware system Disco was targeting, FLASH, was not ready, so to evaluate the system they simulated a scaled version of their targeted processor on SimOS. Using this simulation framework they ran several benchmarks that stressed parallelism and multiprocessing. They compare the results to the IRIX operating system without Disco and measure the total overhead of Disco by running the workloads on a uniprocessor. The paper also evaluated their memory system and claims that Disco can effectively hide the NUMA memory system, achieving 33% and 38% improvements in performance. Finally, they do port Disco onto an SGI Origin200 board to prove it works with actual hardware.

5. Confusion
Despite being referenced several times, the page placement and dynamic page migration policies seemed a bit vague. The paper doesn't explain in detail how it determines when and where to move a page in order to hide the NUMA memory system so effectively. The only policy I recall is that Disco uses cache misses to decide when to migrate or replicate a page, while also avoiding moving a page too often for fear of overhead.

1. Summary
This paper introduces the idea of a virtual machine monitor, software that lies between the operating system and the hardware and provides virtualized hardware to different commodity OSes. The paper also describes a prototype called Disco that can manage multiprocessor CC-NUMA hardware with minimal modification to OS kernel code.

2. Problems
The problem this paper wants to solve is the tight coupling between hardware and the operating system. Hardware and the OS are so closely connected that once the hardware changes, the corresponding operating system also needs to change a lot, as with the multiprocessor hardware discussed in this paper. But changing the OS is very difficult due to its complexity, so software sometimes slows down the adoption of new hardware. Besides, previously only one OS could run on a given machine at a time, but sometimes we really need to run multiple specialized OSes on the same machine. This paper also tries to solve that problem.

3. Contributions
This paper adds a new layer between the OS and the hardware that virtualizes all resources, including CPU, memory, storage, and network, and exposes an interface that is easy for different OSes to use. This new virtual machine monitor layer acts like an operating system for traditional OSes and makes it possible to schedule multiple OSes on one machine.
The paper also uses various optimizations to make the VMM fast. Some are related to hardware, like direct execution of unprivileged code, using loads and stores to reduce privileged-instruction overhead (sketched below), and a second-level TLB between the VMM and the hardware. Other optimizations include data movement among NUMA nodes and data sharing among different OSes, and they are transparent to the OS level. All of these optimizations make Disco a system that can be used in practice.
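
To illustrate the load/store optimization mentioned above, here is a hedged sketch of the general idea: the monitor maps a page of per-virtual-CPU register state into the guest kernel's address space, so patched guest code can read or write that state with ordinary memory instructions instead of trapping privileged instructions. The address, structure, and field names below are hypothetical.

    #include <stdint.h>

    #define DISCO_REGS_VADDR 0xC0000000UL   /* hypothetical mapped address */

    struct disco_vcpu_regs {
        uint64_t epc;        /* saved exception program counter */
        uint64_t status;     /* shadow of the privileged status register */
        uint64_t badvaddr;   /* faulting virtual address */
    };

    /* In the patched guest HAL: read the faulting address with a load
     * from the shared page rather than a trapping privileged instruction. */
    static inline uint64_t hal_read_badvaddr(void)
    {
        volatile struct disco_vcpu_regs *r =
            (volatile struct disco_vcpu_regs *)DISCO_REGS_VADDR;
        return r->badvaddr;          /* ordinary load: no trap, no emulation */
    }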

4. Evaluation
This paper uses four workloads to evaluate the virtualization overhead of the system. The overhead lies between 3% and 16%, which is not very large but depends on the workload type. The breakdown of virtualization overhead shows that TLB handling is a crucial part of the overhead, and using IRIX 6.2 with 16KB pages may help with this. The paper also evaluates the more advanced page policies in this system, including page placement, dynamic page migration, and replication, which lead to performance improvements of 33% and 38%; memory overheads are also shown to be small. One possible problem is that the new FLASH hardware was not available during the experiments: the paper continually mentions FLASH, but we never see results on that hardware. Still, the authors use simulation and other hardware to run their experiments.

5. Confusion
For direct execution, how can the VMM efficiently determine whether an instruction is privileged or not, given that the VMM is only software while instructions execute at hardware speed?
The paper mentions using loads and stores to access some privileged registers, which I don't quite understand. Does the system just ignore the real register, or does it maintain consistency between the real register and this memory?

1. Summary
In order to take advantage of the NUMA shared-memory architecture, the paper proposes a software layer that sits between the hardware and the OSes, called a virtual machine monitor. This enables multiple OSes to run on the same hardware, providing scalability. The prototype implemented is called ‘Disco’.

2. Problem
With every new hardware innovation, such as shared-memory multiprocessors, OS code had to be changed significantly, which incurred substantial development cost and also led to many bugs. Moreover, users wanted to keep running their existing application programs reliably. There was thus a gap between hardware innovations and the pace at which system software adapted.

3. Contributions
To bridge the aforementioned gap, the paper revisits the old concept of virtual machine monitors. By virtualizing CPU, memory, and I/O, VMMs allow different OSes to run concurrently on the machine. Disco emulates the execution of virtual CPUs largely through direct execution on the physical CPU. It manages memory using an extra level of address translation between the guest's physical addresses and the machine's memory addresses. To make the VMM NUMA-aware, dynamic page migration and replication are provided: the monitor tracks page usage across nodes and reduces access latency by bringing pages closer to the processors that use them. The system is much more scalable than a single OS.
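
As a rough illustration of how such a migration/replication policy could work (the paper's exact heuristics differ and are driven by FLASH's hardware cache-miss counters), here is a hedged sketch: hot pages with one dominant remote user are migrated, while hot read-only pages are replicated. The thresholds and names are made up for the example.

    #include <stdint.h>

    #define MAX_NODES      32
    #define HOT_THRESHOLD  256     /* misses before we consider acting */

    struct page_stats {
        uint32_t miss[MAX_NODES];  /* per-node cache-miss counters */
        int      writable;         /* written recently? */
        int      home_node;        /* node currently holding the page */
    };

    enum action { KEEP, MIGRATE, REPLICATE };

    static enum action numa_policy(const struct page_stats *p, int *dst_node)
    {
        uint32_t total = 0, best = 0;
        int best_node = p->home_node;

        for (int n = 0; n < MAX_NODES; n++) {
            total += p->miss[n];
            if (p->miss[n] > best) { best = p->miss[n]; best_node = n; }
        }
        if (total < HOT_THRESHOLD || best_node == p->home_node)
            return KEEP;           /* not hot enough, or already local */

        *dst_node = best_node;
        /* One dominant remote user: move the page.  Several readers and
         * no recent writes: give each node its own copy. */
        if (best > total / 2)
            return MIGRATE;
        if (!p->writable)
            return REPLICATE;
        return KEEP;
    }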

4. Evaluation
Due to the unavailability of the FLASH architecture, Disco is evaluated on SimOS configured to resemble a multiprocessor. The performance and memory overheads are compared for applications running natively on the OS and those running on the OS on top of the VMM. The benefits of Disco's page migration and replication are also studied using the Engineering and Raytrace workloads, which exhibit poor memory system behavior.

5. Confusion
I am not totally clear about the TLB design.
Is there any difference between a VMM and a modern-day hypervisor?

1. Summary

This paper presents Disco, a modern implementation of the 1970s idea of virtual machine monitors. Disco sits as a thin layer on top of the hardware and manages virtualization by allowing multiple independent operating systems to run on top of the virtualized hardware resources. This allows commodity operating systems to efficiently leverage modern multiprocessor hardware without needing to be redesigned.

2. Problem

The modern multiprocessor hardware (modern as of this paper, i.e., ~1997) is equipped with tens or even hundreds of processors with shared memory. However, to make use of this hardware efficiently, operating systems require significant changes, including partitioning the system into scalable units, building a single system image across the units, and fault containment. The size and complexity of current software systems make these changes infeasible, and as a result the evolution of software is not able to keep up with the advancement of hardware. Disco is a hypervisor layer between the hardware and the operating systems that helps evade this problem for scalable multiprocessors.

3. Contributions

The authors design and implement Disco for the FLASH machine. Disco provides abstractions for processors, physical memory, and I/O devices. For processors, Disco provides the abstraction of a MIPS processor on top of the FLASH machine and emulates its instructions, its memory management unit, and the trap architecture of the processor, which allows unmodified applications to run in the guest operating systems.

Secondly, Disco provides the abstraction of uniform, contiguous physical memory on top of the non-uniform memory of FLASH. This is important because most commodity operating systems are not designed to manage non-uniform memory. Disco uses dynamic page migration and replication to achieve this.

Finally, Disco virtualizes each I/O device by intercepting all communication to and from the device and emulating the operations. For storage, Disco provides a set of virtual disks that any virtual machine can mount. Moreover, to let virtual machines communicate with each other and with the outside world, Disco assigns each virtual machine a MAC address and acts as a gateway for external communication.
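
For the virtual network, the key trick is that messages between virtual machines need not be copied: the monitor can remap the sender's machine page read-only into the receiver's guest-physical address space. The sketch below illustrates that idea with assumed helper functions; it is not Disco's actual interface.

    #include <stdint.h>

    struct vm;  /* opaque per-virtual-machine state */

    /* Assumed helpers: translate a guest-physical frame to its machine
     * frame, and install a read-only mapping in a VM's pmap. */
    uint64_t pmap_lookup(struct vm *vm, uint64_t gpfn);
    void     pmap_map_readonly(struct vm *vm, uint64_t gpfn, uint64_t mpfn);

    /* "Transmit" one page over the virtual subnet without copying:
     * the receiver's buffer is simply remapped to the sender's page. */
    static void vnet_send_page(struct vm *sender, uint64_t sender_gpfn,
                               struct vm *receiver, uint64_t receiver_gpfn)
    {
        uint64_t mpfn = pmap_lookup(sender, sender_gpfn);
        pmap_map_readonly(receiver, receiver_gpfn, mpfn);
        /* Both VMs now share a single machine page holding the data. */
    }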

4. Evaluation

Disco is developed and evaluated using SimOS, a simulator, because the FLASH machine for which it was designed was not yet available. The authors use four representative workloads: Pmake, hardware development simulation, scientific computing, and a commercial database. They run all four workloads both directly and on Disco to measure the virtual machine overhead, which ranges from 3% to 16% and is fairly reasonable. They also show a reduction in the kernel time of some workloads because some of the operating system's work gets handled by the monitor. The authors also measure the memory overhead of Disco for the same four workloads. In addition, they port Disco to actual hardware: although the FLASH machine wasn't available, they ported it to the Origin200 board that will form the basis of the FLASH machine.

5. Confusion

I did not quite understand the mapping of the guest OS's virtual pages to the actual machine pages.

1. Summary
This paper proposes Disco, an abstraction layer that sits on top of the hardware, allowing multiple existing operating systems to run in virtual machines with no modification. It does this by creating abstractions of processors, memory and I/O devices that can be used by commodity OSes running in virtual machines.

2. Problem
Large-scale multiprocessors posed a significant challenge for software development in reaching the functionality and reliability provided on commodity machines. This resulted in a lag between innovative hardware and the system software needed to run such machines. Two kinds of approaches were being taken to address these challenges. The first involved throwing a large OS development effort at the problem, with systems like Hive and Hurricane, but it required a large effort and resulted in complex software. The second, used by the Sun Enterprise 10000, was to statically partition the machine and run multiple OSes that communicate using distributed-systems protocols, but this suffers from rigidity and inefficient resource usage. Disco instead inserts an interface, the VMM, which virtualizes the resources for commodity operating systems and can be configured dynamically.

3. Contributions
1. The VMM was an important abstraction because it virtualizes all the machine resources and allows the dynamic addition and removal of OSes on top. This approach allowed Disco to be lightweight compared to Hive and Hurricane, so the risk of software bugs was significantly reduced. It also allowed dynamic configuration of the VMs on top and thus better resource utilization than the Enterprise 10000.

2. By using dynamic page migration and replication, Disco made it possible to run commodity OSes on NUMA machines and thus provided great flexibility. Disco keeps track of “hot” pages and moves them closer to the nodes where they are being used, which gives the appearance of uniform memory access to the virtual machine.

3. Copy-on-write disk mechanisms allow sharing of pages between virtual machines and reduce memory usage. They also make reads of disk data that is already in memory nearly instantaneous (a rough sketch follows below).
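
The sketch below illustrates the copy-on-write disk idea under stated assumptions (the helper names and the 4KB page size are mine, not the paper's): a virtual-disk DMA read for a block already in machine memory is satisfied by mapping the existing page read-only into the requesting VM, and the first write to such a shared page faults and receives a private copy.

    #include <stdint.h>
    #include <string.h>

    struct vm;  /* opaque per-virtual-machine state */

    /* Assumed helpers for the example. */
    uint64_t find_cached_block(uint64_t disk_block);        /* 0 if absent */
    uint64_t read_block_into_new_page(uint64_t disk_block);
    uint64_t alloc_machine_page(void);
    void    *machine_page_ptr(uint64_t mpfn);
    void     pmap_map_readonly(struct vm *vm, uint64_t gpfn, uint64_t mpfn);
    void     pmap_map_writable(struct vm *vm, uint64_t gpfn, uint64_t mpfn);

    /* Handle a guest's virtual-disk DMA read of one block into gpfn. */
    static void vdisk_dma_read(struct vm *vm, uint64_t disk_block, uint64_t gpfn)
    {
        uint64_t mpfn = find_cached_block(disk_block);
        if (mpfn == 0)
            mpfn = read_block_into_new_page(disk_block); /* first reader */
        /* Every VM reading this block shares the same machine page. */
        pmap_map_readonly(vm, gpfn, mpfn);
    }

    /* A later write-protection fault on a shared page gives the writing
     * VM its own private copy (the copy-on-write step). */
    static void cow_fault(struct vm *vm, uint64_t gpfn, uint64_t shared_mpfn)
    {
        uint64_t copy = alloc_machine_page();
        memcpy(machine_page_ptr(copy), machine_page_ptr(shared_mpfn), 4096);
        pmap_map_writable(vm, gpfn, copy);
    }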

4. Evaluation
Disco is tested on the SimOS simulator and targets the FLASH scalable multiprocessor. Four server workloads are run to assess the overheads of virtualization, which were found to range from 3% to 16%. Eight VMs were run simultaneously in the Pmake workload, and the results showed a significant reduction in memory footprint thanks to page sharing. Performance benefits of page migration and replication are also shown, with satisfactory results. The one crucial thing missing is an evaluation on real hardware, though the authors do mention that they have ported Disco to a board that will form the basis of the FLASH machine.

5. Confusion
1. The low-level details of memory virtualization were difficult to understand, and a review in class would be helpful.
2. How do VMMs solve the resource-management problem of detecting idle processes or idle loops in virtual machines? How do modern hypervisors deal with this problem?

1) Summary

The difficulty of modifying OSes to support new hardware can be a bottleneck that slows down the development and adoption of new technologies. The authors propose that a virtual machine monitor (VMM) can be interposed between the hardware and the OS to give the OS the appearance of traditional, supported hardware. In particular, they develop a VMM for the FLASH NUMA architecture.

2) Problem

Kernel programming is tricky. When new hardware becomes available, it sees slow adoption because common OSes are slow to add support, and this support is often somewhat low-quality at first.

This presents a hindrance to the development of new technologies. Hardware companies must convince OS developers to add and maintain support for their devices, which adds cost and time to development.

The authors seek a way to use commodity OSes as-is on new hardware.

3) Contributions

The authors describe the implementation of Disco, a VMM that abstracts away the hardware details of the NUMA FLASH architecture to present a UMA MIPS R10K interface to OSes running in virtual machines. Their system abstracts away the non-uniformity of the memory system, the multiple processors, and the devices (e.g., storage and network). It also provides a way for VMs on the same host to communicate via standard distributed/networking protocols by placing all VMs on a virtual network.

However, another major contribution of this paper is the identification of several difficulties in the design and implementation of VMs and hypervisors. In particular, devices are hard to virtualize, since they tend to require asynchronous interfaces with interrupts, DMA, and privileged instructions. Another challenge is the virtualization of privileged code, since emulating every single instruction in guest kernels would result in infeasible performance; instead, guest code runs directly and the monitor traps and emulates only the privileged instructions (roughly sketched below). Yet another challenge is that the VMM has no high-level semantic information with which to judge the importance of behavior in guest OSes. For example, the VMM cannot differentiate the idle process from a computational loop.
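
To make the privileged-code point concrete, here is a rough, non-MIPS-accurate sketch of the trap-and-emulate path: unprivileged guest code runs directly, and when the guest kernel executes a privileged instruction the CPU traps into the monitor, which applies the instruction's effect to the virtual CPU's shadow registers and resumes the guest. All names and opcodes below are illustrative assumptions.

    #include <stdint.h>

    struct vcpu {
        uint64_t pc;
        uint64_t status;        /* shadow of the privileged status register */
        uint64_t gpr[32];       /* general-purpose registers */
    };

    enum { OP_READ_STATUS, OP_WRITE_STATUS, OP_TLB_WRITE, OP_UNKNOWN };

    int  decode_privileged(uint32_t insn, int *rt);  /* assumed decoder */
    void emulate_tlb_write(struct vcpu *v);          /* assumed helper  */
    void deliver_illegal_insn(struct vcpu *v);       /* assumed helper  */

    /* Entered when directly-executed guest kernel code traps on a
     * privileged instruction. */
    static void privileged_trap(struct vcpu *v, uint32_t insn)
    {
        int rt;
        switch (decode_privileged(insn, &rt)) {
        case OP_READ_STATUS:            /* guest reads its status register */
            v->gpr[rt] = v->status;
            break;
        case OP_WRITE_STATUS:           /* guest writes its status register */
            v->status = v->gpr[rt];
            break;
        case OP_TLB_WRITE:              /* guest refills its (virtual) TLB */
            emulate_tlb_write(v);
            break;
        default:
            deliver_illegal_insn(v);    /* reflect the fault to the guest */
            return;
        }
        v->pc += 4;                     /* skip the emulated instruction */
    }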

Finally, the authors produce a working prototype of Disco on similar hardware to their target system as a proof-of-concept.

4) Evaluation

The authors present a convincing motivation of the problem. In fact, they demonstrate the problem using an in-house NUMA architecture, FLASH. They explain many of the difficulties involved in porting Unix or Windows to FLASH. This serves as an explanation of why they believe it is acceptable to introduce an extra layer of indirection despite the overhead.

Unfortunately, however, the authors resort to hacks and inelegant tricks and modifications to guest OSes in order to virtualize devices. Moreover, to get reasonable performance, they resort to hacks once again, such as virtualizing privileged registers and replacing the instructions that manipulate them in guest OSes. These hacks seem to be a major weakness in their claim that VMMs can help port OSes to new platforms as-is.

Also, while the authors explain the design of their system and their reasoning for key design decisions, their methodology seems to be lacking, in my opinion. First, throughout the paper the authors use the MIPS R10K ISA as the outward-facing interface exported by the VMM, but they decide to simulate a much simpler, non-superscalar processor at twice the clock rate with very little justification for this simplification. Moreover, they do not specify the hardware features of this simplified processor; they just use it. Second, their simulations run for only 3-13 seconds. While I understand the time constraints that led to this, I do not feel that the authors sufficiently explained why these short samples are representative: 3-13 seconds seems too short to profile such a complex system (with multiple processors, a VMM, and guest OSes). Thus, the evaluation of Disco seems insufficient.

Finally, the authors claim that Disco itself would be easier to port to new hardware, but they do not demonstrate this in their design. For example, what percentage of their code is platform-dependent? Does it all need to be rewritten for every platform? While their design is still useful, since it allows common OSes to run on FLASH, would Disco be sufficiently portable to be useful in running common OSes on other new hardware? These questions are not addressed in detail.

5) Confusion

Why did the authors choose the MIPS R10K? They admit that it is an unfortunate choice when virtualizing memory because of its kernel segments, and they subsequently choose not to simulate an R10K in their methodology...

1. Summary
This paper addresses the problem of running OSes efficiently on scalable multiprocessors by inserting a layer between the OS and the hardware. The layer is called a virtual machine monitor.

2. Problem
- Scalable computers have tens or hundreds of processors, but system software does not make efficient use of such machines, as it trails the hardware in functionality and reliability.
- Operating systems usually contain millions of lines of code, so tuning an OS for scalable computers carries significant cost. New operating systems that are incompatible and buggy can even have a negative impact.

3. Contributions
- The biggest contribution of this paper is the virtual machine idea it teaches. Usually people would extensively modify the operating system to fit innovative hardware, but this paper inserts a layer between the OS and the hardware without much effort and solves the problem.
- The authors also design and implement a prototype, Disco, and address the challenges facing virtual machines. For the time and memory overheads, the authors do comprehensive evaluations and find that these overheads do not play a big role. For communication and memory sharing, they intercept DMA and use copy-on-write to allow sharing among separate virtual machines.

4. Evaluation
The authors experiment on SimOS, a machine simulator, configured to be similar to FLASH. Because the detailed simulation models are slow, they model statically scheduled, non-superscalar processors running at twice the clock rate. They evaluate the system under four server workloads - software development, hardware development, scientific computing, and a commercial database - using four sets of metrics.
- Execution overheads: for all these workloads, the execution overhead is no more than 16% of the execution time. The authors also break down the overhead for the top kernel services (Pmake).
- Memory overheads: these are also small overall (less than 20MB in the 8-VM, 256MB configuration).
- Scalability: two workloads, Pmake and a parallel scientific application, are studied. IRIX without Disco suffers from high synchronization and memory system overheads, while with Disco the execution time is reduced to 60% due to the reduction in kernel stall and synchronization time.
- Benefits of Disco's page migration and replication: the authors study two cases, Engineering and Raytrace, where IRIX does not perform well. Disco shows a 33% performance gain for the Engineering workload and a 38% improvement for Raytrace.

5. Confusion
I do not quite understand the TLB design part (virtual physical memory). What does it mean that MIPS lets the kernel bypass the TLB? Why does that make it impossible for Disco to efficiently remap those addresses? How does the design in the paper solve the problem?

1. Summary
Disco is a prototype Virtual Machine Monitor (VMM) designed to run conventional operating systems on scalable multiprocessor machines.

2. Problem
The core count in new multiprocessors is rapidly growing. Conventional OSes are not suited to manage the large amount of resources provided by these machines. Building an OS from the ground up for a manycore machine is a huge development effort and will likely take quite some time to complete. The scalable hardware is available now: there is a need for system software to support it.

3. Contributions
Disco offers a “middle ground” between a manycore OS built from scratch and a strictly physically partitioned machine. I think the idea of Disco as a viable stopgap solution to systems software on a multicore is valuable. Most of the ideas concerning VMMs and their implementation are taken from previous work.
Running a “thin” OS purposed for a single application on top of the VMM is a cool idea.
Their transparent system for sharing pages between VMs works pretty well and is a good way to conserve memory in a system like this.

4. Evaluation
Disco was created for the FLASH machine, which, ironically, was not available at the time of publication. Instead, they use a simulation of a set of R10Ks configured to look like the FLASH machine. They run a Unix-based OS that has been modified to run in a VM, which is what would happen on the real hardware. They also run a "thin" OS directly on top of the VMM to run some SPLASH-2 benchmarks.
With this setup, they measure the performance and memory overheads caused by the VMM, comparing apps running natively in the OS (on the simulator) and running in the OS on top of the VMM. One of the most impressive results is Figure 7, showing how much memory footprint is saved by sharing pages between VMs.
They were also able to run Disco on real hardware and saw about a 10% performance overhead, which is pretty impressive for an unoptimized prototype.

5. Confusion
I understand that the NUMA migration/copying happens automatically, but how do pages shared between VMs get identified? It seems like that would require some sort of indication by the user. Or is it through the FS interface?
