
Disco: running commodity operating systems on scalable multiprocessors

E. Bugnion, S. Devine, K. Govil, and M. Rosenblum. Disco: running commodity operating systems on scalable multiprocessors. ACM Trans. Comput. Syst., 15(4):412-447, 1997.

Reviews due Tuesday, February 2 at 9 am.

Comments

Summary
Through the use of specialized "balloon" drivers, VMware ESX can offload intra-VM paging decisions to the guest OS's own policies. By combining this with a hash-based page-sharing policy and a set of share-based statistical allocation policies, ESX efficiently virtualizes unmodified systems on overcommitted hardware.

Problem
Commercial virtualization poses a variety of challenges: server hardware is expensive, applications may require commodity operating systems that are difficult or impossible to modify, and server operators must maintain quality-of-service obligations to customers. To maximize hardware usage, operators may overcommit resources, running more guests than the hardware is actually capable of supporting. However, naive resource management strategies may run afoul of customer needs by degrading the performance of a VM subject to a QoS guarantee. Moreover, the clever sharing policies and other optimizations provided by systems such as Disco are not compatible with off-the-shelf software. Additionally, hypervisor policies that are ignorant of the internal state of the guest OS may interact pathologically with the guest system, needlessly degrading performance.

Contributions
ESX uses a specialized guest OS "balloon driver" which the hypervisor can instruct to "wire down" physical pages, thereby excluding them from consideration by the OS's paging algorithms. By inducing intra-VM memory scarcity, the rest of the policy decision is offloaded to the guest OS. This is a clever application of "gray-box" techniques to virtualization: the hypervisor can "request" behavior from an unmodified guest OS by externally creating a VM state that induces that behavior.
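The ballooning interaction can be sketched with a toy model (all names here are hypothetical, not from the ESX code): the hypervisor only pins pages through the balloon driver, and the guest's own replacement policy chooses which pages to give up.

```python
class GuestOS:
    """Toy guest with a fixed page budget that evicts by its own LRU policy."""
    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.pinned = 0       # pages wired down by the balloon driver
        self.resident = []    # guest-managed pages, LRU order (front = coldest)

    def free_pages(self):
        return self.total_pages - self.pinned - len(self.resident)

    def touch(self, page):
        if page in self.resident:
            self.resident.remove(page)
        elif self.free_pages() == 0:
            self.resident.pop(0)          # guest's own LRU eviction
        self.resident.append(page)

def inflate_balloon(guest, n):
    """Hypervisor asks the balloon driver to wire down n pages; the guest's
    own replacement policy decides which of its pages to give up."""
    while n > 0 and guest.free_pages() + len(guest.resident) > 0:
        if guest.free_pages() == 0:
            guest.resident.pop(0)         # guest pages out its coldest page
        guest.pinned += 1
        n -= 1
```

Note that `inflate_balloon` never picks a victim page itself; the memory pressure created by the pinned pages drives the guest's own LRU, which is the point of the gray-box trick.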

Instead of adding hooks to memory-manipulation syscalls (as Disco did), ESX efficiently shares "data-identical" pages via a more general hash-table-based data structure populated by periodic random scans of pages. To prioritize the performance of particular VMs, ESX uses a share-based policy which assigns weights to different VMs, allowing machines whose memory usage is over-extended relative to their share of the hardware to have their pages evicted first, and idle pages to be reclaimed via a "tax". For both page sharing and the memory tax, VMware introduces randomized background sampling: a small number of random pages are tested periodically for the desired property. In this way, expensive tasks, such as checking for idle pages by forcing costly TLB misses, can be amortized into a smaller, uniform background cost.
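The content-based sharing scan can be sketched as follows (hypothetical names; the real ESX design uses a 64-bit hash, a full byte comparison before sharing, and copy-on-write to break sharing on writes, most of which this toy omits):

```python
import hashlib
import random

def page_hash(data):
    return hashlib.sha1(data).digest()

class SharingScanner:
    """Toy content-based sharing: randomly sample guest pages, hash their
    contents, and coalesce matches onto a single canonical frame."""
    def __init__(self, pages):
        self.pages = pages    # page number -> page contents (bytes)
        self.hint = {}        # hash -> candidate page number
        self.shared = {}      # page number -> canonical page it now maps to

    def scan(self, sample_size, rng=random):
        # Periodic background pass over a small random sample of pages.
        candidates = rng.sample(sorted(self.pages),
                                min(sample_size, len(self.pages)))
        for pn in candidates:
            h = page_hash(self.pages[pn])
            other = self.hint.get(h)
            if (other is not None and other != pn
                    and self.pages[other] == self.pages[pn]):
                self.shared[pn] = other   # map pn read-only onto other's frame
            else:
                self.hint[h] = pn         # remember as a future match candidate
```

Because only a small random sample is hashed per pass, the scan cost is a uniform background tax rather than a burst of work proportional to total memory.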

Measurement
The metrics used to measure ESX's performance show that these optimizations increase efficiency. The performance of the balloon mechanism, the first test of page sharing, and idle memory reclamation are all measured with batch-style synthetic benchmarks. While the synthetic page-sharing results are striking (60% shared across VMs and a slight performance increase), they represent the best-case situation of identical systems running the same task. Similarly, the idle-reclamation test demonstrates that the policy and mechanism work, but the quantitative results correspond to fairly simple cases, such as one idle OS and a second running a batch job. In general, the performance metrics are over-simplistic: some provide comprehensive numbers for memory usage, some for performance, but none discuss both on the same workload. Moreover, with the exception of the idle memory tax and the dynamic reallocation, it is hard to get a sense of the characteristics of the workloads or how representative they are. Finally, as with Disco, there is no meaningful discussion of interactive performance and latency.

Confusion
I'm curious about the performance anomalies that occur when VMM policies interact with the guest. Besides "double-paging", are there other interesting classes of degenerate behavior?

1. Summary
The paper tackles the problem of extending modern OSes to run efficiently on large shared-memory multiprocessor systems without high implementation effort. The authors propose Disco, which runs multiple VMs, each running a commodity OS, controlled by a virtual machine monitor (VMM) that manages the sharing of processor and memory resources. The paper highlights the advantages of this implementation, tackles issues with VMMs in general, and demonstrates the approach with a prototype system targeting the Stanford FLASH and running the IRIX OS. Results show that while virtualization overheads can be significant in some cases, scaling to large systems offsets the overheads and further provides improved performance with a lower memory footprint.

2. Problem
Hardware is constantly scaling: multiprocessor systems with increasing numbers of nodes are becoming more popular. Managing a monolithic OS for a large shared-memory multiprocessing system presents complexities - significant changes to commodity OSes, code running into millions of lines, complex changes to support fault containment and CC-NUMA, and efficient locking - all of which often results in software trailing hardware development. The ideal solution should be easily scalable but with less complexity.

3. Contributions
The authors propose Disco, which decouples the scalability factor from the commodity OS itself by implementing virtual machines managed by a VMM. The virtual machines themselves run standard commodity OSes (with minimal changes at the HAL to aid the VMM) while the VMM is tuned and optimized to support large-scale sharing. The VMM uses global policies to manage resources across the machine and provides the flexibility to run multiple OSes, even specialized ones, on the VMs.

Disco emulates virtual CPUs by direct execution of instructions (from the VMs) on the real hardware. When traps such as page faults occur, the processor traps to the monitor, which emulates the effect of the OS trap. To virtualize physical memory, Disco adds an extra address translation layer from physical to machine addresses. Disco intercepts TLB writes by the OS to replace the virtual-to-physical mapping with a virtual-to-machine mapping. Disco also implements a software second-level TLB to compensate for the increased TLB misses caused by VM-switching flushes and by kernel mappings occupying the TLB.
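The extra translation level and the second-level software TLB can be sketched as a toy model (hypothetical names; this ignores ASIDs, permissions, and the guest's own page tables):

```python
class VirtualTLB:
    """Toy model of Disco-style address translation: the monitor rewrites
    each guest va->pa entry into a va->ma entry using the per-VM pmap, and
    caches recent rewrites in a second-level software TLB so that a vcpu
    switch (which flushes the hardware TLB) stays cheap."""
    def __init__(self, pmap):
        self.pmap = pmap      # guest physical page -> machine page
        self.hw_tlb = {}      # va page -> ma page (flushed on vcpu switch)
        self.l2_tlb = {}      # software TLB; survives vcpu switches
        self.l2_hits = 0

    def guest_tlb_write(self, va, pa):
        # Monitor interposes on the guest's TLB write and rewrites pa -> ma.
        self.hw_tlb[va] = self.l2_tlb[va] = self.pmap[pa]

    def translate(self, va):
        if va in self.hw_tlb:
            return self.hw_tlb[va]
        if va in self.l2_tlb:         # refill from software TLB, no guest trap
            self.l2_hits += 1
            self.hw_tlb[va] = self.l2_tlb[va]
            return self.hw_tlb[va]
        raise KeyError(va)            # would re-enter the guest's miss handler

    def vcpu_switch(self):
        self.hw_tlb.clear()           # hardware TLB is flushed on VM switch
```

The benefit modeled here is that after a vCPU switch, misses that hit in the software TLB are serviced by the monitor alone, without the round trip through the guest's trap handler.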

NUMA memory-management issues are hidden from unaware OSes by making the VMM itself NUMA-aware - a less complex task - and relying on careful page movement. Disco thereby eases memory management: pages heavily accessed by one node are migrated to that node, while read-shared pages are replicated.

Disco also efficiently manages Copy-On-Write Disks, I/O device access and the virtual network interface.

4. Evaluation
Disco is run atop SimOS, which models hardware based on the MIPS R10000 processor. The paper discusses the evaluation of four benchmarks; interestingly, only one of the four (Pmake) is not compute-bound.

The authors evaluate the execution overheads, which are sometimes large due to a high rate of TLB misses, the cost of trap emulation, etc. Kernel time itself is reduced due to Disco taking over some management work and the presence of the second-level TLB. The memory overheads of running 1-8 VMs on 8 processors are analyzed; it is evident that as the number of VMs increases, the memory footprint is kept significantly smaller thanks to Disco's well-designed memory sharing. The authors also show the effect of running multiple VMs, each running software threads: implementing efficient locking mechanisms in the VMM significantly reduces overall synchronization costs, making multi-VM systems faster.

5. Confusion
The paper implicitly assumes a drastic reduction in complexity when moving optimizations for scalability from the OS to the VMM. While this may be true, if the increased complexity in a monolithic OS can still provide a very small memory footprint and high performance (through complex locking mechanisms), might the complexity be worth it?

The paper is motivated by a hardware-agnostic OS: the OS should run on VMs without worrying about scalability or the outside world. This is not really the case with Disco, since the OS is modified considerably to provide hints to the VMM and to use special device drivers, going against the agnosticism policy.

As discussed briefly in the paper, the VMM's lack of (or limited) awareness of actual process needs in terms of compute and memory can be detrimental to overall performance. Won't the hints provided by the OS to the VMM probably be less useful than the complete information known to the OS?

1. Summary
This paper seeks a solution for improving the scalability of current system software on large-scale shared-memory multiprocessors with minimal development cost. Their solution is a virtual machine monitor, of which they provide a specific prototype called Disco, that manages hardware resources and provides virtualization for commodity operating system copies running on multiple virtual machines.
2. Problem
The problem of major concern is the gap between hardware innovation and the corresponding adaptation of system software. New hardware not only features increasingly massive numbers of processors, which requires scalable system software, but also includes designs like cc-NUMA that need extensive modification of current operating systems to be fully utilized. The high cost of OS adaptation results in a significant delay of system software behind hardware, which discourages hardware innovation.
3. Contributions
The solution they propose is a virtual machine monitor, prototyped as Disco, that virtualizes hardware resources for existing commodity operating systems, providing them a more conventional hardware interface. They designed the hypervisor such that it avoids many of the problems that traditionally challenged the old idea of virtual machine monitors.
i. Overhead in execution time and memory footprint: The execution-time overhead occurs when the operating system performs privileged instructions, and can be significant in routines such as synchronization that use those instructions heavily. Disco reduces this by converting such instructions into load and store instructions to special addresses, avoiding frequent traps. The memory footprint can be large when large low-level data structures such as the file buffer cache are duplicated in each virtual machine. Disco also profiles data usage, for example by counting cache misses, to dynamically decide whether to migrate or replicate pages, balancing copy overhead against the data locality that matters in a NUMA memory model.
ii. Resource management: The difficulty of managing and scheduling resources efficiently in a hypervisor comes from the lack of usage information from the OS. Disco makes small modifications to the source code of the OS so that it sends hints about its resource usage.
iii. Communication and sharing: Old VMM models had various limitations on communication and resource sharing across virtual machines. In Disco, virtual machines communicate using standard distributed protocols like TCP/IP and NFS. The monitor provides a virtual subnet for the virtual machines. Each virtual machine is given an abstraction of main memory and I/O devices to which it can claim exclusive access. These abstractions allow optimizations in sharing resources and techniques like copy-on-write on both memory and disk.
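The migrate-vs-replicate decision described in point (i) might look like this toy policy (thresholds and names are illustrative, not taken from the paper, and writes, which would forbid replication, are ignored):

```python
def numa_policy(miss_counts, hot_threshold=100, share_ratio=0.25):
    """Toy version of Disco-style dynamic page placement: given per-node
    cache-miss counts for one page, migrate it to a single dominant node,
    replicate it if it is read-shared by several nodes, or leave it alone."""
    total = sum(miss_counts.values())
    if total < hot_threshold:
        return ("leave", None)            # page is not hot enough to move
    heavy = [n for n, c in miss_counts.items() if c / total > share_ratio]
    if len(heavy) == 1:
        return ("migrate", heavy[0])      # one node dominates the misses
    return ("replicate", sorted(heavy))   # read-shared across several nodes
```

The design point this illustrates is that the counters give the monitor enough information to improve locality without any cooperation from the guest OS.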
4. Evaluation
Due to the unavailability of the FLASH machine that Disco targets, they ran experiments on SimOS, which emulates FLASH. They tested four typical use cases, which showed overheads from 3% to 16%. However, scalability was observed to improve greatly: as the number of virtual machines rises, the performance improvement soon compensates for the overhead and outweighs the comparison system. Later, they also tested the system on real hardware to improve confidence.
5. Confusion
Exactly how do load and store instructions to special addresses avoid the trapping overhead? Does the VMM set up memory mappings from those addresses to the privileged registers? And is that feature portable across all mainstream hardware?

1. Summary
This paper revisits the old idea of virtual machine monitors in the new context of modern large-scale shared-memory multiprocessors. VMMs are an additional layer of software between the hardware and the OS, and multiple copies of commodity OSes can be run on a single multiprocessor system. Disco, an implementation of this idea on a CC-NUMA machine, is also discussed.

2. Problem
System software for commodity scalable multiprocessor hardware has often trailed the hardware in functionality, flexibility, and reliability, because extensive changes to complex operating systems are required. Hence, instead of modifying existing operating systems, the idea here is to add a virtual machine monitor layer that virtualizes all the resources of the machine, exposing a more conventional interface to the OS. Moreover, multiple OSes can run in parallel on top of the VMM.

3. Contributions
• Primary Contribution - The idea of applying the concept of VMMs to solve the challenges facing system software for scalable multiprocessor systems.
• Fine grained resource sharing potential of the hardware is tapped.
• Abstracts the NUMA-ness of the underlying system, provides UMA interface to virtual machines.
• Specialized and commodity OSes can co-exist.
• Re-use of well-known virtualization ideas to deal with the new layer, such as machine addresses, and vCPUs being scheduled on physical CPUs analogous to how processes are scheduled on traditional systems.
• Clever implementation techniques in Disco, viz. dynamic page migration and replication, code segment replication, a second level of TLB, direct execution, etc.

4. Evaluation
Issues
• Execution overheads due to complicated exception processing, I/O handling.
• Memory overheads due to replication of each OS, file system buffer cache, multiple file systems, etc.
• The Monitor must make resource management decisions without the high-level knowledge that an OS would have.
• Sharing and communication is difficult.
Measurement
• Virtualization overhead for uniprocessor workloads - 3% to 16%
• Some workloads can run 1.7x faster on an 8-VM system than on a commercial symmetric multiprocessor system.
• NUMA-ness can be completely hidden, reducing execution time by up to 37%.

5. Confusion
At runtime - can the number of virtual CPUs assigned to a VM increase? Is it configured at OS boot time?

Summary

To extend modern OSes to run efficiently on shared-memory multiprocessors without large changes to the OS, a layer is added between the hardware and the operating system. It is built to run multiple copies of "commodity" operating systems on a single multiprocessor.

Problem

The hardware industry is growing at a steady pace, with new variants and large-scale multiprocessors coming out every day. Customizing the operating system for these can be time-consuming and buggy. Commodity operating systems are not suited for such large-scale machines, especially their memory architecture. DISCO provides a solution to this problem: the existing IRIX operating system can run on the FLASH shared-memory multiprocessor via DISCO.


Contribution

Scalability: DISCO provides scalability of the OS to large-scale multiprocessors. Partitioning problems into different VMs provides scalability.
Flexibility: Flexibility to run different virtual machines ("different operating systems") on a single piece of hardware.
Overcoming the NUMA-ness: DISCO implements dynamic page migration and replication, so that pages frequently accessed by one node are migrated to it and read-shared pages are replicated among nodes. Hence, the operating system does not sense the drawbacks of the hardware's NUMA design; it is hidden by DISCO's implementation and memory management policies.
Fault containment: Running multiple operating systems independently on a multiprocessor, versus one complex operating system managing the entire hardware, also helps with fault containment.
Modifications to IRIX: code was added to the HAL to pass hints to the monitor for effective resource management by DISCO, plus new monitor calls for the memory manager to request reclamation of unused memory.


Evaluation

The performance and memory overheads were evaluated using SimOS, a machine simulator that models MIPS-based multiprocessors. On average, the virtualization overhead on performance was 3-16%. The 16% overhead was seen in the pmake workload, as it heavily uses OS services for the file system and process creation. For compute-bound operations the overhead was mainly due to DISCO's trap emulation of TLB reload misses. Overall, in my opinion the paper conveys that the benefits exceed the overheads of virtualization.

Confusion :

1) Can we discuss more implementation details of DISCO? I am not able to visualize the implementation of DISCO running multiple virtual machines, though the concept is clear.

2) How exactly does keeping the DISCO code in all memories of the FLASH machine help satisfy "instruction cache misses" from the local node?


1. Summary
Disco is a virtual machine monitor which allows a wide variety of operating systems, from off-the-shelf commodity systems to specialized multicore OSes, to run on scalable multicore ccNUMA hardware with minimal modifications to guest OS code. Disco abstracts away the nonuniform nature of the hardware's memory model and presents the hardware as a networked collection of UMA machines, each possessing one or more CPUs.

2. Problem
Adapting commodity OS software to new hardware is a difficult and slow task; modern operating systems are millions of lines long, and timely software releases coordinated with that of the hardware require buy-in from both hardware and software developers. Outside of market standard architectures, such coordination may not be present, and OS releases can lag behind by months or longer. Moreover, code changes to support new architectures are wide-ranging, and bugs may linger for a long period after release. Classical virtual machine monitors present a possible solution, but may be unacceptable due to large performance overheads and inability to intelligently gauge guest behavior.

3. Contributions
Via some comparatively simple modifications to guest OS code, Disco can use the memory management features of the FLASH architecture to facilitate several novel optimizations. Disco runs the guest OS directly on the CPU(s) in an intermediate 'supervisor' mode, allowing access to kernel virtual memory, while trapping into the VM monitor if the guest attempts to access physical memory or use instructions reserved for a higher privilege level. Disco is able to interpose on guest OS TLB manipulations, and can thus identify "hot" pages and transparently migrate them to the accessing core's physical memory, or duplicate the page in read-only mode across multiple cores. Moreover, Disco uses its memory manipulation facilities to efficiently multiplex DMA I/O devices, letting multiple VMs share the same buffer pages, and mapping inter-virtual-machine network traffic directly into the recipient's address space. Thus, Disco's guest OSes can interact as though they were a network of disjoint machines, while exploiting the performance benefits of shared hardware.

4. Evaluation
The authors evaluate Disco in a hardware simulator across a variety of workloads, and record an execution overhead of up to roughly 16%, much of which can be attributed to certain commonly used OS code paths crossing the guest/hypervisor boundary multiple times. Moreover, due to copy-on-write page sharing, memory usage among parallelizable workloads grows much more slowly than it would if each VM kept its own copy of redundant data. The authors also show that parallelizable workloads improve in runtime as the number of cores increases, in spite of hypervisor overheads. While these results are promising, every workload tested is a non-interactive job. It would be interesting to see how Disco affects latency in interactive jobs such as hosting workstation sessions.

5. Confusion
I don't really understand what the authors meant when they mentioned that copy-on-write was restricted to non-persistent disks. Outside of the backing store used to record evicted memory pages, I cannot think of another use case that leads to an ephemeral disk shared between virtual machines.

1. Summary

This paper discusses DISCO, a virtual machine monitor. DISCO allows multiple "unmodified" commodity operating systems to run on the hardware. Virtual machine monitors are not new, but have historically had large overheads and problems with sharing and resource management. Disco addresses these problems with a clever set of techniques implemented in the VMM to efficiently virtualize the hardware and run multiple operating systems.

2. Problem

CC-NUMA, a new architecture, requires operating system software to be restructured or perhaps even rethought, and it would be hard to write commodity operating systems from scratch. DISCO revives the old idea of using a VMM to virtualize the CC-NUMA hardware, but implements a clever set of techniques in the VMM to overcome some of the known problems of VMMs: execution overheads, memory-virtualization data structure overheads, resource management (how to detect an idle CPU when an OS idles?), and ineffective sharing of similar pages.

3. Contributions

DISCO uses a nice set of techniques in the VMM to achieve better virtualization. Some of them are:

Transparent page sharing: DISCO looks for opportunities to transparently share memory pages. For example, it interposes on DMA to virtual I/O devices. When a device is first opened, nothing special happens; Disco just records the information in a data structure. On subsequent opens of the same device, DISCO can transparently share the pages between two VMs. This is a very good way to share OS code pages and popular application binaries like gcc. It also interposes on network sends and buffer copies to effectively share a single page between an NFS server and an NFS client running on the same machine. This was quite impressive.
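The disk interposition described above can be sketched as a sector-to-machine-page map (hypothetical names; real Disco also handles writes with copy-on-write, which this toy omits):

```python
class CowDisk:
    """Toy copy-on-write disk: the monitor records which machine page holds
    each disk sector, so a second VM reading the same sector gets the
    existing page mapped read-only instead of a fresh copy."""
    def __init__(self):
        self.sector_to_page = {}
        self.next_page = 0
        self.pages_allocated = 0

    def dma_read(self, vm, sector):
        if sector not in self.sector_to_page:
            # First reader anywhere: allocate a machine page for this sector.
            self.sector_to_page[sector] = self.next_page
            self.next_page += 1
            self.pages_allocated += 1
        # Later readers, from any VM, share the same frame read-only.
        return self.sector_to_page[sector]
```

Since every VM boots from the same root disk image, this is exactly the mechanism that lets OS code pages and shared binaries occupy memory once.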

NUMA replication and migration: Disco transparently replicates read heavy pages and migrates pages that are heavily accessed from one processor. The paper says that pages are shared whenever possible and replicated for performance reason when needed.

Address translation: Since TLB misses are much more expensive under a VMM (they require trapping back and forth between the OS and the VMM), and there are more of them (because the IRIX kernel is mapped through the TLB and the TLB is flushed on every vCPU switch), DISCO cleverly caches recent mappings in a second-level software TLB. This cuts out a lot of traps and returns, which I felt was really neat.

4. Evaluations

Their evaluations show that page sharing is effective and works well as the number of VMs increases. Execution overheads are shown in their experiments; I think they should have compared against some other VMM (but perhaps none existed). The overheads seem to be in the acceptable range, but there was no baseline to compare them against. Replication and migration seem to show performance improvements for a few workloads.

5. Confusion

They started like they wanted to run unmodified guest operating systems but ended up modifying a bunch of code in IRIX OS. I understand that these modifications are necessary for effective virtualization. But how do current virtual machine monitors deal with the situation? Do they still require modifications?

MIPS is software managed TLB - How does address translation work for hardware managed TLB? Is it more complex?

Summary
-----------
The authors advocate for system software that is highly scalable for shared-memory multiprocessors. They propose solutions to weaknesses surfaced in prior virtual machine systems (sharing, I/O communication, resource management, execution overhead and memory overhead). They propose a virtual machine monitor (VMM), a layer lying between the hardware and the OS, that virtualizes the hardware and enables multiple commodity OSes to run in parallel. The prototype, Disco, requires minimal implementation changes and is evaluated to show that it provides performance comparable to a conventional OS.

Problem
----------
Support for continuously evolving hardware required significant changes in the large code base of system software, involving large implementation costs, the possibility of introducing bugs, and compatibility issues with past and future versions. To overcome these problems the authors implement a prototype based on the idea of VMMs - a layer between hardware and OS that multiplexes hardware resources so that multiple commodity operating systems can coexist on a shared-memory multiprocessor.

Contributions
----------------
-Disco has a VMM - a new layer of abstraction between hardware and OS that virtualizes the hardware resources, physical memory and scheduling.
-Virtual CPUs abstract the processor using three modes: kernel (where the VMM runs), supervisor (where the guest OS runs) and user mode (where user applications run).
-Per-VM data structures manage the registers and TLB contents that make up each vCPU's state.
-The VMM translates physical addresses passed by the guest OS into real machine addresses, caching recent translations in a software TLB; a TLB flush is required when a vCPU migrates.
-NUMA is abstracted as UMA, allowing non-NUMA-aware OSes to run on NUMA hardware.
-Failure of one VM does not crash other VMs, giving effective fault containment.
-Disks are based on a copy-on-write mechanism.
-VMs share memory and communicate using standard distributed protocols such as NFS.
-Frequently used privileged instructions are converted to loads and stores in the HAL to reduce execution overhead.

Evaluation
------------
-Evaluation was carried out by deploying Disco on SimOS for various memory- and I/O-intensive workloads from the scientific, software development and database domains.
-Virtualization overheads varied from 3% to 16%.
-This overhead can be attributed to trapping into the VMM during TLB misses, on the common path for entering and exiting the kernel at a page fault, system call or interrupt.
-An increased page size can reduce this overhead.
-Page replacement together with dynamic page migration and replication provided a 33%-38% performance improvement compared to a commodity OS.

Confusion
------------
1. Concepts of Copy-on-Write, persistent and non-persistent disks.

Summary:
This paper describes how virtual machine monitors can be used to construct system software for innovative hardware like scalable multiprocessors with minimal changes to the OS. The authors also give details regarding their prototype Disco that runs multiple commodity OSes on a multiprocessor. The results reported seem promising as they demonstrate that the monitor overhead is small, the system is scalable and has the ability to deal with NUMA memory design.

Problem:
The authors claim that system software for innovative hardware is developed much later than the hardware, as system software is very complex and inflexible. As a result of this inherent complexity, it becomes difficult to change the software to support new paradigms such as non-uniform memory access. The high development costs involved in building an OS for scalable hardware hinder the acceptance of innovative hardware. The authors try to close this gap between hardware and system software.

Contributions:
The basis of the solution is to use a virtual machine monitor (VMM) in order to provide system software for scalable multiprocessors. The VMM acts as an interface between the hardware and the OS, exporting a conventional view of the hardware via resource virtualization. Disco supports direct execution for most operations, except privileged ones such as TLB modifications and accesses to privileged registers; those privileged instructions are instead emulated by Disco using per-vCPU privileged registers. Disco virtualizes physical memory by adding an extra level of address translation between a VM's physical memory and machine memory. There is an increase in the number of TLB misses, since the TLB is now used by all the OSes hosted on top of the VMM; to mitigate this, a second-level software TLB is introduced in Disco. To make the VMM NUMA-aware, Disco uses page migration and replication to hide the NUMA nature of the underlying hardware, which enables non-NUMA-aware OSes to run on NUMA systems. Disco virtualizes I/O devices by intercepting accesses to them and forwarding them to the physical devices. VMs communicate with each other using standard distributed protocols such as NFS.

Evaluation:
The authors evaluate the performance of this approach by running Disco on SimOS, a machine simulator. Four different workloads representative of the use of scalable multiprocessors were used to evaluate Disco's performance. Virtualization overheads were obtained by running the workloads on a uniprocessor with and without Disco. The results indicate an overhead of 3%-16%, mainly due to Disco's trap emulation of TLB misses. Evaluation of the memory overheads brings out the fact that data sharing between VMs in Disco reduces the footprint by almost half in the case of 8 VMs. The results obtained when evaluating scalability and the mechanisms that make Disco NUMA-aware seem to validate the claims made by the authors. Even though the results are promising, one must not ignore the fact that long-running workloads could not be used for the evaluation.

Confusion:
I did not understand the use of copy-on-write disks with respect to Disco. A discussion regarding the same would be helpful.

1. Summary
This paper presents an approach to developing system software for scalable shared-memory multiprocessors without extensive changes to the operating system. This is done by introducing a layer of virtual machine monitors between the OS and the hardware so that multiple commodity or specialized OSes can run. Disco empirically establishes that despite the virtualization overheads, its performance is comparable to that of a native OS.

2. Problem
The main problem the authors are trying to solve is supporting large-scale shared-memory multiprocessors without many changes to the present operating system codebase, thereby avoiding instabilities, inflexibilities and incompatibilities. To address fault containment, cache-coherent NUMA management and scalability, significant OS changes - such as partitioning the system into scalable units and building a single system image across the units - would otherwise be required. The goal is to close the gap between the hardware and software innovation cycles.

3. Contributions
The idea is to abstract the hardware from the operating system using a virtual machine monitor that manages all the resources, such as disk and main memory, so that multiple virtual machines can coexist on the same multiprocessor. Each virtual machine has the processor and memory resources that an OS can effectively handle. The monitor can dynamically schedule virtual processors on physical processors for load balancing. It also shares memory between virtual machines to keep applications from paging to disk when free memory is available elsewhere in the machine. It hides the NUMA-ness of the system from the OS by introducing page migration and replication. There are some disadvantages: memory overheads, I/O being intercepted and remapped, the monitor being unable to identify when a page is no longer actively used by a virtual machine (so it cannot reallocate it), and the difficulty of distinguishing an idle loop from busy-waiting on a lock, so a resource might be given to a lower-priority task.


4. Evaluations
The virtualization overheads are empirically evaluated via four workloads. The results suggest that virtualization adds between 3-16% overhead for uniprocessor workloads; the worst case occurs when the monitor initializes pages on behalf of the kernel, suffering memory stalls and extra instruction execution. Page migration and replication effectively hide the non-uniform memory access architecture and provide a 33-38% performance improvement over a commodity OS running on native hardware.


5. Questions
The usage of Disco's data structures, the memmap and pmap, during TLB miss handling seems tricky. My understanding is that the pmap maps a physical address of a virtual machine to a machine address while maintaining a bitmask of vCPUs that point to the machine address, and the memmap entry for a machine address contains the list of pmaps that refer to the page. Can we discuss their design motivation?

summary ~
This paper examines the difficulties in extending OSes to run efficiently on large-scale shared-memory multiprocessors. The authors demonstrate the idea of inserting a software layer (a virtual machine monitor) between the OSes and the hardware to make the OSes scalable and reduce the implementation effort.

problem~
Innovations in hardware bring more and more processors to the system, but coping with this trend requires excessive modifications to current OSes, and those modifications also introduce compatibility issues. The old idea of a virtual machine monitor as a layer between the OS and the actual hardware can be brought back to solve this problem, but the original idea also has issues such as performance overhead, resource management, and communication and sharing.

contribution~
The authors bring back the idea of virtual machine monitors, identify the challenges they face, and propose solutions to them. To cope with execution overhead, the system uses direct execution and extends the architecture to support efficient access to some privileged processor functions. A lack of information about the guest OS's execution and memory allocation prevents the monitor from making better policy decisions; to solve this, the authors propose small modifications to the commodity OS's HAL so that it forwards hints about its execution and memory allocation to the monitor. Communication and sharing are also improved through copy-on-write paging and a distributed file system protocol.

evaluation~
The authors chose a wide range of workloads, including I/O-intensive, OS-intensive and large-memory-footprint workloads, to verify their assumptions about execution and memory overheads, memory footprint, scalability and implementation effort. The results verified their assumptions.

confusion~
How is efficient access to some of the processor functions achieved with load and store instructions, and how does this improve efficiency?


1. Summary
The paper attempts to tackle the issue of operating systems not being able to keep up with large-scale shared-memory multiprocessors. The authors introduce an abstraction layer in the form of a monitor that virtualizes the underlying hardware to the guest operating systems. This has the additional advantages of dealing with non-uniform memory access and enabling fast communication through sharing of data. A prototype implementation was built and tested.
2. Problem
The problem revolved around the complexity of modifying existing operating systems (due to their size and complexity) for modern hardware, including fault containment, scalability and CC-NUMA management. This leads to operating systems lagging in both reliability and functionality unless massive development efforts are put in. Previous solutions such as Hive and Hurricane tried to redesign the operating system from scratch for multiprocessor architectures; while sound, this approach was not practically viable, given applications' reliance on legacy operating system behaviours and the vast effort involved.
3. Contribution
The paper introduces an additional layer of software between the hardware and the operating systems known as the virtual machine monitor. This allows various commodity as well as lightweight specialized operating systems to coexist on the multiprocessor system. The monitor virtualizes processors by allowing limited direct execution whenever possible, except for privileged instructions, which are handled by the monitor. Physical memory access is sped up by having virtual-to-machine address mappings in the TLB, as well as constructs such as a software TLB. The monitor also implements dynamic page replication and migration strategies to provide uniform memory access on the underlying NUMA hardware. The virtual machines can behave as a distributed system while transparently sharing memory regions and buffer caches to greatly improve the rate of data sharing. I/O management is performed by allowing the guest operating systems to access the I/O devices through specialized device drivers that communicate with the monitor. This is much more efficient than emulation, which would require the monitor to understand the interface of every I/O device. As the monitor does not perform all the functions of an operating system, it can stay relatively simple and agile to changes.
4. Evaluation
The authors created Disco, a monitor, to prove the viability of this approach. They tested on a single-core MIPS processor as well as emulated hardware matching a multiprocessor system. They measured the execution-time overheads caused by virtualization, largely due to trap emulation. They also documented the advantages offered by this model in reducing the adverse effects of the NUMA architecture on execution times through page migration and replication mechanisms. The system was also tested for scalability against a single operating system image, showing that the monitor was able to reduce the base execution times by partitioning the task among various virtual machines and avoiding kernel synchronization waits. The single-core results, however, are dominated by the overheads of virtualization.
5. Confusion
I am confused by the two memory management structures namely memmap and pmap.
The non-persistent shared copy-on-write disk sectors are also a little fuzzy. Why can the guest operating system not run purely in user mode?

1. summary
The article targets the problem of using modern operating systems on large-scale shared-memory multiprocessors by developing a prototype called Disco, which inserts a software layer between the operating system and the hardware. The layer, which acts as a virtual machine monitor, allows multiple copies of commodity operating systems to run as virtual machines on a single scalable computer, sharing resources and cooperating. It also allows non-NUMA-aware OSes to run on NUMA architectures.
2. Problem
The authors note that the computer industry places great importance on reliability. Hardware innovations require a lot of operating system code changes, and even after the changes the OS may not be stable, so these innovations are not effectively utilized by applications; the OS thus becomes an obstacle to hardware innovation. The authors therefore propose something that helps OS software quickly adapt to new hardware with less chance of instability, focusing on the concept of virtual machine monitors introduced in the 1970s.
3. Contributions
The main contribution of the paper is exploring the virtual machine monitor concept to run multiple VMs on a multiprocessor system while handling the major VM challenges. They use virtual CPUs that execute directly on the real CPU, with the VMM keeping the state of registers and TLB contents. They allow different modes (user, kernel, supervisor), where the guest OS runs in supervisor mode and all traps, such as page faults, are handled by the VMM. The VMs can also share resources such as the file system and I/O devices. They reduce memory stalls using page replication, where read-only pages are replicated, and page migration, where pages suffering frequent cache misses ("hot" pages) on a processor other than the one where the requesting VM is scheduled are brought to the requesting processor. The data structures used are the pmap (physical-to-machine translation) and the memmap, used to find the closest page replica by maintaining, for each machine address, a list of the corresponding pmap entries. For VMs to share I/O devices such as disks, they use a B-tree indexed by disk sectors and maintain a global disk cache. They also use NFS for VMs to share files and reduce duplication of data.
4. Evaluation
The authors run the Disco prototype on the FLASH system simulated on SimOS. They evaluate it on different realistic workloads. Because of the slow performance of SimOS, the authors were unable to evaluate long-running workloads, which is a key missing part of the evaluation. The basic overhead of virtualization ranged from 3-16% for uniprocessor workloads, but Disco achieved around 1.7x faster performance when running 8 VMs compared to a commercial symmetric multiprocessor OS. Scalability graphs show better performance for Disco. They also show good performance benefits from page migration and replication. Though they were able to show that Disco overcomes the drawbacks of traditional VMs, the evaluations ran on a simulator; it would have been better to run them on real hardware.
5. Confusion
More details on NUMA-aware OSes: what optimizations does NUMA awareness enable?
How copy-on-write disks are handled is still not clear to me.

Summary

The paper proposes solving the problem of the inability of system software to quickly and reliably adapt to the advancements in underlying hardware by introducing an abstraction layer through a virtual machine monitor on which commodity / specialized operating systems may run. The solution is then evaluated for its viability through an implementation named Disco.

Problem
At the time of writing this paper, existing Operating Systems had become complex enough that making them adapt to recent advancements in hardware (such as achieving scalable performance on multi-core architectures, managing performance on a NUMA architecture, fault containment etc.) would have required significant developer effort in terms of handling the associated extra complexity and dealing with the new inconsistencies arising due to managing complex and large OS code. Previous attempts in this direction such as Hive, Hurricane, multi-kernel, etc. failed to sufficiently address this problem of significant development effort.

Contribution
The authors propose using a virtual machine monitor as the mechanism through which commodity / specialized operating systems, operating in their own virtual machines, access underlying hardware resources and communicate with each other using distributed systems principles. The monitor abstracts processors, physical memory and network I/O and is responsible for efficient and secure abstraction of these resources.

Processors are virtualized through limited direct execution, except for certain special instructions, which are explicitly handled by the monitor. Physical memory abstraction is sped up by having direct virtual-to-machine address mappings in the TLB / cache, as well as constructs such as the per-virtual-machine software TLB, which reduces the penalty of a cache / TLB miss. The authors also developed and implemented dynamic page replication and relocation strategies to hide the NUMA character of the underlying multiprocessor architecture by favoring usage of local processor memory wherever possible. The virtual machines also transparently share memory regions and the buffer cache using copy-on-write semantics, which greatly improves data sharing performance amongst cooperating virtual machines.

Overall, the solution allows making simple changes to an operating system to make it fit for running on a virtual machine hosted by the monitor. While the operating system running on the virtual machine may be generic and not NUMA-aware, the simple monitor implementation can be tuned specifically for the underlying hardware to obtain best performance.

Evaluation
Disco, a virtual machine monitor based on the above design principles, was developed to evaluate this design. The authors first compare Disco running on an emulated multiprocessor RISC system (FLASH) using the SimOS simulator against the IRIX operating system for a variety of common OS-, compute-, memory- and I/O-intensive workloads, and compare the virtualization overheads in execution time and memory. They also evaluate the workload scalability and NUMA behavior of Disco, and conclude that it performs reasonably compared to a traditional commodity system.

The authors also implement Disco on a single-core MIPS processor and compare the execution overheads against IRIX. The results obtained again confirm that Disco is a viable alternative to a commodity OS such as IRIX.

Questions / Confusion
The concept of managing copy-on-write disks for temporary / permanent segments was not clear. Can we discuss this in more detail during class?

1. summary
The paper proposes a VMM design (Disco) that allows running multiple commodity operating systems efficiently on the fast evolving shared-memory multiprocessors. The idea shifts the onus of system software design that can exploit the multiprocessor platforms from the complex hard-to-modify commodity OS to relatively simpler VMM abstraction layer. The model addresses the challenges faced by VMMs so as to reduce overheads and provide scalability.
2. Problem
The compute hardware is evolving fast, with scalable multiprocessors becoming popular commercially. However, the traditional system software needs major modification in order to reap all the benefits of these platforms. Such a custom scalable monolithic OS has huge development cost, and is prone to being buggy or incompatible. Further, enhancing various operating systems for the fast changing hardware is not a scalable solution.
3. Contributions
The proposed design re-uses the existing idea of VMMs, and offers mechanisms to reduce the associated overheads, enhance sharing between VMs and efficiently manage hardware resources. The most innovative feature of Disco is its memory management. Using non-virtualized ASIDs and second-level software TLBs, address translation is made faster. Further, using page migration and replication optimizations, along with an efficient TLB shootdown mechanism (using special VMM data structures), it handles the NUMA-ness of the system well. The copy-on-write based sharing of read-only disk blocks, cached in a system-wide buffer cache, is one of the reasons Disco is well suited for multiprogrammed applications. Another key benefit is that Disco supports lightweight operating systems, like SPLASHOS, which can't run natively on hardware but can significantly boost application performance. Thus, Disco enables running multiple operating systems on the same hardware with minimal modifications and offers higher performance, especially for multiprogrammed applications.
4. Evaluation
The design was evaluated mainly by carrying out simulations using MIPS R10000 based FLASH machine’s model in simulator SimOS. Using four parallel/multiprogrammed workloads, they show that VMM imposes nominal performance overheads. They have also analyzed how various optimizations help in enhancing scalability and performance, and reducing memory overheads. By looking at breakdown of execution times, they have presented how traps and in-kernel synchronizations can be the performance bottlenecks, which have been handled with special care in Disco design. Some initial results on a simple hardware implementation have confirmed the experimental observations.
The evaluation, overall, is quite thorough and insightful. However, they have not compared Disco's performance against other custom scalable monolithic operating systems or existing VMMs to highlight the trade-offs quantitatively. In the scalability study, it would have been interesting to know how a multiprogrammed application like Pmake, running on multiple instances of a commodity OS running natively on the multiprocessor, would compare against the multi-VM scenario. Last but not least, I believe the authors undersold their work by presenting Disco merely as a smooth transition between existing and future OS designs; as is evident today, VMMs are hard to do away with because of all the benefits they offer.
5. Confusion
The paper does not talk about the "policies" implemented in Disco, like load balancing across processors and managing physical resources across VMs; I am curious what the opportunities are in that space. Additionally, there are several assumptions about the hardware, like software-managed TLBs and special LL/SC instructions, that tie the implementation to a particular platform. How will the VMM adapt to different platforms without giving up any benefits?

1. Summary

This paper discusses the design, implementation and the challenges of using the virtual machine monitors for extending modern operating systems to run efficiently on large-scale shared memory multiprocessors. This approach is demonstrated with the Disco prototype.

2. Problem

It aims to reduce the gap between hardware innovation and the adaptation of system software through the virtual machine monitor, an additional software layer between the hardware and the OS that virtualises all the resources of the machine. Earlier systems required significant OS changes that incurred significant development cost.

3. Contribution

Disco abstracts the processors by emulating their instructions, MMU and trap architecture, and extends the architecture to support efficient access to processor functions. It provides an abstraction of main memory as a contiguous physical address space starting at address zero. Disco virtualises the I/O devices, allowing each OS to assume exclusive access to its I/O devices. It is implemented as a multithreaded shared-memory program. Disco implements dynamic page migration and page replication to deal with non-uniform memory access. Interposition on all DMA requests and copy-on-write allow virtual machines to share main memory and disk storage resources. The copy-on-write feature is applied to non-persistent disks; only a single virtual machine can mount a persistent disk at any time, which allowed Disco not to virtualise the layout of the disk. A virtual subnet and networking interface managed by Disco allow virtual machines to communicate with each other while avoiding replicated data, using distributed file system protocols such as NFS.

4. Evaluation

Disco was evaluated using SimOS, which mimics the FLASH machine. Due to the slowdown of the simulator, long-running workloads that fully exercise Disco's resource management policies could not be studied. Workloads representing software development, hardware development, scientific computing and commercial databases were used. The execution overheads were studied using Disco running IRIX in a single virtual machine and were shown to be tolerable (up to 16%). A single workload running under different system configurations was used to evaluate memory sharing and quantify the memory overheads. To evaluate scalability, workloads that are system-intensive and a workload interacting poorly with the virtual memory system were used; an I/O-intensive application scales well on Disco. Workloads exhibiting poor memory system behaviour were used to study Disco's page migration and replication strategies. Disco was also ported to the SGI Origin200, the hardware that will form the basis of FLASH. Relevant workloads were used to test the stated objectives of the system, with suggestions for improvements and detailed analysis of the results.

5. Confusion

1. Kernel mode references and user mode references
2. TLB mechanism

1. Summary
The paper discusses a virtual machine monitor layer that abstracts the hardware for multiple instances of an OS, as a solution to the growing concern that system software is not able to keep up with shared-memory multiprocessors. The authors also present Disco, an instance of the solution, which proved their theoretical ideas experimentally to a certain extent.

2. Problem
There was no existing solution that could handle the advances in multiprocessor hardware while requiring minimal development work in the OS and scaling well. Another proposal, running multiple instances of a tightly coupled OS, did not efficiently manage resource sharing.

3. Contributions
The authors implemented an abstraction between the multiprocessor hardware and the OS in the form of a virtual machine monitor (VMM). This required a minimal amount of change in the commodity OS while still efficiently utilizing the resources. Experimental results on SimOS showed that resource sharing techniques (such as dynamic page migration) outweigh the overheads of introducing the VM layer. The authors also address problems such as fault containment and memory latency in NUMA by replicating pages that are referenced frequently ("hot" pages), and by using techniques such as copy-on-write they ensure consistency across replicated pages.
Monitors also virtualize the underlying processors by providing direct but limited access, so that the overhead of traps can be reduced. Physical memory accesses are improved through a software TLB, which uses the pmap data structure to provide virtual-page-to-machine-page mappings. Another data structure, the memmap, is used to provide global coherency and sharing of data. I/O management is also handled by the VMM by allowing the OS to access the devices through monitor device drivers; otherwise the VMM would need to emulate the I/O devices, and Disco (an instance implementation) would need to understand the interface of every I/O device.

4. Evaluation
They compared different workloads running IRIX alone against IRIX with Disco. By simulating the hardware using SimOS (due to the absence of FLASH machines), they observed overheads in execution time of up to 16% and improvements in memory footprint of about 40MB. They also ran on a uniprocessor machine with a MIPS R10000 to provide a real sense of the extra execution time due to the VMM.

5. Confusion
I did not completely understand the concepts of
- supervisor mode
- protected segments
- the subtle difference between the software TLB and the software-reloaded TLB.

1. Summary
The authors in this paper discuss the problem of modifying current OS on large scale shared memory multiprocessors. They propose a potential solution which can run commodity OS on such hardware using virtual machine monitors (VMMs) after minimal changes. The authors describe a prototype VMM, Disco, which offers virtualization and allows many OS to run on multiprocessor systems simultaneously with very little virtualization overhead.

2. Problem
Hardware innovation and application requirements are driving the development of shared-memory multiprocessors. However, current OSes are not scalable; modifying them for such systems requires partitioning the system, maintaining a single system image across cores, and features like ccNUMA management. These OSes run into millions of lines of code, and making changes to them is both time-consuming and increases the potential for bugs. Also, hardware vendors might find it difficult to persuade companies developing commodity OSes to tailor their software to new hardware, given the wide variety of products in the market and the rapid rate of innovation.

3. Contribution
The main contribution of this paper is reviving the relatively old idea of virtual machine monitors, utilizing innovations in hardware and other software techniques to reduce the overheads of such a layer. This layer virtualizes all resources of the machine and manages all accesses to it, allowing multiple OSes to coexist. Thus, applications running on non-NUMA-aware OSes can benefit from a NUMA-aware monitor on NUMA systems. Their implementation, Disco, uses innovative methods to reduce the overhead of virtualization. VMMs traditionally suffered from high overheads associated with trapping instructions; the authors optimized their kernel to reduce such scenarios by developing improved device drivers and modifying how privileged state is accessed. Further, innovations like the second-level software TLB were incorporated to make the cost of the new layer minimal. Perhaps the biggest contribution is the set of mechanisms to share data between OSes running on different VMs. Finally, Disco uses page migration and replication to hide the NUMA nature of the underlying hardware and ensure that most cache misses are serviced from local memory, and it intercepts access to all I/O devices and forwards it to the physical devices.

4. Evaluation

The overheads of virtualization are evaluated via detailed simulations of four diverse workloads, using FLASH-like hardware simulated in SimOS. The results suggest that virtualization adds between 3-16% overhead for uniprocessor workloads. Disco's memory sharing techniques significantly reduce the memory footprint of running multiple VMs (almost 50% for 8 VMs). The improved scalability offered by virtual machines and the benefits of page migration and replication are also quantified. However, the authors do not mention experiments measuring the performance of I/O devices, disks or networking.

5. Questions
I did not understand the copy-on-write section, and the special handling of persistent data was not very clear.

Summary

In this paper the authors propose a prototype, Disco, a virtual machine monitor (VMM) for building system software that can run on scalable shared-memory multiprocessors. It addresses most of the challenges faced by earlier virtual machines, namely overheads, resource management and communication/sharing.

Problem

The authors try to address the problem of extending modern OSes to run efficiently on large-scale shared-memory multiprocessors without a large implementation effort. The changes to the OS are often complex and extensive, resulting in software trailing behind hardware developments. Also, changing OS code frequently to support new hardware could result in buggy system software.

Contribution

To address the problem, the authors propose inserting an additional layer of software between the hardware and the OS, called the virtual machine monitor, and running multiple operating systems on different virtual machines, resulting in higher scalability and fault containment.
Disco does this by eliminating or reducing the problems which earlier virtual machine monitors faced. By leveraging advancements in OS design, their implementation of the virtual machine monitor greatly reduces the problems prevalent in previous VMs, namely overhead, inefficient resource management and communication.
Disco allows for efficient sharing of memory and disk resources between virtual machines. The sharing support allows Disco to maintain a global buffer cache which is transparently shared by all the virtual machines.
It handles the execution of the virtual CPUs by using direct execution on the real CPU, setting the real machine's registers to those of the virtual CPU. Memory management adds another layer of address translation: Disco maintains a physical-to-machine address mapping and uses a software TLB to speed up translation. Key notions include:
1. Abstracting NUMA memory as UMA memory so that non-NUMA-aware OSes can run on NUMA machines
2. A network interface for handling large transfer sizes without fragmentation between two virtual machines
3. A software TLB to reduce TLB miss costs
4. Dynamic page migration and page replication for maintaining locality and reducing cache misses
5. A copy-on-write mechanism for sharing both memory and disk storage resources


Evaluation

The evaluation measures execution time, memory overheads, scalability and the page migration/replication implementation. Memory overhead is significantly lower due to effective sharing of the same data used by multiple VMs, and the execution overhead of virtualization is nominal for the various workloads. The observation is that with two VMs the scalability gains already outweigh the overheads of virtualization, and performance improves further with 8 VMs.

Confusion
The experimental setup cannot be compared to an actual scenario, since the simulations are performed on a hardware simulator instead of actual hardware.
A virtual machine monitor should focus on scheduling operating-system-level work on the virtual hardware, leaving application-level scheduling to the individual OS. Similarly, the virtual machine monitor should handle fair allocation of memory to the participating operating systems, leaving application-level memory management to the OS.

1.Summary:
This paper is about the design and implementation of virtual machine monitors to run multiple commodity operating systems as virtual machines on scalable multiprocessor systems. This abstraction layer also hides non-uniform memory access and reduces shared-memory overheads. Disco is the virtual machine monitor that the authors implement and evaluate.

2.Problem:
The system software was expected to adapt to handle scalable multiprocessor systems with continuously changing hardware. This was a complex task.
Disco solves this problem by adding an abstraction layer between the hardware and operating systems which performs resource management and sharing of memory regions.

3. Contributions:
a. The virtual machine monitor abstracts the underlying hardware and virtualizes processors, physical memory, I/O devices, and network interfaces.
b. Dynamic page migration and replication by the abstraction layer handles the non-uniform memory access times on behalf of commodity (non-NUMA-aware) operating systems.
c. An added level of address translation, from physical pages to machine pages, on top of the operating system's virtual-to-physical translation, virtualizes physical memory. A per-virtual-machine "pmap" data structure in Disco is used to compute TLB entries, and virtual-to-machine translations are cached in a software TLB to improve performance.
d. Effective memory sharing: disks holding persistent data are mounted by the corresponding virtual machine, while non-persistent changes are handled with a copy-on-write mechanism.
e. Changes to the HAL reduce trap overhead and help resource management by passing hints to the monitor.
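The two-level translation in (c) can be sketched as a toy model. All names here (`pmap`, `install_tlb_entry`) are illustrative, not Disco's actual code:

```python
# Toy model of Disco's extra translation level: the guest OS maps
# virtual pages to "physical" pages, and the monitor's per-VM pmap
# maps those physical pages to real machine pages. The TLB entry the
# monitor installs is the composition of the two translations.

class VirtualMachine:
    def __init__(self, pmap):
        self.pmap = pmap            # physical page -> machine page
        self.guest_page_table = {}  # virtual page -> physical page

    def install_tlb_entry(self, vpn):
        """Compute the entry on an emulated guest TLB write."""
        ppn = self.guest_page_table[vpn]   # guest's own translation
        mpn = self.pmap[ppn]               # monitor's translation
        return (vpn, mpn)                  # hardware sees virtual -> machine

vm = VirtualMachine(pmap={0: 42, 1: 43})   # physical page 0 lives at machine page 42
vm.guest_page_table[0x10] = 0              # guest maps virtual 0x10 -> physical 0
print(vm.install_tlb_entry(0x10))          # -> (16, 42)
```

Because the hardware TLB only ever holds virtual-to-machine entries, the guest OS never sees machine addresses at all.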

4. Evaluations:
a. The authors evaluate the system by comparing different workloads running on IRIX versus on Disco, simulating the hardware with SimOS in the absence of a FLASH machine.
b. The overhead of virtualization varies with the workload, from 3% for Raytrace to 16% for the Pmake and database workloads, mainly due to trap emulation in the monitor.
c. The authors also show, via memory footprint measurements, how Disco's transparent memory sharing keeps overhead low as the number of VMs grows, compared to IRIX.
d. By means of page migration and replication, Disco enhances memory locality and shows around a 38% performance improvement over IRIX.
e. Overall, the system is shown to be efficient despite the overheads introduced by the monitor.

5.Confusion:
Does the solution work for asymmetric multiprocessors too?

1. Summary

This paper presents an effort to bridge the gap between innovations in hardware and the extensibility of system software to support such hardware, specifically in the context of scalable shared-memory multiprocessors. The key idea explored in the paper is to introduce a virtual machine monitor as an additional lightweight layer of software between the hardware and the operating system that virtualizes all resources of the system and allows a number of different operating systems to run on top of it.

2. Problem

Given its size, complexity, and huge development cost, the adaptation of system software to perform well on newer, innovative hardware has been slow-paced and continues to lag behind. The authors argue that this is especially evident in the case of scalable shared-memory multiprocessors, and to address it they put forth the idea of using virtual machine monitors as an alternative to modifying existing operating systems.

3. Contributions

To demonstrate the practicality of their ideas, the authors have built a prototype system called Disco that weighs the benefits of using virtual machine monitors against their perceived virtualization overheads, much of which can be mitigated with established operating system techniques. Disco is a lightweight multithreaded shared-memory program that allows a higher degree of tuning for the underlying CC-NUMA hardware than existing bulky operating systems. By allowing multiple virtual machines to run on the same hardware through resource virtualization, it provides the virtual machine as a unit of scalability and fault containment. Existing operating systems and applications can run unmodified on the MIPS R10000 abstraction exposed by Disco. It even provides nearly uniform memory access to existing applications over the underlying CC-NUMA hardware by using dynamic page migration and replication. A global buffer cache shared through a distributed-file-system protocol is used to improve the reliability and scalability of the system.

4. Evaluation

The authors have evaluated their system on SimOS, a machine simulator that models MIPS-based multiprocessors, with the IRIX OS running on top of it. Four representative short workloads for a typical scalable compute server were used to study the scalability benefits provided by virtualization against its CPU and memory overheads. The results show that Disco presents only 3%-16% additional overhead against the IRIX operating system run directly, most of it due to trap emulation and TLB reload misses. The authors talk about running heterogeneous operating systems on top of Disco; however, an evaluation with heterogeneous virtual machines and workloads is missing and could have added more insight into the scalability and reliability Disco offers.

5. Confusion

How exactly is the interposition on DMA requests used to share disk and memory resources among virtual machines in Disco?

1. Summary
The paper revisits the concept of virtual machine monitors as a way for operating systems to run efficiently on large-scale multiprocessors without significant re-engineering effort. It also discusses the design and implementation of Disco, a VMM, and explains how it tackles the challenges of resource management, switching overheads, and inter-VM communication.

2. Problems
Operating systems consist of millions of lines of code and require significant implementation effort to use evolving hardware efficiently. Even with such changes, it is hard to ensure stability. Virtual machine monitors offer a solution to this problem, but with challenges of their own around hardware virtualization, resource management, and sharing and communication overheads. Without a clever implementation, these could negatively impact the hardware's performance.

3. Contribution


  1. Virtualizing hardware resources: Virtualizing hardware resources like the CPU, physical memory, I/O devices, and network interfaces poses significant implementation challenges. For example, the OS is now stripped of privileges such as running in kernel mode, executing privileged instructions, and handling address traps; Disco emulates these operations for seamless execution of the OS. Similarly, the VMM has to maintain physical-to-machine page mappings to support the execution of multiple OSes.

  2. Memory management: In CC-NUMA machines, remote cache misses add significant delay to program execution time. Disco mitigates this with page migration and replication: it identifies pages that suffer frequent remote misses and chooses either to migrate that page or to replicate it. Since the page is then accessed from the processor's local memory, access latency is significantly reduced. Disco also performs copy-on-write sharing of disk data, so that a disk request can often be satisfied directly from memory. The VMM transparently shares pages accessed over NFS, which reduces the latency of sharing data between virtual machines.

  3. Running commodity OSes: The paper discusses the changes made so that Disco can support a commodity OS like IRIX on particular hardware. For example, a virtual processor running in supervisor mode cannot use the unmapped KSEG0 segment of MIPS, so Disco requires these kernel pages to be moved into the mapped supervisor segment of the MIPS processor. Similarly, small changes are made to the device drivers and the HAL to support IRIX on the FLASH hardware.
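The migrate-versus-replicate decision in point 2 can be sketched roughly as below. The threshold, the counter layout, and the function name are all invented for illustration; FLASH's actual hardware counters and Disco's policy are more involved:

```python
# Sketch of a hot-page policy: per-page remote-miss counters (as FLASH
# provides in hardware) drive the choice between leaving the page,
# migrating it to its sole user, or replicating it for read-sharers.

HOT_THRESHOLD = 100   # invented value for illustration

def handle_hot_page(miss_counts):
    """miss_counts: {node_id: remote misses on this page from that node}."""
    total = sum(miss_counts.values())
    if total < HOT_THRESHOLD:
        return ("leave", None)
    sharers = [n for n, c in miss_counts.items() if c > 0]
    if len(sharers) == 1:
        # One node dominates: move the page into its local memory.
        return ("migrate", sharers[0])
    # Read-shared by several nodes: give each node a local replica.
    return ("replicate", sharers)

print(handle_hot_page({2: 150}))         # ('migrate', 2)
print(handle_hot_page({0: 80, 3: 90}))   # ('replicate', [0, 3])
```

Replication only works for pages that are read-shared; a write-shared hot page has to stay put, which is one reason the real policy is subtler than this sketch.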


4. Evaluation
The authors evaluate their idea with a VMM called Disco, designed for a CC-NUMA multiprocessor, FLASH. The paper evaluates Disco on several fronts: scalability, memory footprint, execution overheads, and the effect of page migration and replication. The chosen workloads characterize the VMM's ability to handle OS- and I/O-intensive programs, programs with varied memory footprints, and use of shared memory. The authors do not explain how Disco would scale with multiple different operating systems running concurrently; this could be a possible future study.

5. Confusions

  1. As mentioned in the evaluation, the paper does not discuss how well the VMM scales with different operating systems. If many instances of the same OS are run, some code is shared between those instances; running completely different OSes would stress memory, and it would have been interesting to see how well the VMM scales then. The paper evaluates two OSes running concurrently, but one of them is a lightweight, application-specific OS.

  2. How large would the VMM become if it had to support all popular hardware architectures?

  3. Also, the paper does not explain the rationale behind not choosing available hardware and a popular commodity OS.

1. summary
This paper talks about Disco, a VMM designed for building scalable system software on shared-memory multiprocessors. It discusses the optimizations used to minimize virtualization overheads and improve data sharing and locality. The scalability and effectiveness of these mechanisms are thoroughly evaluated and compared against a non-virtualized OS instance.

2. Problem
An Operating system for a shared-memory multiprocessor has to be scalable and well partitioned to exploit the underlying hardware. Additionally, such an OS requires CC-NUMA aware memory management and fault containment. To re-design a commodity OS to meet these needs is both complex and expensive.

3. Contributions
The basic idea of Disco is to extract the complexity of designing scalable system software and move it into a layer of virtualization, i.e. the VMM. Disco runs multiple instances of commodity OSes and applies global policies to better utilize the hardware, all with much lower implementation effort. Disco introduces several optimizations to reduce virtualization overheads and improve NUMA memory system performance. Using FLASH's native hardware counters, Disco can identify hot pages and either transparently migrate them to a processor's local memory or replicate them if there are multiple sharers. Disco uses copy-on-write to share disk data paged in by multiple VMs. Additionally, virtualization provides the flexibility to run a specialized OS for high performance alongside a regular one.

4. Evaluation
The authors measure the virtualization overheads of Disco using a set of multiprogrammed workloads. The execution overheads are shown to be tolerable (at most 16%), proving that optimizations like the software L2 TLB are beneficial. Similarly, the memory footprint remains modest with multiple VMs due to sizable code and data sharing. Notably, an I/O-intensive application scales well across VMs on Disco versus running on a single instance of IRIX. Disco's CC-NUMA optimizations are shown to significantly reduce remote cache misses, enhancing the memory locality of the workloads.

5. Comments/ Confusion
Disco shares some commonality with the multikernel idea but offers a practical and incremental solution to the same problem. It also shows that sharing information and work between a guest OS and the monitor achieves better results, but this requires access to the kernel code, which is not always easy. One major confusion I had was whether Disco manages its own swap space when it needs to reclaim memory from a guest OS.

1. Summary
This paper tackles the problem of extending modern operating systems to run efficiently on large-scale shared-memory multiprocessors without a huge implementation cost. It proposes to address this by adding a level of indirection between the commodity operating system and the raw hardware, using the well-established idea of virtual machine monitors.

2. Problem
Hardware has seen several innovations, such as scalable multiprocessor machines with NUMA characteristics. Effectively exploiting these innovations is challenging, since it essentially requires rewriting the OS to address fault tolerance and scalability. Such significant OS changes, including partitioning the system into scalable units and building a single system image across those units, progress slowly and risk introducing bugs that add instability to the system.

3. Contributions
In this paper the authors present an alternative approach to constructing system software for large systems: instead of making extensive changes to existing operating systems, they insert an additional layer of software between the hardware and the operating system. This layer acts as a virtual machine monitor on which multiple copies of commodity operating systems run on a single scalable computer, cooperating and sharing resources with each other. The authors present a prototype system called Disco that implements this idea. Disco allows OSes running on different virtual machines to communicate using standard distributed-system protocols and provides efficient sharing of memory and disk resources. The authors claim that their approach offers a simple solution to the scalability, reliability, and NUMA-management problems typically faced by large-scale systems.

4. Evaluation
Overall, the problem and the solution proposed in the paper seem very plausible, and they speak to the growing need to keep pace with innovation in hardware. Making simple changes to a commodity OS so that virtual machines can share resources scales well, since the system can be expanded by simply adding a VM, and it is robust because the VMs are isolated. However, it also brings VM and memory overheads and complications in communication.
The Disco prototype, designed for the FLASH multiprocessor, was validated using SimOS. The authors report that the overhead of virtualization ranges from 3% to 16%, and they point out that Disco can run some workloads 1.7 times faster than a commercial multiprocessor OS. They further note that running a 64-bit system mitigates the impact of TLB misses. Overall they report efficiency gains in memory utilization, scalability, and page migration, with small performance overheads.

5. Confusion
It is not clear how privileged instructions are handled within the cluster of VMs.
The authors do not shed much light on memory overhead, which can be high with several copies of VMs. Finally, it is not clear why FLASH was chosen as the platform instead of "mainstream" hardware.


Summary
The paper introduces Disco, a virtual machine monitor designed for the Stanford FLASH multiprocessor, a scalable cache-coherent multiprocessor. Rather than modifying an existing OS, Disco inserts an additional layer of software between the hardware and the OS. This layer virtualises all the resources of the machine and exports a more conventional hardware interface to the OS, thus allowing multiple virtual machines to co-exist on the same multiprocessor.
The Problem
With scalable shared-memory multiprocessors entering the commercial market, it is time to overhaul system software to cope with the innovations in hardware. However, the size and complexity of existing system software mean that making the requisite extensive changes would incur huge development costs. Moreover, system software is often delivered significantly later than the hardware, and with below-par reliability and stability. Computer hardware vendors that use "commodity" operating systems face the additional problem of convincing independent companies to change their operating systems for the new hardware. Hence the authors came up with Disco to reduce the gap between scalable multiprocessors and the adaptation of system software to them.
Contributions
1. The paper reintroduces the concept of virtual machine monitors in the new setting of scalable multiprocessors. In Disco, some virtual machines may run commodity uniprocessor or multiprocessor OSes while others simultaneously run simple specialized operating systems fine-tuned for specific workloads.
2. Disco allows different operating systems to be coupled via standard distributed-systems protocols like TCP/IP and NFS, and maintains a global buffer cache that is transparently shared by all the virtual machines, even when they communicate through a distributed file system.
3. With careful placement of the pages of a virtual machine's memory and the use of dynamic page migration and page replication, the monitor can hide the NUMA-ness of the memory, thereby allowing UMA operating systems to run on a CC-NUMA architecture.
4. Disco manages a virtual network interface that allows virtual machines to communicate with each other while avoiding replicated data whenever possible. It uses copy-on-write mappings to reduce copying and allow memory sharing.
5. Disco supports copy-on-write disks, wherein multiple virtual machines accessing a shared disk end up sharing machine memory. To preserve the isolation of the virtual machines, disk writes must be kept private to the issuing virtual machine; Disco logs the modified sectors so that the copy-on-write disk is never actually modified.
6. One of the attractions of Disco is that it provides all of the above with relatively little implementation effort and low risk of software bugs and incompatibilities, compared to other OS approaches for scalable machines like Hive and Hurricane.
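The sector-logging scheme in point 5 can be sketched as a toy model. The class and method names are invented for illustration; Disco's actual implementation operates on DMA requests, not a Python dict:

```python
# Sketch of a copy-on-write disk: reads hit the shared base image
# unless the issuing VM has logged a private write to that sector,
# so the base image itself is never modified.

class CowDisk:
    def __init__(self, base_image):
        self.base = base_image     # shared by all VMs, never modified
        self.logs = {}             # vm_id -> {sector: private data}

    def write(self, vm_id, sector, data):
        # Writes stay private to the issuing VM, in its own log.
        self.logs.setdefault(vm_id, {})[sector] = data

    def read(self, vm_id, sector):
        # A private modification wins; otherwise fall back to the base.
        return self.logs.get(vm_id, {}).get(sector, self.base[sector])

disk = CowDisk({0: b"boot", 1: b"data"})
disk.write(vm_id=1, sector=1, data=b"DATA")
print(disk.read(1, 1))   # b'DATA'  (VM 1 sees its own write)
print(disk.read(2, 1))   # b'data'  (VM 2 still sees the shared base)
```

Because unmodified sectors are shared, many VMs booting from the same disk image can also share the machine memory those sectors are paged into.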

Evaluation
Disco is evaluated for a variety of workloads on SimOS, a detailed simulator of the FLASH machine. With a few simple modifications to the existing IRIX OS, the basic overhead of virtualisation varies from 3% to 16% for the uniprocessor workloads. It is also shown that a system with 8 virtual machines can run some workloads 1.7 times faster than IRIX by increasing the scalability of the system software, without substantially increasing the system's memory footprint. Finally, it is shown that page placement and dynamic migration and replication allow Disco to hide the NUMA-ness of the memory system, reducing execution time by up to 37%. The simulation results are confirmed by porting Disco to a uniprocessor SGI Origin200 board that is to be the basis of the FLASH machine.
Confusion
How exactly is the dynamic scheduling of virtual processors onto real processors carried out in Disco? Is the number of real processors allocated to a virtual machine determined dynamically as well?

Summary:
The article introduces Disco, a scalable virtual machine monitor (VMM) that runs on shared-memory multiprocessors, with optimizations for data sharing and communication and minimized virtualization overheads. Disco is also meant to be a viable way to run operating systems that assume a UMA architecture on CC-NUMA systems.

Problem:
The paper addresses the problem of writing scalable system software for shared-memory multiprocessors. It does so using the concept of VMM that is optimized for sharing and communication amongst the virtual machines.

Contributions:
The main contribution of this paper is Disco itself. Along the way, the article proposes a number of techniques for an efficient bare-metal VMM.
1) Memory management: Disco allows data and file systems to be shared amongst different virtual machines. It does this by mapping physical addresses from different VMs that share data, or a virtual file system, to the same machine addresses. Disco also brings in page migration and replication to improve data locality on CC-NUMA systems, using copy-on-write to reduce copying overheads.
2) Device drivers: Since the overheads of a device transaction are large in a virtualized environment, a monitor call in the VMs' device drivers batches transactions and traps to the monitor only once to get the work done.
3) Software TLBs: The monitor manages a second-level TLB in software for virtual-to-machine address mappings. The monitor also manages the hardware TLB by emulating the privileged TLB instructions of the VM.
4) Virtual network interface: Disco gives each VM an address on a virtual subnet for communicating with other VMs on the same machine.
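The page sharing in (1) can be sketched as follows. Everything here (`Monitor`, `share`, the copy-on-write-on-store behavior) is an illustrative model of the idea, not Disco's code:

```python
# Sketch of transparent memory sharing: two VMs' "physical" pages that
# hold identical data (e.g. the same disk block) point at one machine
# page, mapped read-only; a store triggers a private copy.

class Monitor:
    def __init__(self):
        self.machine_pages = {}    # mpn -> page contents
        self.next_mpn = 0
        self.pmaps = {}            # vm_id -> {ppn: (mpn, writable)}

    def alloc(self, data):
        mpn, self.next_mpn = self.next_mpn, self.next_mpn + 1
        self.machine_pages[mpn] = data
        return mpn

    def share(self, vm_id, ppn, mpn):
        # Map the VM's physical page read-only onto a shared machine page.
        self.pmaps.setdefault(vm_id, {})[ppn] = (mpn, False)

    def write(self, vm_id, ppn, data):
        mpn, writable = self.pmaps[vm_id][ppn]
        if not writable:                      # copy-on-write fault
            mpn = self.alloc(data)            # give this VM a private copy
            self.pmaps[vm_id][ppn] = (mpn, True)
        else:
            self.machine_pages[mpn] = data

m = Monitor()
shared = m.alloc(b"kernel text")
m.share(vm_id=0, ppn=5, mpn=shared)   # both VMs' physical page 5 are
m.share(vm_id=1, ppn=5, mpn=shared)   # backed by one machine page
m.write(1, 5, b"patched")             # VM 1 now has a private copy
print(m.pmaps[0][5][0] == m.pmaps[1][5][0])   # False: no longer shared
```

VM 0's mapping and the shared page contents are untouched by VM 1's write, which is exactly the isolation property the sharing must preserve.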

Evaluation:
Due to the lack of the intended hardware (a MIPS R10000-based multiprocessor), Disco was evaluated on a hardware simulator (SimOS), which is significantly slower than real hardware. Disco is evaluated on a number of fronts, with workloads ranging from highly processor-bound to highly OS-bound.
1) The execution overhead of running applications on IRIX on Disco, over running them directly on IRIX, varied from as little as 3% to 16%.
2) Although the memory footprint grows with the number of VMs, sharing of data and virtual file systems reduced it to a much more tolerable level.
3) Execution times were much better with multiple VMs each running the application than with a single VM running multiple copies of the same application.
For completeness, Disco was also evaluated on an SGI board, the building block of the FLASH multiprocessor; the results turned out to be quite similar to the simulation results.

Confusion:
Although the paper does discuss running two different OSes on the machine (IRIX and SPLASHOS), it does not evaluate how Disco would run with entirely different full-blown commodity OSes, given that there would be no sharing of kernel code amongst the VMs. The article also does not say how changes to the HAL of proprietary OSes would be made.

1. summary
For operating systems to keep up with the scalable multiprocessor systems shipped by multiple hardware providers, this paper proposes a virtual machine monitor called Disco that allows multiple commodity and specialized operating systems to run concurrently on the same CC-NUMA machine, in a scalable manner, while sharing resources.
2. Problem
One of the main problems Disco aims to solve is bridging the ever-widening gap between system software development and hardware developments, which have significantly increased the functionality and reliability expected of modern systems. Extensive changes to millions of lines of system code are needed to efficiently support scalable machines, which makes the task daunting. This environment leads to many dependency and compatibility concerns between hardware and software developers.
3. Contributions
An important contribution is the introduction of a virtual machine monitor between the hardware and the operating system. This enables multiple commodity operating systems to efficiently share resources and run on a single scalable system. Most operations run at the same speed as they would on raw hardware because Disco uses direct execution to emulate the virtual CPU. Physical-to-machine address mappings are maintained in addition to the virtual-to-physical mappings in order to virtualize physical memory; thanks to this mapping, the physical address space of each virtual machine can start at zero. Disco deals with non-uniform memory access times in order to support operating systems that are not NUMA-aware, ensuring that cache misses generated by a virtual CPU are satisfied from local memory where possible. Disco intercepts I/O requests from the virtual machines and forwards them to the physical devices. A virtual networking interface enables memory resources to be shared, and distributed protocols like NFS enable communication between virtual machines.
4. Evaluation
Disco is evaluated on SimOS, which simulates the FLASH machine and models the hardware of MIPS-based multiprocessors. The Disco monitor and the IRIX operating system run on top of this simulator. Four workloads that represent typical use cases of a scalable compute server are used to determine execution time. By analyzing workload execution on uniprocessors, the overall overhead of virtualization is determined to range from 3% to 16%, with the major overhead coming from the TLB-miss emulation design. Using the data-sharing patterns of five different workloads, it is shown that kernel text and the buffer cache are shared effectively while kernel private data is not shared.
5. Confusion
Was it a good idea to flush the hardware TLB on every VM switch? The evaluation section explains that a main source of execution-time overhead is TLB flushing. Could we discuss this trade-off in more detail?

Summary
This paper describes the concept of virtual machine monitors and the design and implementation of Disco, introduced primarily to reduce development time and complexity, improve reliability, and avoid or minimize changes to traditional OSes as hardware advances arrive in shared-memory multiprocessor systems.

Problem:
As hardware changes accumulate, as in shared-memory multiprocessor systems, properties such as non-uniform memory access (NUMA) arise that in turn require changes to existing system software. Operating systems end up being built on very tight deadlines to meet market requirements, possibly with an unprecedented number of bugs; moreover, commodity OSes do not scale well on multiprocessors.

Contribution:
1. This paper introduces virtual machine monitors, which in combination with commodity OSes provide a scalable, flexible, and reliable solution to the problems introduced above.
2. Though it involves a slight performance cost, the additional layer comes with a much lower development cost. The paper describes the challenges virtual machines face: memory overheads for memory management, I/O device virtualization, additional memory for replication, resource-management difficulties due to lack of information, and sharing/communication issues among VMs.
3. The paper describes the abstractions one such virtual machine monitor (Disco) provides for virtualizing the processor, physical memory, and I/O devices for the VMs running on top of it. Disco is a multithreaded shared-memory program, cache-aware and NUMA-aware; its code and data segments are replicated on each processor for better locality.
4. To virtualize the CPU, each VM's state is saved in a data structure when it is context-switched, and the monitor handles scheduling on the physical CPUs. To virtualize physical memory, since VMs address "physical" memory, the monitor maintains a pmap data structure and a software TLB, and the hardware TLB is flushed on context switches. To hide the NUMA issue, the monitor uses a transparent page replication and migration policy. To virtualize I/O devices among VMs, Disco receives all I/O accesses and forwards them appropriately. It virtualizes network interfaces as well, using a global buffer cache and standard NFS.
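The software TLB in point 4 can be sketched as below. The class, the per-VM dictionary, and the `slow_lookup` callback are all illustrative stand-ins; on MIPS the slow path would be emulating the guest's TLB refill:

```python
# Sketch of a second-level software TLB: the hardware TLB must be
# flushed when the monitor switches virtual machines (the MIPS TLB is
# not tagged per VM), but the per-VM software TLB survives, so most
# misses after a switch are refilled without re-running the guest's
# TLB-miss handler.

class SoftwareTlb:
    def __init__(self):
        self.hw_tlb = {}            # vpn -> mpn; flushed on VM switch
        self.l2 = {}                # vm_id -> {vpn: mpn}; survives switches
        self.slow_refills = 0       # counts emulated guest refills

    def switch_vm(self):
        self.hw_tlb.clear()

    def translate(self, vm_id, vpn, slow_lookup):
        if vpn in self.hw_tlb:
            return self.hw_tlb[vpn]
        l2 = self.l2.setdefault(vm_id, {})
        if vpn not in l2:                     # true miss: emulate the guest
            self.slow_refills += 1
            l2[vpn] = slow_lookup(vpn)
        self.hw_tlb[vpn] = l2[vpn]            # refill the hardware TLB
        return l2[vpn]

tlb = SoftwareTlb()
lookup = lambda vpn: vpn + 1000               # stand-in for the real refill
tlb.translate(0, 7, lookup)                   # slow path: emulated refill
tlb.switch_vm()                               # hardware TLB flushed
tlb.translate(0, 7, lookup)                   # fast refill from the L2 TLB
print(tlb.slow_refills)                       # 1
```

The point of the design is visible in the counter: two translations across a VM switch cost only one slow refill.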

Evaluation:
A simulator was used for evaluation since the FLASH machine was not available. The system was measured against workloads with different characteristics: a parallel application (Splash), memory-intensive processes (databases), and long- and short-running programs. Execution overhead, memory overhead, and scalability were analyzed, along with the performance impact of page replication and migration for NUMA. The authors also analyze the data and suggest improvements to reduce overhead, and they firmly underscore the scalability benefits of this approach.

Confusion/Issues:
The abstractions and data structures used for virtualizing memory were not clearly understandable without further reading.
Why was unavailable hardware chosen? Not sure if I'm missing something.

Summary
This paper describes a new software layer between the hardware and commodity operating systems to facilitate scalability on newer hardware architectures. The paper presents the design and implementation of a prototype, Disco, which provides a basic abstraction of the underlying hardware resources and allows multiple operating systems to run on it with minimal changes to the existing operating systems. It further evaluates this new system against the traditional use of a single operating system for a variety of workloads. The authors target one such hardware platform: the CC-NUMA architecture.

Problem
Traditional system software was huge and complicated and hence could not keep pace with advances in hardware. With the introduction of more scalable shared-memory systems like CC-NUMA, updating the system software required a lot of effort, which delayed the introduction of new hardware systems to the market. Such complicated system software leads to a buggy system, which in turn hurts the success of the hardware. Even with the virtual machine concepts of the time, resource sharing was problematic and incurred high overheads. The authors try to minimize these overheads with new techniques implemented as part of a virtual machine monitor.

Contributions
1. Introduction of a new software layer, the virtual machine monitor, between the hardware and the operating system. This reduces the complexity of developing traditional operating systems for new scalable hardware architectures.
2. The design and implementation of one such prototype, Disco.
3. Multiple commodity operating systems run on a single multiprocessor machine as separate virtual machines, exploiting the large hardware resources to their full extent while allowing flexibility.
4. Techniques such as replication and dynamic page migration in the monitor reduce overheads for non-NUMA-aware systems, which in turn allows commodity operating systems to run on NUMA architectures.
5. Issues within one operating system (i.e. one virtual machine) do not affect the entire system, providing fault containment.

Evaluation
The paper does extensive testing with different typical workloads and compares against traditional operating systems. The workloads include multiprogrammed, short-lived processes (Pmake), long-running processes with high memory footprints (Engineering), a single shared-memory parallel application (Raytrace), and a commercial database workload. An execution overhead of 3% to 16% was found with Disco, owing to its trap emulation and TLB reload misses. Memory overheads were significantly reduced as the number of VMs increased, due to the monitor's effective sharing techniques. Partitioning a problem across virtual machines helped scalability and reduced execution times.

Confusion
The copy-on-write implementation for disks; I am confused about the logging mechanism.

1. Summary
This article talks about developing system software on top of a scalable shared-memory multiprocessor that runs multiple copies of commodity operating systems virtually, sharing major data structures, in order to improve scalability and provide uniform memory access to NUMA-unaware OSes with a smaller implementation effort.
2. Problem
Building high-performance system software for innovative hardware like scalable multiprocessors requires significant development cost in the OS, with the risk of instabilities, including partitioning the system into scalable units and building a global system image with fault-containment features. With such significant changes there is a risk of introducing incompatibilities and bugs, which can hinder the portability and reliability of existing applications and thus overshadow the benefits of the hardware innovation.
3. Contributions
Instead of enhancing an existing operating system to run on modern hardware, the authors introduce a new layer of software between the hardware and the OS, called a virtual machine monitor, which virtualizes all the resources of the system and exports a more conventional hardware interface to multiple virtual machines running commodity OSes or specialized OSes fine-tuned for specific workloads. This was realized in a prototype called Disco. The virtual machines can communicate using standard distributed protocols, and memory regions can be shared across virtual machine boundaries with relatively small changes to the commodity OS. The design also contains failures in system software and hardware to a few virtual machines. Disco extends the interface seen by the commodity OS to provide efficient frequent kernel operations and nearly uniform memory access by employing dynamic page migration and replication. Through its emulation of a DMA engine, code and read-only data can be copy-on-write remapped between multiple virtual machines. To optimize resource-management decisions, the HAL of the OS is modified to pass hints of resource utilization to the monitor and to introduce modes such as reduced power consumption. Future work includes dealing with the increasing complexity of modern machine-level interfaces.
4. Evaluation
The authors configured SimOS to resemble the FLASH multiprocessor for evaluation. For compute-bound workloads, trap emulation of TLB misses imposed some overhead. Heavy use of OS services, memory stalls, and instruction emulation are particularly stressful in the parallel-compilation workload. Funneling all system calls and interrupts through a common entry path also causes some slowdown in such cases, though the use of larger pages in IRIX 6.2 can reduce part of the overhead. For memory overheads, effective sharing of kernel code and the file buffer cache limits the cost of running multiple virtual machines. In terms of scalability, adding virtual machines reduces execution time, primarily due to reduced kernel stall time and synchronization, with the exception of NFS. Even though the authors attempted to port Disco to the Origin, a better evaluation would have been possible on the actual FLASH machine, particularly for examining overheads on long-running workloads.
5. Confusion
I did not understand how the distinction between kernel-mode and user-mode references is handled by the software-managed TLB on the MIPS processor.

1. Summary
Disco tries to ease the process of making existing operating systems scalable on multiprocessor systems by extending the idea of virtual machines.

2. Problem
Traditional operating systems cannot utilize innovative hardware very well, and it takes a significant amount of effort to modify them. A system built this way is also potentially incompatible with existing applications and unreliable.

3. Contributions
To fully take advantage of the scalable multiprocessor hardware, the development effort into existing operating systems can be reused.
The overhead of traps in guest systems is reduced by allowing access to privileged registers through loads and stores to special preset addresses.
The translation from virtual addresses to machine addresses is done directly in the hardware TLB. Physical addresses of a virtual machine are translated to machine addresses when the monitor emulates the guest OS's TLB writes after a TLB miss; these misses are less frequent because Disco maintains a larger second-level software TLB for each virtual machine.
The NUMA behavior of the hardware is transparent to the guest. It is taken care of by the monitor with page migration and replication.
Virtual disks can be shared or private with copy-on-write mechanism.
Virtual machines communicate through virtualized networks where NFS is optimized.
Hints are added in the hardware abstraction layer to help cooperation between the guest and host.
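The two-level translation and the second-level software TLB described above can be sketched as follows. This is a minimal illustrative model, not Disco's actual data structures: all class and field names (`pmap`, `l2tlb`, `hw_tlb`) are hypothetical, and real TLBs of course hold protection bits and ASIDs as well.

```python
# Hypothetical sketch of Disco's two-level translation: the guest maps
# virtual -> physical, and the monitor silently rewrites each guest TLB
# insert to map virtual -> machine using its per-VM pmap.

class VirtualMachine:
    def __init__(self, pmap):
        self.pmap = pmap   # physical page -> machine page (monitor-owned)
        self.l2tlb = {}    # second-level software TLB: vpage -> machine page

class Monitor:
    def __init__(self):
        self.hw_tlb = {}   # hardware TLB: vpage -> machine page

    def guest_tlb_insert(self, vm, vpage, ppage):
        """Emulate the guest's privileged TLB write: swap in the machine page."""
        mpage = vm.pmap[ppage]        # physical -> machine
        self.hw_tlb[vpage] = mpage    # what the hardware actually sees
        vm.l2tlb[vpage] = mpage       # cache for a cheap refill later

    def refill(self, vm, vpage):
        """On a hardware TLB miss, try the software TLB before faulting to the guest."""
        if vpage in vm.l2tlb:
            self.hw_tlb[vpage] = vm.l2tlb[vpage]
            return True
        return False                  # must vector the miss to the guest OS
```

The `refill` path is why the software TLB matters: the hardware TLB is flushed on every virtual-CPU switch, and without the cache every miss would have to re-enter the guest kernel.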

4. Evaluation
The experiments are done on FLASH simulated by SimOS and a physical single processor hardware.
In the simulated experiments, they measured execution overhead, memory overhead, scalability, and the effectiveness of NUMA management. Only short-running workloads are used because the simulator is slow, but the authors argue that the detailed information given by the simulator makes up for it. As the results show, the execution and memory overheads are acceptable, and scalability as well as NUMA behavior is much better than with the original IRIX.
On the physical hardware, they only showed that the execution overhead is low.

5. Confusion
It seems Disco is tied to the FLASH board or at least MIPS to some extent. Although it blames MIPS for the unmapped kseg0 segment, it depends on the presence of supervisor mode, privileged registers, LL/SC instructions, software managed TLBs, cache-miss-counting facility and so on. I am not sure how these ideas can be applied to other ISAs.

1. Summary: This paper is about updating operating systems to take advantage of NUMA shared-memory hardware architectures. The authors propose writing a new layer of software between the OSes and the hardware, and in the process reinvent virtual machine monitors. The authors discuss “Disco”, their implementation of a VMM for the FLASH multiprocessor. The work presented in this paper formed the foundation of VMware.
2. Problem: With the advent of shared-memory multiprocessors and NUMA architectures, the OS needed to be updated to utilize the resources effectively, in other words it should be architecture-aware. This problem is not limited to NUMA updates, but to any hardware innovation. The authors identify that porting an OS is not only time consuming, but also prone to bugs. Moreover this also hinders the use of older applications on updated versions (unless backward-compatibility is ensured).
3. Contribution: The authors revisit an old concept called the Virtual Machine Monitor and add extra features to solve the problem at hand. The use of a VMM not only exploits the hardware effectively, but also lets multiple instances of different OSes run concurrently on the machine by virtualizing the CPU, memory, I/O, etc. This also lets specialized OSes run on hardware that would otherwise be under-utilized if only that OS were running. To make the VMM NUMA-aware, they added dynamic page migration and page replication, which track page usage across processors and bring pages closer to the executing processor to reduce access latency. They also replicate code across nodes, a concept later borrowed by multikernel OS designs like Barrelfish. They use “direct execution” to keep guest OS performance high and trap only on privileged instructions, such as TLB modification. They maintain data structures like pmap and memmap to accelerate address translation, and add a second-level software TLB to reduce TLB misses in the guest OS. By designing a layer below the guest OSes, they were also able to share pages and schedule I/O devices and disks, which would have been difficult with stand-alone OSes running individually.
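The trap-on-privileged-instruction mechanism mentioned above can be sketched as a tiny trap-and-emulate loop. This is a hypothetical model, not Disco's code: the instruction names and the `VCPU` layout are invented for illustration, and real MIPS privileged state is far richer.

```python
# Hypothetical trap-and-emulate sketch: the guest runs directly on the real
# CPU until it executes a privileged instruction, which traps to the
# monitor; the monitor then applies the instruction's effect to the
# *virtual* CPU state rather than the real hardware.

class VCPU:
    def __init__(self):
        self.regs = {"status": 0}   # emulated privileged registers
        self.tlb = {}               # the guest's notion of its TLB
        self.mode = "user"          # the guest kernel really runs unprivileged

def emulate_privileged(vcpu, instr, operand):
    """Dispatch a trapped privileged instruction against the vCPU state."""
    if instr == "write_status":     # e.g. the guest toggling interrupt enable
        vcpu.regs["status"] = operand
    elif instr == "tlb_write":      # guest TLB insert: (vpage, ppage) pair
        vpage, ppage = operand
        vcpu.tlb[vpage] = ppage
    else:
        raise NotImplementedError(instr)
```

The reviewer's x86 question is apt: this scheme only works if every sensitive instruction actually traps when executed unprivileged, which classic x86 famously does not guarantee.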
4. Evaluation: They implement their VMM for the FLASH architecture and used SimOS to emulate a MIPS R10000 processor (used in FLASH) to evaluate performance, comparing against IRIX's standalone performance. Because the MIPS architecture lets the kernel bypass the TLB for kernel segments, they had to relocate, recompile, and relink the IRIX kernel. The execution overheads of virtualization range from 3% to 16%, and they do a thorough analysis to note that kernel time actually decreases in some workloads, while the slowdown comes from Disco work such as trap emulation. They also show the importance of resource sharing by running multiple instances of the same workload on different guest OSes. The evaluations of scalability and of the benefits of page migration and replication also show good results.
5. Confusion: Disco traps and emulates privileged instructions like TLB modification. There are some instructions on x86 that behave differently in user and privileged mode; how does Disco handle that? How often is a page moved or replicated, and how is that decided? Where does the second-level TLB reside? If it resides in hardware on each core, why not make the entire structure one big L1 TLB? The concept of copy-on-write disks with temporary and permanent disks is not clear. What happens on a system crash?

1. Summary
The paper explains the idea and implementation of virtual machine monitors, which can make operating systems run efficiently on large-scale shared-memory multiprocessors. The monitor acts as a layer between the hardware and the operating systems. This enables multiple operating systems to run on the same hardware for scalability, with the monitor serving as an intermediate layer supporting communication, resource sharing, I/O devices, and network communication.

2. Problem
System software trails developments in hardware in terms of the functionality users expect. Constructing system software for large computers is very slow and complicated; changing operating systems to keep up with hardware can take a lot of code and time. The authors solve the problem by introducing a monitor, an additional layer of software between the hardware and the OS. This layer acts as a virtual machine monitor, allowing multiple operating systems to run on top of it.

3. Contributions
The main idea is the design of the virtual machine monitor layer, Disco. The monitor, being a single small piece of code, is easier to develop and has less chance of introducing bugs and incompatibilities. Disco virtualizes all the resources of the machine so that multiple virtual machines can coexist on the same multiprocessor. The main task of the monitor is to schedule the processors and memory of the virtual machines onto the physical resources of the scalable multiprocessor using global policies. Disco also provides communication between virtual machines. This added layer gives the system the flexibility to support a variety of workloads fairly efficiently, and the system is much more scalable than a single-OS model since only the monitor needs to scale rather than the entire OS. Disco introduces the notions of virtual CPUs, virtual physical memory, virtual I/O devices, and virtual network interfaces to abstract the physical hardware of the machine, maintaining a mapping between virtual and physical resources.

4. Evaluation
Various experiments were performed to measure the execution and memory overheads, scalability, and the efficiency of page migration and replication between virtual machines. The overhead of virtualization ranges from 3% to 16%, with higher overheads for workloads that make heavy use of OS services. The sharing optimizations in Disco yield much lower memory overhead than the no-sharing case.

5. Confusion
Will cache performance be worse with Disco than on single-OS systems?

1. Summary
This paper discusses the design and implementation of Disco - a VMM.
It additionally affirms that both cost and time to market can be reduced significantly by adding a new layer of abstraction to absorb advances in hardware.

2. Problem
System software most often trails new hardware features, and it is expensive to modify operating systems to accommodate these changes correctly without introducing additional bugs. The authors propose a solution that allows OSes to run as virtual machines (VMs) atop a virtualised environment managed by a VMM. The added layer imposes overheads, which need to be kept to a minimum.

3. Contributions
Rather than modifying the OS to accommodate new developments, the authors propose that a new layer of abstraction be added between the hardware and the operating system. This lets the platform (VMM) run multiple instances of different OSes simultaneously and increases system scalability, resource utilisation, and fault tolerance.
A second-level software TLB makes each VM appear to have a much larger TLB.
Dynamic page migration and replication deliver a nearly uniform memory architecture to software, allowing OSes not developed for NUMA machines to run; this can be considered yet another great contribution.
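The migrate-versus-replicate decision behind that last point can be sketched roughly as below. This is a hypothetical policy sketch, not Disco's actual algorithm: the threshold value and the `decide` interface are invented, though the paper does describe using hardware cache-miss counts and treating read-shared and write-shared pages differently.

```python
# Hypothetical sketch of a NUMA page-placement policy driven by per-node
# cache-miss counters: a hot page used by one node migrates to it, a hot
# read-shared page is replicated, and a write-shared page is left alone
# (replicas of a written page would be invalidated constantly).

HOT = 100  # miss-count threshold before we act (made-up value)

def decide(page):
    """page: {'misses': {node: count}, 'write_shared': bool} -> action."""
    total = sum(page["misses"].values())
    if total < HOT or page["write_shared"]:
        return "leave"
    readers = [node for node, m in page["misses"].items() if m > 0]
    if len(readers) == 1:
        return ("migrate", readers[0])   # move the page next to its only user
    return "replicate"                    # give each reading node a local copy
```

Because the monitor owns the physical-to-machine mapping, it can carry out any of these actions transparently, with the guest OS none the wiser.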


4. Evaluation
The paper describes performance testing of Disco by running a set of workloads and comparing against commodity OSes. The execution overheads ranged from 3% to 16%, mainly due to trap emulation of TLB reload misses. The authors also present the benefits of resource sharing by splitting a workload across multiple virtual machines. Results also show that performance improves with the page migration and replication policy.

5. Confusion
It would be wonderful to learn more about ccNUMA.
Additional details and insights about the memmap structure would also be interesting.

1. Summary
This paper presents the idea of using a virtual machine monitor as a layer between the OS and the hardware to share hardware resources among multiple commodity operating systems on a scalable multiprocessor. The authors built a prototype called Disco and evaluated its overhead, performance, scalability, etc.

2. Problem
Extensive modifications to system software (millions of lines of code) are required to support scalable shared-memory multiprocessors. Such systems take a long time to produce and are likely to suffer from instability.

3. Contributions
The main contribution of the paper is the introduction of a VMM (Virtual Machine Monitor) layer between the hardware and the OS. It manages the hardware for all guest OSes while minimizing the overhead of virtual machines. Their prototype, Disco, emulates the execution of many CPUs (with the data structures needed to handle traps and the like), adds a level of address translation (physical to machine addresses) with transparent page sharing, intercepts all device accesses (I/O interactions, copy-on-write functionality, and virtual network interfaces), and uses dynamic page replication and migration to reduce remote cache misses (NUMA memory management).
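The transparent page sharing mentioned above hinges on intercepting disk DMA. A minimal sketch, with invented names (`SharedDiskCache`, `dma_read`) and grossly simplified semantics: if the requested disk block already resides in some machine page, the monitor maps that same page read-only into the requesting VM instead of reading the disk again.

```python
# Hypothetical sketch of Disco-style transparent sharing: the monitor
# intercepts each DMA disk read; a block already resident in memory is
# shared read-only across VMs rather than re-read into a fresh page.

class SharedDiskCache:
    def __init__(self):
        self.block_to_page = {}   # disk block -> machine page holding it
        self.next_page = 0        # naive free-page allocator
        self.reads_from_disk = 0  # how many real disk reads we performed

    def dma_read(self, vm_id, block):
        """Service a VM's disk read; returns (machine_page, how_mapped)."""
        if block in self.block_to_page:
            return self.block_to_page[block], "shared-readonly"
        self.reads_from_disk += 1          # only the first reader hits disk
        page, self.next_page = self.next_page, self.next_page + 1
        self.block_to_page[block] = page
        return page, "fresh"
```

This is why running eight copies of IRIX costs far less than eight times the memory: kernel text and the file buffer cache end up backed by one set of machine pages.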

4. Evaluation
They used SimOS to perform simulation-based experiments comparing IRIX and Disco. The overhead incurred by Disco was 3% to 16%, mainly due to traps and TLB reload misses. The data-sharing features of Disco reduce memory overhead compared to Disco without sharing. They then show the scalability benefit of Disco (beyond a certain number of VMs the scalability benefits outweigh the overheads). Finally, they show that Disco's dynamic page migration and replication can improve performance by 33% to 38%.

5. Confusion
I did not fully understand the changes made to IRIX and the HAL, or the reasons behind them. Did they run their system on long workloads, and how did it perform?

Summary
Taking motivation from the virtual machine monitor (VMM) work of the 1970s, the authors provide seminal work on virtualization: running multiple ‘commodity’ OSes on a large multiprocessor with reduced virtualization overheads, NUMA hiding so that non-NUMA-aware systems can run, transparent fine-grained sharing of resources, and all of this with less implementation effort. They achieve roughly 2x speedup on some workloads using 8 VMs.
Problem
The availability of scalable multiprocessors demands extensive modifications to the OS, which normally arrive late and are unreliable. There is also a need for server consolidation, for users to run multiple OSes on a single machine, and for industry to run multiple platforms when debugging software.
Contributions
Disco is a virtual machine monitor designed for the scalable cc-NUMA FLASH multiprocessor. It virtualizes all resources, i.e., provides abstractions for memory, CPU, I/O, and network, and is implemented as a multithreaded shared-memory program. Major ideas:
1. The supervisor mode provided by the MIPS machine allows the guest OSes to run in a limited-privilege mode (with a protected portion of the address space).
2. A VMM-level software TLB stores the physical-to-machine translations; the hardware TLB is flushed on a virtual CPU switch, but translations are cached in the second-level TLB.
3. Dynamic page migration and replication maintain data locality and allow UMA memory-system policies to work on cc-NUMA multiprocessors.
4. Virtual machines communicate via standard distributed protocols, e.g., sharing files through NFS.
5. Fault containment: a failure in software or hardware does not spread across the entire machine.
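Point 4 above is cheaper than it sounds because the monitor sits under both communicating VMs. A hypothetical sketch of the virtual network interface's zero-copy trick (all names invented; real Disco tracks protection in the pmap, not a flat dict): a page-sized NFS payload is remapped read-only into the receiver rather than copied.

```python
# Hypothetical sketch of Disco's virtual network: sending a page-aligned
# payload between VMs remaps the sender's machine page read-only into the
# receiver's address space instead of copying the bytes.

class VirtualNet:
    def __init__(self):
        # (vm_id, vpage) -> (machine_page, writable?)
        self.mappings = {}
        self.copies = 0   # would count real memcpy operations; stays 0 here

    def send_page(self, src_vm, src_vpage, dst_vm, dst_vpage):
        mpage, _ = self.mappings[(src_vm, src_vpage)]
        # Downgrade the sender to read-only so a later write triggers
        # copy-on-write, then map the very same machine page into the receiver.
        self.mappings[(src_vm, src_vpage)] = (mpage, False)
        self.mappings[(dst_vm, dst_vpage)] = (mpage, False)
```

The net effect is that a file read over NFS between two VMs on the same machine ends up sharing one machine page, much like local file caching would.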
Evaluation
The workloads are wisely chosen to justify the results, spanning compute-, I/O-, and memory-intensive cases with both long- and short-running processes. The execution overhead of virtualization varies from 3% to 16%, the memory overhead remains small overall, scaling to 8 VMs reduces execution time to 60%, and dynamic page migration and replication lead to a 33-38% performance improvement.
Comments
Another long but strong paper written by the founders of VMware. All the concepts employed toward virtualization are well argued for, and the evaluations are reasonable. I was mostly left confused by the following, largely because the paper is architecture-intensive:
the overhead/advantage of supervisor mode; whether scheduling is done with a global or local view of the VMs' states; figure 7 (the data-sharing evaluation), which I could not decipher; and whether there were any security issues to address.

Summary

This paper describes Disco, a virtual machine monitor (VMM) for building system software that runs on scalable shared-memory multiprocessors. Disco builds on the relatively old idea of the VMM, allowing multiple virtual machines (VMs) to run independent OSes on the hardware.

Problem

Any hardware innovation poses a serious challenge for system software development, as significant OS changes are needed to support the new hardware, requiring a lot of development effort and cost. This ends up impacting the success of such innovative hardware, as the system software is delivered late and contains bugs that hurt reliability. Furthermore, the older idea of the virtual machine monitor does not inspire confidence, since virtualization of hardware resources, resource management, sharing, and communication all carry a performance cost. So there arises a need for a way to develop system software quickly, with less development cost and effort, closing the gap between hardware and system software delivery without sacrificing performance.

Contribution

The biggest contribution is the illustration of how to build system software for scalable shared-memory multiprocessors (cc-NUMA) without massive development effort and without compromising performance. Moreover, Disco does this while eliminating or reducing the problems that earlier virtual machine monitors faced.
Disco emulates the execution of the virtual CPUs by using direct execution on the real CPU, setting the real machine's registers to those of the virtual CPU. Memory management adds another layer of address translation: Disco maintains physical-to-machine address mappings and caches them in a software TLB. Disco uses dynamic page migration and replication to present nearly uniform memory access times to the software. Disco intercepts accesses to I/O devices from the VMs and forwards them to the physical devices, and it virtualizes access to the underlying networking devices to support communication. With a few other optimizations, such as copy-on-write disks, Disco overcomes the overheads traditionally associated with VMMs.
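The copy-on-write disks mentioned above can be sketched at the block level. This is a hypothetical simplification (the `CowDisk` name and dict-based stores are invented; Disco actually does COW at page granularity through its memory system): each VM's writes land in a private log, leaving the shared base image untouched, and reads consult the log first.

```python
# Hypothetical sketch of a copy-on-write virtual disk: a shared, read-only
# base image plus a private per-VM log of modified blocks.

class CowDisk:
    def __init__(self, base):
        self.base = base   # shared block store, never written
        self.log = {}      # this VM's private copies of modified blocks

    def write(self, block, data):
        self.log[block] = data          # COW: divert the write, keep base intact

    def read(self, block):
        # A modified block comes from the private log; otherwise fall
        # through to the shared base image (shared by all VMs).
        return self.log.get(block, self.base.get(block))
```

Discarding `log` on shutdown gives the paper's non-persistent disks; persisting it gives a private disk, while `base` can keep backing any number of VMs.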

Evaluation

As the FLASH hardware was not available at the time Disco was developed, it was evaluated on a simulator. The evaluation covers execution and memory overheads, scalability, and the page migration and replication implementation. The execution overhead of virtualization ranges from 3% to 16% across workloads, while the memory overhead of running multiple VMs stays low thanks to effective sharing of data used by several VMs. The authors observe that with two VMs the scalability benefits already outweigh the virtualization overheads, and performance improves further with 8 VMs.

Confusion
How does the VMM make global decisions such as scheduling the virtual CPUs on physical CPUs?
When one page replaces another at a machine address, how does that information propagate up to the virtual machine?

1. Summary
This paper describes Disco, a new virtual machine monitor designed to efficiently run commodity operating systems on scalable, multiprocessor machines.
2. Problem
Computers with multiple processors were rapidly becoming much more widely used. Existing operating systems were adapting very slowly to take full advantage of such hardware. While specialized operating systems existed for it, many programs had never been ported to those systems. Virtual machine monitors also existed, but were generally too slow for practical use.
3. Contributions
The authors solve these problems by designing a new virtual machine monitor. Since it is much smaller than most operating systems, they expect it to be easily portable to new hardware. It is designed for the multiprocessor context, for example by using data structures with good cache behavior and by relying primarily on shared memory for communication. They ran it on a simulator and also ported it to real hardware. In addition, they modified an existing commodity operating system to run as a virtual machine on the monitor, and implemented a specialized library operating system that runs on their system.
4. Evaluation
The authors primarily evaluated their work on a simulator. They measured the execution overhead and memory overhead of several programs. To evaluate execution overhead, they compared the execution time of the programs run in an operating system directly on the simulator and in a single VM with that same operating system running on the virtual machine monitor. To evaluate memory usage, they ran the programs both in an operating system directly on the simulator and also in multiple combinations of VMs (1, 2, 4, and 8), where with 8 VMs they had the VMs communicate via the standard NFS file system as well as using private exclusive disks. Running on a simulator seems a questionable measurement technique (necessitated by the unavailability of the platform the monitor was designed for), but also one that could give better diagnostics.
5. Confusion
I found their persistent references to NUMA confusing, even after looking it up. I also found their statement “Disco differs from Exokernel in that it virtualizes resources rather than multiplexing them” intriguing.

1. summary
The paper introduces an abstraction layer called the monitor, placed between the hardware and the guest OSes running in virtual machines, to use the resources of scalable shared-memory multiprocessors effectively. Each virtual machine can run a different operating system, and the monitor manages global resources efficiently using page migration, replication, and resource allocation.

2. Problem
Scalable shared-memory multiprocessors force existing operating systems to change, because the number of processors in a system keeps increasing. Modifying an existing OS requires a lot of effort, is costly, and sometimes introduces critical bugs.
Because software changes take a long time, the hardware has to wait until the software is ready, and even then the correct functionality of the modified OS is not guaranteed. This creates a reliability problem, since customers do not want to use a faulty system.
Support from software vendors is another problem: the mere existence of new hardware does not compel vendors to modify their existing OSes to exploit it.

3. Contributions
By inserting an additional VMM layer beneath existing operating systems, multiple OSes can run on one machine and share hardware resources efficiently. Disco also allows specialized lightweight operating systems to be used for high performance. The monitor handles privileged operations from the virtual machine, such as TLB updates and I/O accesses, by converting them into loads and stores to special addresses.
TLB contents are handled through a software TLB: whenever the hardware TLB needs to be filled, the monitor translates the virtual address to a physical address and the physical address to a machine address, unless the translation is already cached. TLB entries are flushed when a page mapping is revoked, for example at the request of another VM; in this case, the memmap and pmap structures track the reverse information from machine pages back to virtual addresses, so stale TLB entries in each virtual machine can be invalidated.
Disco is easy to scale for specific tasks and can contain a fault within one virtual machine rather than the whole system, preventing a global halt or erroneous operation.
To hide the non-uniform latency of memory access, it also supports memory management using dynamic page migration and page replication, maintaining locality across virtual machines. In addition, every memory access is performed with a machine address, translated from the physical address that the guest derived from a virtual address.
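The reverse-mapping invalidation described above can be sketched in a few lines. A hypothetical sketch only (the `reclaim` function and flat-dict TLBs are invented; Disco's memmap also distinguishes replicas and I/O references): before the monitor migrates or reclaims a machine page, it walks the reverse map to every (VM, virtual page) that references it and flushes those TLB entries.

```python
# Hypothetical sketch of memmap-driven TLB invalidation: the reverse map
# from a machine page to every (vm, vpage) that maps it lets the monitor
# shoot down all stale translations before reusing the page.

def reclaim(mpage, memmap, tlbs):
    """memmap: machine page -> [(vm, vpage), ...]
    tlbs:   vm -> {vpage: machine page}  (per-VM cached translations)"""
    for vm, vpage in memmap.pop(mpage, []):
        tlbs[vm].pop(vpage, None)   # no stale translation may survive
```

Without the reverse map the monitor would have to scan every VM's translations on each migration, which would make the page-movement policies far too expensive.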

4. Evaluation
The overhead of Disco ranges from 3% for Raytrace to 16% for the pmake and database workloads. The database workload suffers a large overhead under Disco because of its heavy TLB miss rate; emulated privileged operations and the page-fault path can take double or triple the native execution time.
When pages are replicated across the virtual machines, overall execution time improves by roughly 30% to 50%, while migration alone makes performance slightly worse.

5. Confusion
Why is the unmapped segment of the kernel virtual address space impossible to use under Disco?
Is there any data shared among virtual machines besides the monitor kernel and the host OS?

1. Summary
The authors present their twist on the implementation of virtual machine monitors. They create a prototype called Disco, which allows VMs to run multiple operating systems on a scalable multiprocessor. The necessary support modifications to memory, scheduling and other aspects of usual operating systems functions are described. The authors then perform comparisons across a series of workloads on Disco with varying numbers of VMs and compare it to the performance of IRIX, the base operating system.
2. Problem
Software has been trailing hardware when it comes to user functionality and reliability. Operating systems often require extensive changes to adapt to different processors and hardware, with a correspondingly high development cost. For developers of “commodity” OSes, there is an even greater barrier to consumer entry and support. Disco is presented as a way to relieve these hardware constraints and allow more flexibility.
3. Contributions
Disco is a “virtual machine monitor” (VMM) and essentially an additional layer of software inserted between the hardware and the OS. The monitor is responsible for the virtualization and management of machine resources, thus allowing multiple VMs to co-exist on the same multiprocessor. The VMM handles scheduling and utilization of memory resources with a set of global policies.
Advantages:
(1) Memory sharing. Applications can use shared memory regions across VM boundaries, reducing the amount of replication that would be required as compared to running them in a separate cluster of workstations.
(2) The flexibility of the VMM allows for the implementation of multiple OSes, including “specialty” or “commodity” OSes for applications that may benefit from a more lightweight approach.
(3) The virtual machine becomes the unit of scalability and failure. This protects the entire system from collapsing given a hardware fault.
Several essential OS constructs are also modified to fit with Disco:
Virtual physical memory: Each VM is allocated its own set of physical addresses, which are then mapped into machine addresses by Disco. A virtual-to-physical insertion is done first by the OS, and this is emulated by Disco which translates this into the corresponding machine address and inserts the corrected address into the TLB.
Virtual CPUs: Execution on the virtual CPU is emulated by using direct execution on the actual CPU. The registers and PC of the real machine are set to the registers/PC of the virtual machine. Each CPU also has its own process table entry. Traps, page faults, and system calls are trapped to the monitor, which then emulates the effect on the current VM.
Cache-Coherent Non-Uniform Memory Architecture (CC-NUMA) memory: pages are migrated or replicated depending on use, facilitated by FLASH's cache-miss-counting facility. A memmap data structure keeps track of each real machine memory page.
Virtual I/O: A Direct Memory Access (DMA) map is included with each device and network interface, containing either the source or destination of the I/O operation. These are intercepted by Disco, which then uses its device drivers to interact directly with the physical device.
Copy-on-write disks: Disco intercepts every disk request into memory. On the first VM request the page is mapped read-only into the destination address, and subsequent requests for the same block are simply mapped to the existing page.
Virtual Network Interface: COW allows for the sharing of memory resources, but not for communication. To facilitate this, standard distributed protocols such as NFS are used.
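The "Virtual CPUs" point above, where the registers and PC of the real machine are set to those of the virtual machine, can be sketched as a world switch. This is a hypothetical toy model (the `RealCPU` class and `world_switch` name are invented; a real switch also swaps privileged state, TLB context, and more):

```python
# Hypothetical sketch of scheduling a virtual CPU by direct execution: the
# monitor saves the outgoing vCPU's register state, loads the incoming
# vCPU's registers and PC onto the real CPU, and lets the guest run at
# native speed until the next trap or timeslice end.

class RealCPU:
    def __init__(self):
        self.regs = {}
        self.pc = 0

def world_switch(cpu, prev_vcpu, next_vcpu):
    """prev/next vCPU: {'regs': dict, 'pc': int} snapshots kept by the monitor."""
    # Save the outgoing vCPU's state from the real hardware...
    prev_vcpu["regs"], prev_vcpu["pc"] = dict(cpu.regs), cpu.pc
    # ...then load the incoming vCPU, so its guest resumes where it left off.
    cpu.regs, cpu.pc = dict(next_vcpu["regs"]), next_vcpu["pc"]
```

Between two such switches the guest's instructions execute directly on the silicon; only privileged instructions punch back into the monitor, which is what keeps the direct-execution overhead low.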

4. Evaluation
Four (short) workload types were tested: an OS & I/O intensive workload (pmake), a long-process, mem-intensive workload with little use of OS services (engineering), a shared-mem workload (radix, raytrace), and a memory-intensive workload (database). As compared to running the base operating system (IRIX), Disco with a single VM showed an overhead increase in execution time for all four workloads. This overhead consists of (1) time spent in kernel, (2) time spent on TLB emulation, (3) time spent on instruction emulation, (4) time spent on monitor services, and (5) time spent on handling TLB misses. There is also a memory overhead that increases with each additional VM run, but this is trimmed somewhat by sharing kernel text and the buffer cache.
Disco shines best when comparing the shortened execution time achieved with multiple VMs. Partitioning a workload across multiple virtual machines lowers execution time due to reductions in kernel stall time and kernel synchronization. Disco is also relatively easy to port and has low execution overhead compared to a pure IRIX environment.

5. Confusion
While the kernel execution time and overall runtime does appear to go down if there are multiple VMs running on a system, is the tradeoff of memory overhead worth it? Also, how would this perform in comparison with operating systems that are commonly run today?

1. summary
Disco is a proposed solution to scalability problems on ccNUMA systems. The authors, who went on to found VMware, propose adding a small layer between the hardware and the OS that scales the system to many processors without the large development effort of changing commodity OSes to perform well on large ccNUMA machines.
2. Problem
As the number of processors in a system increases, memory access times become non-uniform, but operating systems at the time had no mechanisms to deal with this problem. They were created for general UMA systems, so as processor counts grew, the OSes failed to scale properly on these machines. Furthermore, there is a significant development cost to changing an OS to support ccNUMA machines with many processors. In fact, hardware innovation suffers greatly when software changes take a long time to support the new hardware.
3. Contributions
As a result, the authors propose adding a small layer of software between commodity OS(es) and the hardware that multiplexes the processors and other resources so that the OSes see UMA access times, which they are good at exploiting. The resulting system, Disco, is very similar to a VMM, bringing some of the advantages (e.g., running multiple OSes on the same machine) and disadvantages (e.g., an extra level of address translation) of VMMs. In addition, Disco implements some distributed-systems protocols to allow easy communication between VMs.
The key detail that sets Disco apart from other solutions is that it requires no change to commodity OSes (or only small changes to optimize for Disco). As a result, the huge development cost of redesigning commodity OSes is avoided. Through experiments, the authors show that Disco's overhead is at most 16%, mostly due to trap emulation for TLB misses. Nevertheless, Disco still performs much better than a commodity OS running natively on ccNUMA hardware.
4. Evaluation
I feel like Disco is really the work that helped hardware continue to innovate and avoid being held back by software. Because of Disco, the major software development cost to support new hardware innovations is greatly reduced. In fact, Disco’s page migration and replication algorithms gave commodity OSes (almost) a UMA latency, which they are used to.
One of the interesting notes in the paper is about the trick Microsoft used in order to be backwards compatible. Windows 95 acted like a VMM and used virtual machines to run software for older OSes to stay backwards compatible, which I found as a genius idea.
5. Question
On a context switch, could saving TLB entries, just like registers, improve overall performance? You would spend more time doing memory reads/writes at each switch, but when a process is restarted it would not miss in the TLB as much. #ProjectIdea?
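The idea in the question above can be sketched as a toy snapshot-and-restore scheme. Everything here is hypothetical; note that real MIPS hardware attacks the same problem differently, by tagging TLB entries with address-space IDs (ASIDs) so that a context switch need not flush at all:

```python
# Toy sketch of the commenter's proposal: save a process's TLB entries
# at switch-out and restore them at switch-in, trading extra memory
# traffic per switch for fewer TLB misses afterwards. Hypothetical.

class SavedTLB:
    def __init__(self):
        self.snapshots = {}          # pid -> saved list of TLB entries

    def context_switch(self, old_pid, new_pid, tlb_entries):
        # Extra memory writes: save the outgoing process's entries...
        self.snapshots[old_pid] = list(tlb_entries)
        # ...and extra reads: restore the incoming process's entries
        # (empty on its first run).
        return self.snapshots.get(new_pid, [])
```

Whether this wins depends on whether the restored entries are still valid and actually reused before being evicted.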

1. Summary
This article discusses the approach of using virtual machine monitors (VMMs) to tackle the problem of developing scalable system software for shared-memory multiprocessors, and contrasts it with the traditional approach of extending commodity operating systems. The article also describes the design, implementation, and evaluation of one such virtual machine monitor, Disco.

2. Problem
Traditional operating systems were built for systems assumed to provide uniform memory access times to all processor cores. Innovations in hardware leading to scalable shared-memory multiprocessors invalidated this assumption: cores now needed more time to access some portions of memory than others. This non-uniform memory access (NUMA) affects the scheduling and virtual memory management of traditional operating systems, and most vendors took the approach of making extensive modifications to the OS to make it NUMA-aware. Because these changes were extensive and touched critical portions of the OS, the system software for such innovative machines was delivered significantly later than the hardware and was a source of instabilities, dwarfing the benefits of hardware innovation for the many application areas that valued reliability.

3. Contributions
The main challenges facing virtual machines are memory and execution overheads, resource management hampered by the lack of fine-grained information from inside the VMs, and communication between the different operating systems running on the VMs. The major contribution of this work is the design and implementation of a VMM, Disco, which reduces the impact of these disadvantages by combining advances in operating systems and distributed systems with some new ideas implemented in the monitor and the guest OS. Beyond emulating the processor architecture, Disco extends it with efficient abstractions for common tasks, e.g., load and store operations on special addresses in place of privileged instructions. Disco uses dynamic page migration and replication to present nearly uniform memory access times to the commodity OS, allowing a non-NUMA-aware OS to run on NUMA systems. VMMs intercept all communication to and from I/O devices in order to translate or emulate their operation, but Disco recognizes the special importance of disks and network interfaces and provides dedicated abstractions for them: virtual disks with different sharing and persistence models, and networking abstractions that let VMs run distributed-systems protocols among themselves on top of the virtualized network devices.
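The non-persistent virtual disks described above can be pictured as copy-on-write overlays: unmodified sectors are read from (and shared with) a common base disk, while each VM's writes go to a private overlay. A minimal sketch, with invented structure (not Disco's actual code):

```python
# Hypothetical sketch of a copy-on-write virtual disk: reads fall
# through to a shared read-only base, writes land in a per-VM private
# overlay so the base stays sharable. Illustrative only.

class CowDisk:
    def __init__(self, base):
        self.base = base      # shared base disk: sector -> data
        self.overlay = {}     # this VM's privately modified sectors

    def read(self, sector):
        # Prefer the private copy; otherwise use the shared base.
        return self.overlay.get(sector, self.base.get(sector))

    def write(self, sector, data):
        # Writes never touch the base, so other VMs keep sharing it.
        self.overlay[sector] = data
```

This is what lets many VMs boot from one copy of the OS image while still seeing independent, writable disks.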

4. Evaluation
Disco was evaluated on a simulator rather than on the real hardware, which was not yet available. Due to simulator limitations, the authors used small but realistic workloads to evaluate the system, profiling the execution and memory overheads of a single virtual machine. The evaluation for these workloads is quite thorough, and the authors were able to suggest restructurings of the guest OS to reduce the overheads. The scalability evaluation clearly highlighted the benefits of Disco over the commodity OS for handling intensive, parallel workloads.

5. Confusion
- Can we discuss the use of pmap and memmap structures in class?
- In the scalability analysis we can see that the NFS protocol was the bottleneck due to its design; is there any protocol that can do better than NFS in this setting?

Post a comment