
Memory Resource Management in VMware ESX Server

Carl A. Waldspurger. Memory Resource Management in VMware ESX Server. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02), 2002.

Review due Thursday, February 4 at 9:00.


Comments

1. Summary
This paper describes the memory management policies and innovations used in VMware ESX Server, a virtual machine monitor that runs multiple unmodified commodity operating systems. Its policies focus on flexibly and dynamically assigning resources to virtual machines, and they improve upon Disco by requiring far less knowledge of, and no interception of or changes to, the guest OS's code.
2. Problem
The industry's need for server consolidation and the availability of inexpensive shared-memory multiprocessor hardware call for server virtualization techniques, which solve the underutilization problem of many individual servers by having guest OSes share resources efficiently on a larger server, simplifying management and reducing cost.
3. Contributions
ESX Server allows memory to be overcommitted across virtual machines in order to make efficient use of the available memory. Its memory-management approaches handle the different situations that arise under memory overcommitment efficiently in practice. Specifically:
1. ESX Server uses a mix of ballooning and paging for memory reclamation when machine memory is under pressure. Paging is ESX Server's own swapping mechanism, used when ballooning cannot relieve memory pressure effectively. Ballooning is a clever technique that puts pressure on the guest OS's "physical" memory by "inflating" a balloon module, a pseudo-device driver that asks the OS for lots of pinned memory, leading the OS to free up as much space as it can using its own policies; the driver "deflates" when the pressure subsides.
2. ESX Server also exploits common techniques such as copy-on-write to save memory and avoid copy overheads. Because the guest OS's code must not be touched, it uses page-content comparison to determine whether pages can be shared, with content hashing to keep the comparison overhead low.
3. The sharing and allocation scheme introduces parameters that express the relative importance of VMs, which the allocation algorithms then use to decide whose memory to reclaim first.
4. Evaluation
The experiments show that these memory management approaches are efficient enough for industrial use. I was surprised to see that several of them performed really well in practice: ballooning adds only minimal overhead, and the hashing technique identifies pages almost uniquely without collisions, even though the whole page content is hashed into only 8 bytes.
Their sharing policy is also a great step forward over Disco for practical use of VMMs by cloud computing and service providers, in that it allows flexibility in resource assignment according to the relative importance of different VMs, which can be very important for the industry's business model.
5. Confusion
How is the double-paging problem described in the paper solved?

1. Summary
This paper introduces several innovations in the memory management policies and mechanisms used in the VMware ESX Server. The ESX Server is a bare-metal hypervisor, i.e., it manages system hardware directly, as opposed to hosted hypervisors. The paper discusses ideas for memory sharing, memory reclamation, better resource allocation, and I/O page remapping, and it discusses policies for admission control and dynamic reallocation.

2. Problem
• Commodity OSes do not support dynamic changes to physical memory size.
• Over-commitment of memory and other resources maximizes utilization via statistical multiplexing.
• When memory is over-committed, reclamation from the VMs is necessary. With meta-level page replacement, the hypervisor's decisions are largely uninformed, and the double-paging problem can also occur.
• Transparent page sharing previously required guest OS modifications.
• In existing proportional-share resource allocation, idle clients with many shares can unproductively hoard memory at the expense of memory-starved lower share clients.

3. Contributions
Memory reclamation - Ballooning backed up by Demand Paging
• The guest OS is coaxed into reclaiming or paging out pages, or into loading more pages, by artificially influencing the number of pages available to it.
• A balloon module is loaded into the guest OS as a driver or kernel service.
• It inflates or deflates the balloon by allocating/de-allocating pinned physical pages within the VM, effectively shrinking or growing the memory the guest OS can use. The pmap entries corresponding to these pages are annotated so that backing them with actual machine memory is avoided.
Content Based Transparent Page sharing
• Hashing is used to identify pages with potentially identical contents.
• In case of exact content match, the Copy-on-write solution is used for sharing.
• Higher-level page-sharing policies control when and where to scan for copies.
Idle Memory Tax
• Pages will be reclaimed from clients who are not using their full allocations.
• Statistical sampling is used to estimate the fraction of memory that is actively used.
Allocation Policies
• Efficient policies for admission control and dynamic reallocation are discussed.
I/O Remapping

4. Evaluation
• Ballooning depends upon guest OS cooperation.
• Content based page matching causes CPU overhead, which can be mitigated by smart higher level policies.
• Ballooning - dbench benchmark - overhead of 1.4% to 4.4%, attributed mostly to the guest Linux kernel sizing some of its structures based on the memory size it sees.
• Page Sharing - best case - identical VMs - 67% of all virtual memory was shared. Throughput difference negligible.
• Page sharing - real world - 7% to 33% of all virtual memory was shared, with zero-pages providing a sizeable contribution.
• Idle Memory Tax - 30% throughput increase in experimental setting.

5. Confusion
Memory sampling needs careful usage.


Summary:

VMware ESX is a thin software layer designed to multiplex hardware resources efficiently among virtual machines. The current system virtualizes the Intel IA-32 architecture. It has several novel mechanisms to manage memory: a ballooning technique is used to reclaim memory, a new content-based page sharing scheme is introduced to reduce overall memory pressure on the system, a statistical sampling approach measures idle memory, and, to prevent idle clients from hoarding memory, a mechanism called "idle memory tax" is implemented.

Evaluating CPU overhead for more workloads would probably have given more insight into the possible performance impact of page sharing and reclamation.

Problem:

Recent industry trends, such as server consolidation and the proliferation of inexpensive shared-memory multiprocessors, have fueled a resurgence of interest in server virtualization. The need to run existing operating systems without modification presented a number of challenges. One of the challenges was to efficiently utilize memory across all VMs. The standard approach used by earlier virtual machine systems is to introduce another level of paging to a swap area on disk. This requires a meta-level page replacement policy, which is likely to introduce performance anomalies due to unintended interactions with the native memory management policies in the guest OS. This paper targets this problem by introducing novel mechanisms to compute a target memory allocation for each VM, techniques to reclaim memory from VMs, and ways to reduce the overall memory pressure on the system.


Contributions:

1) The ballooning technique, which makes the guest OS itself decide which particular pages to reclaim, helps get rid of the performance anomalies caused by interactions between the VMM's memory management policy and the native guest OS memory management policy.

2) Content based sharing to reduce the memory pressure.

3) A new statistical sampling approach to obtain aggregate VM working-set estimates directly, without any guest OS involvement. This technique measures the amount of memory being actively used by the OS.

4) After calculating the amount of memory being actively used via the sampling approach (and, in turn, the idle memory), ESX applies the idle memory tax to determine how much memory to reclaim from a VM without impacting its performance.

Evaluation:

To evaluate the ESX Server page sharing implementation, a series of experiments was performed using identically configured virtual machines, each running Red Hat Linux 7.2 with 40MB of "physical" memory. Each experiment consisted of between one and ten concurrent VMs running the SPEC95 benchmarks for thirty minutes. ESX Server was running on a Dell PowerEdge 1400SC multiprocessor with 933 MHz Pentium III CPUs. These experiments clearly showed the effectiveness of memory sharing, with nearly 5MB of memory reclaimed from even a single VM due to shared copies of zero pages. After an initial jump in sharing between the first two VMs, the total amount of memory shared increases linearly, which indicates that most sharing is due to redundant code and read-only data. They also show that this sharing and reclamation costs essentially no CPU overhead and in fact increases throughput by 0.5%, as page sharing improves hit rates due to increased memory locality. Page sharing metrics are also reported for different production deployments of ESX Server, but the CPU overhead has not been reported for each of them. Overall, the approach is sound in my opinion.


To evaluate the effectiveness of the memory sampling technique, ESX Server was run on a dual-processor Dell Precision 420 configured to execute Windows 2000 Advanced Server on a single 800 MHz Pentium III CPU. A toucher application is run, and as expected the statistical technique was able to capture the actual trends, except for some spikes due to the Windows "zero thread" runs.

Similarly, the effectiveness of imposing a tax on idle memory has also been shown to give a performance improvement when ESX was run on a Dell Precision 420 multiprocessor with two 800 MHz Pentium III CPUs and 512MB RAM. In the test, the first VM remains idle after booting; a few minutes later a second VM runs a memory-intensive benchmark, and when the tax rate is increased to 0.75 the workload benefits significantly.


Confusion:

I would like the instructor to discuss some drawbacks of these policies, if any. To me they seem novel and effective.

summary~
This paper presents the mechanisms and policies for memory resource management in the VMware ESX Server. The ESX Server tries to improve server consolidation through new techniques and algorithms for allocating memory across virtual machines running unmodified commodity OSes.

problem~
It is common in many computing environments for individual servers to be underutilized, so consolidating them onto a single physical server improves efficiency. To really utilize the resources, the mechanisms and policies must be able to allocate resources dynamically rather than statically, and this should be done in a way that minimizes the performance penalty and requires no guest OS modifications.

contribution~
To achieve a higher degree of consolidation, the ESX Server supports memory overcommitment. A memory reclamation mechanism is employed when memory is overcommitted: ESX Server comes up with the technique of ballooning to effectively reclaim memory from the guest OS by adjusting the memory pressure on the guest OS.
Memory sharing is also an important technique for reducing the memory footprint, since it is common for multiple instances of the same guest OS to run on one physical machine. ESX Server identifies redundant copies of pages, such as code or read-only data, across virtual machines by hashing the contents of the pages. Once the common pages are identified, copy-on-write is employed to eliminate the redundant copies.
ESX Server also introduces the idea of shares, which helps allocate resources dynamically based on importance, thus enabling both performance isolation and efficient memory utilization.
The impressive point for me is that, unlike the approaches proposed in other systems like Disco, the ESX Server implements these mechanisms at the virtual machine monitor level, with the help of a device driver that sends information about the guest OS to the monitor so it can make better decisions. The important point is that it doesn't require modifications to the OS, which ensures compatibility and reliability.

evaluation~
The evaluations are carried out mechanism by mechanism throughout the paper. For each mechanism proposed, gains and overheads were analyzed, and they clearly validate the effectiveness of those mechanisms.
But for the ballooning part, the mechanism might introduce some performance overhead due to aggressive swap operations to disk under high memory pressure, and these potential overheads were not evaluated in the corresponding part of the paper.

confusion~
How does this system solve the problem of double paging?

1. Summary
The authors in this paper have introduced a software layer that owns the hardware resources completely and supports a high degree of server consolidation by overcommitting the system. It cleanly encapsulates user applications and OS services while virtualizing the hardware. The ESX Server maintains a balance between performance isolation and efficient memory utilization.

2. Problem
The issue with Disco, the earlier system-software solution for scalable shared-memory multiprocessors, was that it involved minor tweaks to the guest operating system; the ESX Server (a type-1 VMM) requires absolutely no change to the OS. Also, Disco's memory management across virtual machines was not efficient and involved swapping "physical" pages to disk to reclaim memory, whereas here the authors have an elaborate, state-driven memory reclamation algorithm in the ESX Server implementation.

3. Contributions
Ballooning and swapping are the two mechanisms for memory reclamation. Content-based page sharing uses a reference count in the hash-table entry for the shared frame to keep track of how many virtual machines share the COW page. Share-based allocation and the idle memory tax, along with memory sampling, provide a fair algorithm that uses the shares-per-page ratio as a price when revoking memory, with an increase in throughput. I think trying to share a page before swapping it out is a good technique for an OS that has overcommitted memory before its balloon driver has started. I/O page remapping tracks hot pages and moves them into low memory, while pages that are not frequently accessed can be remapped from low back to high memory.

4. Evaluations
The best part about this paper was that all the claims were properly backed up with experiments that demonstrated the idea at every layer, spanning from ballooning's impact on throughput for differently sized VMs to dynamic reallocation across 5 VMs with various workloads, which validates the memory management over time. In my opinion, it would have helped to compare performance at various percentages of overcommitted memory to understand the system's behavior from a wider perspective.

5. Questions
How does a shared frame identify all the guest PPNs (or virtual addresses) that reference the COW page, since there is no such back-mapping entry in the shared frame?

Summary:
This paper deals with memory management techniques in the VMware ESX Server, a layer residing directly on system hardware and running unmodified commodity operating systems. VM workloads that overcommit memory are managed by three techniques: first, ballooning, where the pages the guest OS considers least valuable are reclaimed; second, an idle memory tax to utilize memory efficiently while maintaining performance isolation; and third, content-based page sharing and I/O remapping, which leverage transparent page remapping to eliminate redundancy and reduce copy overheads. These methods are evaluated using a range of workloads and hardware configurations, and the results suggest their practical usability.

Problems:
Earlier VMMs, for example Disco, modified the guest OS running in a VM in order to run commodity OSes. Memory management was ineffective, giving rise to problems such as the extra level of demand paging, the question of how to maintain performance for a VM with overcommitted memory, the decision of which guest OS and which of its pages to reclaim, and redundancy when VMs access the same page, with one copy mapped in each VM's memory. These are the issues addressed by VMware ESX's high-level resource management policies.

Contributions:
-Memory is virtualized by an additional level of address translation that uses a pmap to do PPN-to-MPN mapping (physical page number to machine page number).
-Ballooning: to reclaim the least valuable pages from the guest OS when memory is overcommitted, a balloon driver resides in the guest OS and interacts with the ESX Server; it communicates the page numbers the guest OS gives up, and the ESX Server reclaims the matching machine pages.
-ESX Server falls back on paging (page swapping) when ballooning is not supported or not possible.
-Content-based sharing: page contents are summarized by a hash value that is used to identify identical page copies; a single global hash table is maintained for all scanned pages, and a copy-on-write technique similar to Disco's is used when sharing pages in read-only mode.
-Memory allocations are dynamic and use a proportional-share policy.
-Idle memory tax to penalize VMs with more idle pages: the tax rate specifies the maximum fraction of idle pages that can be reclaimed from a guest OS, where idleness is measured by sampling.
-I/O remapping to alleviate the redundancy and overhead of copying pages during I/O operations (a rough sketch of this follows the list).
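Since the last bullet is terse: on IA-32, devices often cannot address "high" memory directly, so I/O involving a high page goes through a bounce-buffer copy in low memory; ESX counts I/O references and remaps pages that stay hot. A minimal sketch of that idea, with a made-up threshold and function names (not VMware's actual interfaces):

HOT_THRESHOLD = 16            # illustrative value, chosen only for the sketch
io_counts = {}

def on_io_to_high_page(ppn, remap_to_low):
    # Count I/O operations that touch a "high" page (one the device cannot
    # address directly, so each I/O needs a bounce-buffer copy in low memory).
    io_counts[ppn] = io_counts.get(ppn, 0) + 1
    if io_counts[ppn] >= HOT_THRESHOLD:
        remap_to_low(ppn)     # back this PPN with a low machine page instead
        del io_counts[ppn]    # future I/O to this page avoids the copy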

Evaluation:
The authors evaluate the ESX server across a range of hardware, guest operating systems and workloads. They do a good job in presenting the empirical evaluations and comparisons with other virtualized servers.

-The file server benchmark demonstrates that ballooned VMs perform similarly to non-ballooned VMs of the same effective size, with an acceptable overhead.
-Content-based page sharing reclaims about 60% of allocated memory in the best-case scenario and 7-33% in real-world scenarios.
-Idle memory tax shows an improvement of throughput by 30% in a 2 VM system.

The authors could have compared performance of workloads on virtualized platforms in the presence and absence of memory overcommitment, which would have provided an insight into the performance reduction due to overcommitment.

Confusion:
1) I did not understand the active memory sampling mechanism and the justification for using the maximum of the slow moving average, the fast moving average, and the modified fast average to measure idle memory.
2) The concept of I/O remapping is not clear.

1. Summary
The paper aims to solve/improve memory management issues in server systems. Virtual machines are a popular method of allocating server resources to clients, and managing the resources allocated to these virtual machines is a current problem. In this paper, VMware tackles this issue by designing a thin software layer below the virtual machines (and above the hardware) to allocate server resources. This paper focuses on memory resource allocation alone and introduces novel techniques in memory reclamation, memory sharing, and idle-memory-aware allocation to allow overcommitment (both commercially and technically important) and effectively distribute memory resources.

2. Problem
Current methods to manage memory distribution to virtual machines are complex and inefficient, require OS modifications, and do not scale very well; the proposed solution is simple, efficient, does not modify the OS, and is expected to scale. Three major issues are tackled in the paper:

First, earlier memory reclamation mechanisms perform meta-level page replacement controlled by the VMM, which not only makes poorly informed choices about which pages to reclaim, but can also lead to other overheads like double paging. Communicating with the OS to improve this, on the other hand, would require tinkering with the guest operating system.

Second, earlier mechanisms to share pages in memory are too complex and/or require OS modifications, such as the transparent page sharing introduced by Disco, which hooks guest operations and carries its own overheads.

Third, originally, allocation of resources was only by priorities/shares. Every client is allocated at least its minimum requirement and at most its maximum requirement, and further allocation of available resources is in proportion to the number of shares alone. This method can be very inefficient as many VMs end up holding idle resources.

3. Contributions

In order for the memory resource management mechanism to be commercially effective, the server needs to allow overcommitment of memory. The ESX Server allocates as many resources as possible to its clients and uses an effective reclamation technique to take physical memory back from one VM and give it to another. It uses a ballooning technique, which introduces a pseudo-device into each guest OS that makes artificial memory demands as directed by the VMM. This technique seems highly suitable: the OS allocates the requested memory to the device and therefore has only the smaller remaining portion of physical memory to allocate to its processes. Hence, not only is the VM's physical memory reduced as necessary, this is achieved without any OS modification, while still keeping the OS aware of the change so that the remaining portion is allocated efficiently (as if the memory were simply smaller).
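A rough sketch of the inflate/deflate control loop this describes, with hypothetical object and method names (not VMware's actual driver interface):

def balloon_poll(hypervisor, guest_os, balloon_pages):
    # Guest-side balloon driver, one polling step. Inflating pins guest
    # "physical" pages so the guest OS stops using them and tells the
    # hypervisor their PPNs; the backing machine pages can then go to
    # another VM. Deflating reverses the process.
    target = hypervisor.get_balloon_target()     # set by ESX's allocation policy

    while len(balloon_pages) < target:           # inflate
        ppn = guest_os.alloc_pinned_page()       # guest's own policy picks the victim
        if ppn is None:                          # guest is under real pressure:
            break                                # back off and retry at the next poll
        balloon_pages.add(ppn)
        hypervisor.reclaim_machine_page(ppn)     # MPN behind this PPN is now free

    while len(balloon_pages) > target:           # deflate
        ppn = balloon_pages.pop()
        hypervisor.restore_machine_page(ppn)     # re-back the PPN with machine memory
        guest_os.free_pinned_page(ppn)           # guest can use the page again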

In order to share memory between VMs, the paper proposes content-based sharing, which is less complex than Disco's transparent sharing because the latter requires OS modifications to identify redundant pages. This approach instead hashes page contents into a hash table and compares candidate pages byte-for-byte to confirm they are the same, implementing the result on top of a COW policy with an optimization (hints) for the single-sharer scenario. The mechanism seems efficient, but it is unclear whether the overheads of the hash table and of comparing pages are really reasonable and scalable.

The paper introduces the idea of an idle memory tax to improve resource allocation to VMs. The mechanism tracks the idle memory in each VM by sampling page accesses and then imposes a penalty for higher idleness, taking resources from more idle systems and giving them to more impoverished ones. This method is intuitive, and it is surprising that it was not implemented earlier, since naive resource allocation was obviously going to be inefficient. The one foreseeable problem is that the sampling needs to be frequent to be accurate, and increasing its frequency can impose harsher performance penalties.

4. Evaluation
Overall, the evaluation 'types' are well chosen - each scheme is evaluated lucidly against well-defined baselines - but the lack of broader benchmarking and the absence of some trade-off explorations are glaring. None of the evaluations is run across diverse benchmarks. The authors use real-world examples to highlight their work, which is good, but only one or two use cases are shown. Some parameters, such as the sampling rate in idle-memory measurement or variations in the amount of memory shared between VMs, which can have a significant impact, are not evaluated.

5. Confusion
Can shadow page tables be used for a s/w managed TLB? As a follow up, how would Disco interact with a h/w TLB?

In content-based page sharing, what happens if a new page has the same content as a page whose hint is stale (hence a hint mismatch, even though the pages match)? Also, if the possibility of false matches in the hash is so low, is chaining even necessary?

Is content based sharing scalable in terms of h/w and perf overheads?

Summary
Through the use of specialized "balloon" drivers, VMware ESX can offload intra-VM paging decisions to the guest OS's policies. By combining this with a hash-based page sharing policy and a set of share-based statistical allocation policies, ESX efficiently virtualizes unmodified systems on overcommitted hardware.

Problem
Commercial virtualization poses a variety of challenges; server hardware is expensive, applications may require difficult- or impossible-to-modify commodity operating systems, and server operators have to maintain quality-of-service obligations to customers. To ensure maximum usage of hardware, operators may overcommit resources, running more guests than the hardware is actually capable of running at full demand. However, naive resource management strategies may run afoul of customer needs by degrading the performance of a VM subject to a QoS guarantee. Moreover, the clever sharing policies and other optimizations provided by e.g. Disco are not compatible with off-the-shelf software. Additionally, hypervisor policies that are ignorant of the internal state of the guest OS may interact pathologically with the guest system, needlessly degrading performance.

Contributions
ESX uses a specialized guest OS "balloon driver" which the hypervisor can instruct to "wire down" physical pages, thereby excluding them from consideration by the OS's paging algorithms. By inducing intra-VM memory scarcity, the rest of the policy decision is offloaded to the guest OS. This is a clever application of the broader idea of "gray-box" techniques to virtualization: the hypervisor can "request" behavior from an unmodified guest OS by externally creating a VM state that induces that behavior.

Instead of adding hooks to memory-manipulation operations (as Disco did), ESX efficiently shares "data-identical" pages via a more general hash-table-based data structure populated by periodic random scans of pages. To enforce the prioritization of VMs, ESX uses a share-based policy which assigns weights to different VMs, allowing machines whose memory usage is over-extended relative to their share of the hardware to have their pages evicted first, and for idle pages to be reclaimed via a "tax". For both page sharing and the memory tax, VMware introduced randomized background sampling - a small number of random pages are tested periodically for the desired property. By doing this, expensive tasks, such as checking for idle pages by forcing expensive TLB misses, can be amortized into a smaller and uniform background cost.
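A rough sketch of the sampling side of this, assuming (per the paper) on the order of 100 sampled pages per period and fast/slow exponential moving averages whose maximum is used; the method names and smoothing weights here are illustrative, not VMware's:

import random

def sample_active_fraction(vm, n_samples=100):
    # Pick a few random pages, invalidate their mappings so the next guest
    # access traps, let the guest run for one sampling period, then count
    # how many sampled pages were touched. touched/n estimates the fraction
    # of the VM's memory that is actively used.
    sampled = random.sample(vm.all_ppns(), min(n_samples, vm.num_pages()))
    for ppn in sampled:
        vm.invalidate_mapping(ppn)
    vm.run_one_period()
    touched = sum(1 for ppn in sampled if vm.was_touched(ppn))
    return touched / len(sampled)

def smoothed_active_estimate(est, f_now, fast_w=0.3, slow_w=0.05):
    # Smooth with fast and slow exponential moving averages and take the max
    # (including the newest sample, approximating the paper's third estimate),
    # so a VM that starts touching memory again ramps up quickly while idle
    # memory is only slowly reclassified.
    est["fast"] = fast_w * f_now + (1 - fast_w) * est["fast"]
    est["slow"] = slow_w * f_now + (1 - slow_w) * est["slow"]
    return max(est["fast"], est["slow"], f_now)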

Measurement
The metrics used to measure ESX's performance show that these optimizations increase efficiency. The performance of the balloon mechanism, the first test of page sharing, and idle memory reclamation are all measured with batch-style synthetic benchmarks. While the synthetic page sharing results are striking (60% shared across VMs and a slight performance increase), they represent the best-case situation of identical systems running the same task. Similarly, the idle-reclamation test demonstrates that the policy and mechanism work, but the quantitative results correspond to fairly simple cases, like one idle OS and a second running a batch job. In general, the performance metrics are over-simplistic - some provide comprehensive numbers for memory usage, some for performance, but none discuss both on the same workload. Moreover, with the exception of the idle memory tax and the dynamic reallocation, it's hard to get a sense of the characteristics of the workloads or how representative they are. Finally, as with Disco, there is no meaningful discussion of interactive performance and latency.

Confusion
I'm curious about the performance anomalies that occur when VMM policies interact with the guest. Besides "double paging," are there other interesting classes of degenerate behavior?

1. Summary
The paper introduces the novel mechanisms and policies used by the VMware ESX Server to manage memory effectively among multiple virtual machines running unmodified commodity operating systems. These techniques and policies allow the ESX Server to overcommit resources across VMs effectively and so facilitate a higher degree of server consolidation.

2. Problem
As the ESX Server overcommits memory, there must be a mechanism to reclaim memory from one or more virtual machines. This introduces another level of paging, but the VMM does not know which pages to flush out, since that information is best known to the OS; a meta-level page replacement algorithm in the VMM can cause performance degradation because it may clash with the guest OS's memory management policies, and it can sometimes lead to double paging, where the VMM swaps out a page and the OS later chooses to swap out the same page. There is also a need to exploit sharing effectively, to accommodate higher levels of memory overcommitment (and thus more VMs) while still providing QoS guarantees to clients of varying importance.

3. Contributions
The ESX Server uses a number of novel low-level mechanisms guided by higher-level policies to provide effective memory management under memory overcommitment, without making any changes to the operating system, as opposed to Disco, which required a number of minor modifications to the OS.

- The ESX Server uses a novel mechanism called ballooning that makes the OS swap out pages using its own page replacement techniques, by coaxing it into thinking there is memory pressure. The balloon driver loaded into the OS asks for pinned pages in memory and passes information about these pages to the VMM, which can then use the corresponding machine pages for another VM.

- It uses a content-based sharing mechanism that allows effective sharing of pages across OSes and finds more opportunities for sharing than were exploited in Disco, which identified redundant copies only when they were created. (Disco also required modifications to the OS.)

- It uses the novel concept of an idle memory tax that allows the ESX Server to take pages from idle clients, where idleness is computed by statistically estimating the working set via sampling.

The VMware ESX Server has a clear policy/mechanism separation, which is all the more welcome in virtualized environments where VMs of varying importance are accommodated. The low-level mechanisms described above are guided by higher-level policies (which I believe must be flexible and configurable). For example, when to use ballooning to reclaim pages is guided by a dynamic reallocation policy, and when to scan pages for copies to enable sharing is determined by a separate policy. Apart from these, the VMM uses admission control policies that check that sufficient resources are available before a VM is powered on.

4. Evaluation
The experiments are done using real workloads running on unmodified OSes on top of the ESX Server, and hence are more convincing than a simulation. The effectiveness of each of the mechanisms - ballooning, sharing, the idle memory tax - is evaluated by running different benchmarks, and the authors also present results from production workloads to demonstrate effective sharing.

5.Confusion
I didn’t understand the I/O page remapping. How is the balloon driver initially loaded into the OS?

1. Summary

This paper talks about various memory management policies and mechanisms employed by VMware ESX server to allocate and manage memory between different virtual machines running unmodified commodity operating systems.

2. Problem

The Disco paper had shown the various benefits of using a virtual machine monitor (VMM) to multiplex hardware resources among virtual machines (VMs), but it required minor modifications to the kernel. VMware was looking to achieve the same benefits of using a VMM, except that VMware had to deal with virtual machines running proprietary operating systems which cannot be modified. Hence, the key problem addressed in the paper is: how to efficiently allocate and manage memory among virtual machines without modifying the guest OS.

3. Contributions

The paper defines several novel memory management policies and mechanisms for the VMware ESX Server. A ballooning technique is introduced that lets the underlying server (the VMM) reclaim pages from the guest OS without making any modification to the guest OS kernel code. Whenever the server has overcommitted memory and is in need of pages, it inflates the "balloon" within the guest OS, which causes the guest OS to use its own paging mechanism to relinquish pages to the balloon driver (which are then given to the server). The idea of transparent content-based page sharing is carried forward from Disco, improved so that such sharing no longer requires any guest OS involvement.

Another major contribution of this paper is the idea of “idle memory tax” to penalize idle VMs and reassign their resources to achieve efficient memory utilization. And this is done while still maintaining proportional memory sharing between different VMs based on their relative importance. Using statistical sampling to track active page usage and predict dynamic allocation is a clever mechanism that again does not involve guest OS intervention.
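The balance between proportional sharing and penalizing idleness comes down to the paper's adjusted shares-per-page ratio, which can be written out as a tiny worked example (variable names are mine):

def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    # rho = S / (P * (f + k*(1-f))), with k = 1/(1 - tax_rate); at the paper's
    # default tax rate of 0.75, an idle page "costs" four times an active one.
    # When memory must be reclaimed, ESX takes it from the VM with the lowest rho.
    k = 1.0 / (1.0 - tax_rate)
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))

# Two VMs with equal shares and equal allocations: the mostly idle one ends up
# with the lower ratio, so it is the one that loses memory first.
busy = shares_per_page(shares=1000, pages=8192, active_fraction=0.9)
idle = shares_per_page(shares=1000, pages=8192, active_fraction=0.1)
assert idle < busy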

Overall, I think the paper demonstrates a good number of ideas that can be implemented in practical systems and I guess the major advantage is that much of these mechanisms can be tuned dynamically to cater to a wide variety of workloads.

4. Evaluation

The ballooning technique is evaluated using the synthetic dbench benchmark for one VM running over the ESX Server, and the results show that the performance of a ballooned VM is comparable to that of a VM configured with the same (smaller) memory size. Content-based page sharing is evaluated by determining how much memory is reclaimed when running identical copies of an OS in different VMs. A five-VM configuration running an Exchange benchmark, a Citrix MetaFrame benchmark, and SQL Server database workloads is used to evaluate dynamic reallocation of memory between VMs, which uses statistical sampling for idle memory reclamation and ballooning and page sharing for efficient memory utilization.

I think it would have provided more insight if the paper had also discussed how varying the rate of statistical sampling used to determine memory idleness affects dynamic reallocation of memory between VMs. But in all, the paper does a good job of evaluating the various novel ideas it presents, partly because the system has already proven itself on production workloads for VMware.

5. Confusion

Some details of I/O page remapping between "high" and "low" memory are not entirely clear.

1. Summary
This paper describes the innovative memory management techniques implemented in VMware ESX Server. These allow a single computer to be used to efficiently run VMs that overcommit its memory.

2. Problem
Corporations were increasingly consolidating their servers so that they ran many VMs on one physical machine. This was done in part to better utilize the physical computing resources that they had, which were typically underutilized with one server per machine. While Disco had implemented an efficient virtual machine monitor for this situation, an operating system still had to be modified in order to run efficiently on it.

3. Contributions
This paper contributes several techniques to improve memory sharing between unmodified guest operating systems. To improve page sharing, the monitor computes a hash over each scanned page's contents, which is used in a hash table to quickly look up other pages with matching content. Second, it uses statistical sampling to determine whether pages allocated to a VM are idle; the VM's claim to memory is then penalized based on its number of idle pages. Third, the authors implement pseudo-device drivers or kernel services (that must be installed on the guest OS) to perform "ballooning": the driver claims more of the guest's memory when directed to do so by the virtual machine monitor, which it does when it needs to reclaim memory. This allows the monitor to take advantage of (rather than work against) the clever page replacement algorithms in the guest OS, without actually modifying the OS.

4. Evaluation
The paper evaluates these contributions both on synthetic benchmarks and on tests based on actual deployments of their software. For page sharing, they evaluate a single machine running multiple VMs running the same benchmarks. While extreme, often VMs running on the same machine do run very similar software. They also evaluate page sharing using a deployment like one of their customers’. The paper evaluates the reclamation of idle memory using Windows and Linux VMs running benchmarking applications. These test extreme usage, but not typical usage. The paper only evaluates “ballooning” with a single VM, which seems lacking since most computers would need multiple VMs in order to stress their memory.

5. Confusion
I would appreciate more explanation of the “high” I/O memory that the monitor can migrate to faster memory lower in the address space.

1. Summary
The VMware ESX Server is a thin software layer designed to multiplex hardware efficiently between virtual machines running commodity OSes. The paper presents innovative ideas and policies for memory management used in the ESX Server to reduce overheads. The ESX Server manages system hardware directly, providing high I/O performance and complete control over resource management.

2. Problem
The requirement of running existing operating systems without modification was the main motivation behind the design of the ESX Server; the Disco prototype, which was designed to run unmodified OSes, still resorted to modifications in the IRIX kernel. The problem is the ability of a VMM to flexibly overcommit memory, processors, and other resources while still providing resource guarantees to VMs of varying importance. This is addressed by introducing efficient and flexible memory management techniques in ESX.

3. Contributions
The ballooning technique is one of the main contributions of the paper. The balloon module is loaded into the guest OS and communicates with ESX through a private channel. The guest OS decides which particular pages to reclaim and, if required, pages them out to its virtual disk. Another contribution of ESX in memory management is content-based page sharing: pages with identical content are shared, implemented by hashing pages based on their content and marking the shared pages copy-on-write. The next technique introduced in ESX is reclamation of idle memory from VMs not using their full share of memory. This is driven by a metric called the idle memory tax, which charges a client more for an idle page than for an active page. The implementation and evaluation of these ideas are presented in the paper, along with the high-level policies, like dynamic reallocation, adopted to use these techniques.

4. Evaluation
The paper provides specific evaluation results for all the memory management techniques introduced. The effectiveness of the ballooning technique is presented by running the dbench benchmark with and without ballooning. The evaluation of content-based sharing shows results ranging from 7-33% of memory reclaimed by sharing for real-world workloads. The idle memory tax is evaluated by running two VMs with the same share allocations but different workloads: one VM boots and remains idle, and the other runs a memory-intensive workload. The experiment shows that idle memory reclaimed from VM1 gets used by VM2. The performance evaluation of dynamic reallocation is done similarly. The author has not presented an evaluation of the overall impact of memory overcommitment when using the memory management techniques introduced.

5. Confusion
How does I/O page remapping work?
How do ballooning and demand paging compare in performance? Which is more efficient?

1. Summary:
This paper discusses VMware's ESX Server memory management policies and introduces several of its novel mechanisms. The ESX Server multiplexes hardware resources among virtual machines and uses these techniques to support virtual machine workloads.

2. Problem:
To allow efficient multiplexing of resources, the system should be able to overcommit resources like memory and reclaim them from an OS when they are not needed. The management of memory is crucial to overall system performance. Earlier VMM prototypes like Disco required modifications to the kernel sources of the guest OS, which is not easy and presents a number of challenges. Thus there is a need for a cleaner methodology for memory allocation, management, reclamation, and sharing among VMs, which the author presents in the ESX Server.

3. Contribution:
The key notions proposed in the paper in this context are:
Ballooning: a module loaded into the guest OS as a pseudo-device driver, which the server can use to reclaim memory from the guest OS to meet memory needs elsewhere in the VMM. It also supports deflation to return memory.
Content-based page sharing: a page is marked copy-on-write based on a match found via a hash computed over its contents.
Idle memory tax: the idea is to charge a client more for an idle page than for an active one, so that when memory is scarce, pages are preferentially reclaimed from idle clients with many shares that are not actively using their full allocations.
Allocation policies: admission control ensures a VM is allowed to power on only if its min and overhead memory requirements can be met; system administrators can configure the min, max, and share parameters.
I/O page remapping: pages are moved from high memory to low memory if the number of I/O references to a page exceeds a threshold.
The authors mention that when ballooning is not feasible the server falls back to the older approach of demand paging.
Another contribution of the ESX Server is that it exploits sharing opportunities such that workloads running in VMs on a single machine often consume less memory than they would on separate physical machines.

4. Evaluation:
The authors have run experiments using real workloads to justify the mechanisms and policies introduced. To show the low overhead of ballooning they used the dbench benchmark. Using SPEC95, their graphs show that as more VMs with pages in common are added, the amount of sharing increases. I think a few more results could be produced to show utilization when memory is overcommitted and when it is scarce under the proposed mechanisms, which could answer whether utilization is traded for worst-case performance. Also, there is no evaluation of I/O page remapping.

5. Confusion:
Since the balloon driver is implemented per guest operating system, it is unclear what happens if the driver is unavailable for one of them. Also, which VM is charged for shared pages, and how are shares allocated?

Summary:
This paper gives details of the various memory management techniques used by VMware ESX Server, a bare-metal hypervisor, to support virtual machine workloads that overcommit memory. The authors describe several innovative mechanisms, including ballooning, the idle memory tax, content-based sharing, and hot page remapping. The authors have done an extensive evaluation of these mechanisms to show that they do not impact overall performance negatively.

Problem:
Server consolidation via overcommitment of memory, processors, and other resources ensures efficient usage of the underlying machines. However, most of the solutions prior to this paper require modification of the guest OS to support the necessary mechanisms, which may not always be feasible due to the inherent complexity of OS modification. Another problem that crops up with meta-level page replacement is double paging, because that paging is transparent to the guest OS. Lastly, existing memory management mechanisms do not take into account the fact that different VMs may have different priorities and QoS requirements.

Contributions:
The ESX Server achieves dynamic memory allocation and deallocation via ballooning. This involves inserting a module into every guest OS through which the server can instruct the balloon to inflate or deflate. Through this mechanism the double-paging problem is largely avoided, since paging is no longer hidden from the guest OS: the guest OS itself decides which pages are to be reclaimed. The ESX Server also supports transparent page sharing via content-based page sharing to identify identical pages; the advantage of this approach is that it does not require any modification of the guest OS. The server also employs a share-based allocation strategy where VMs with more shares are given higher priority than VMs with fewer. However, this strategy, if used as is, may not lead to efficient use of memory if a VM with many shares claims memory but does not use it. To solve this, the strategy also takes the VM's working set into account to ensure that the goals of performance isolation and efficient memory usage are met simultaneously. The basic idea is to charge a user more for an idle page via the idle memory tax. For the idle memory tax to be effective, the server uses sampling techniques to estimate the aggregate VM working set without the involvement of the guest OS. The extra level of indirection in ESX also enables a page to be transferred from high memory to low memory transparently.

Evaluation:
The authors evaluate the performance of the various mechanisms individually. With respect to ballooning, they show that the overhead is low using the dbench benchmark. Next, via the SPEC95 benchmarks they show that a high percentage of memory can be shared and reclaimed. Even though the authors carry out an extensive evaluation, I feel it falls short in several places. Firstly, it would be ideal to compare the performance of the sharing mechanism when there is much more heterogeneity among the VMs. Also, comparisons with existing standalone systems would be beneficial. Lastly, evaluation of various sampling strategies would give more insight into why they stuck with the one mentioned in the paper.

Confusion:
I am not quite convinced as to how effective the sampling strategy mentioned in the paper would be. Why should one take the max out of the 3 averages? Also, how exactly are the 4 different reclamation thresholds chosen?

Summary
The paper presents VMware ESX Server, which is essentially a virtual machine monitor, and describes the different memory management mechanisms it uses, such as ballooning, idle memory taxing, and content-based sharing using hash tables. The paper also presents an evaluation of each of these techniques.
2. Problem
In the computer industry, individual servers tend to be underutilized, so virtual machines are used for server virtualization, where multiple VMs are given the illusion that they each have a dedicated machine. The system should also allow the administrator to efficiently allocate memory and other resources according to the importance of each VM. The author describes the memory management techniques that handle these requirements.
3. Contributions
The paper provides several different ways to manage memory efficiently for VMs:
Ballooning technique: This is a great technique whereby the ESX Server can make the guest OS release or reallocate pages, using the concept of inflating/deflating a pseudo-device driver. This allows the ESX Server to handle overcommitted-memory situations.

Content-based page sharing: Helps identify redundant pages quickly using hash tables and hint frames. This technique overcomes a drawback of Disco, where guest OS modification was required to detect redundant pages.

Share-based allocation: Pages are allocated based on each VM's shares. This seems to be an intriguing technique that could be used in the corporate world, for example by providers such as AWS.

Idle memory tax: Involves statistical concepts such as sampling and fast and slow averages, which help reclaim idle pages from a VM while also allowing it to ramp back up to its share-based allocation quickly and without delay when needed.

4. Evaluation
The paper presents the evaluation of each concept as it is discussed, in a clear manner. Ballooning, which is a great technique, does not involve much overhead. The paper clearly shows what was achieved, i.e., memory reclaimed for VMs by sharing and improved throughput from the idle memory tax.
Missing Evaluations:
For content-based sharing, the paper only shows a graph of how much memory was reclaimed, but no graph showing whether performance stayed the same given the hash calculations and page comparisons. The paper also doesn't evaluate page sharing when different guest OSes, such as Linux and Windows, run together.
The performance impact of the sampling itself is also not shown.


5. Confusion
How are the fast average and slow average calculated?
More details on how exactly I/O page remapping is done.

Summary

This paper discusses the VMware ESX Server, which effectively multiplexes shared-memory multiprocessor hardware (just as in the Disco paper) among multiple VMs running commodity operating systems. It also discusses the optimizations VMware makes beyond the concepts introduced in the Disco paper (page replication/migration, copy-on-write sharing of pages among VMs, memory management using pmaps and shadow page tables), such as ballooning, idle-memory estimation and reclamation of idle pages based on a tax rate, and share-based allocation, and finally benchmarks these against existing resource management techniques.

Problem:

When multiple VMs run on a single piece of hardware, in the absence of the resource management techniques and optimizations used in the paper, memory is usually under-utilized in the common case. Moreover, as the paper suggests, modifying the guest OS to enhance resource utilization is undesirable, so building efficient software that performs memory management through overcommitment of memory and efficient page sharing/allocation/deallocation overcomes the above disadvantages. Since this is a commercial product, an effective mechanism to charge users and to provide performance to customers based on SLAs is also required, which led to the innovative proportional sharing mechanisms.

Contribution:

This paper introduces a few key concepts. A device driver in the guest OS interacts with the ESX Server and assists in inflating/deflating the memory allocated to each VM for efficient resource management, through a concept called ballooning. The concept of an idle memory tax on VMs was introduced to solve an open problem in share-based management (those who need resources get more, and hoarders who do not need them get fewer). Content-based page sharing using hashing was introduced to limit the CPU overhead of page sharing and copy-on-write.

I feel some of the concepts introduced above, such as ballooning, lightweight hash-based page sharing, idle-memory-tax-based page reclamation, and I/O page remapping, indirectly help ensure that overcommitment and proportional sharing of memory are possible based on configurable administrative parameters (very important for managing dynamically changing customer needs) and hence assist in effectively pricing and prioritizing customers (VMs). Effectively, this paper sets a benchmark for building high-performance VMMs in the future by clearly defining the goals and future directions.

Other contributions of the paper are a sampling-based metric to determine idle memory; admission control, which provides a better user experience by predetermining the max, min, and shares for each VM; and the definition of the high, soft, hard, and low free-memory states, along with the operations used to dynamically transition from one state to another as required.
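As a compressed sketch of how those free-memory states map to reclamation actions: the threshold percentages below are the paper's defaults, but the exact boundary handling and the hysteresis the paper applies are simplified away, and the names are mine.

STATES = [                             # (state, free-memory threshold, action)
    ("high", 0.06, "none"),            # plenty free: no reclamation
    ("soft", 0.04, "balloon"),         # reclaim by ballooning (paging as fallback)
    ("hard", 0.02, "page"),            # forcibly page VM memory to the swap area
    ("low",  0.01, "page_and_block"),  # keep paging, block VMs above their targets
]

def memory_state(free_fraction):
    # Return the state and action for the current fraction of free machine memory.
    for name, threshold, action in STATES:
        if free_fraction >= threshold:
            return name, action
    return "low", "page_and_block"

print(memory_state(0.05))   # ('soft', 'balloon')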

Evaluation:

The evaluation of page sharing clearly indicates a significant reduction in memory usage and an increase in page sharing as more and more VMs are added, with only a negligible increase in CPU overhead, both for homogeneous VMs (running the same OS and similar applications) and for heterogeneous VMs. The paper clearly demonstrates the strength of ballooning by evaluating the overhead (dbench throughput) with respect to VM size.

However, when presenting the experimental results for ballooning, the paper does not consider more typical workloads for evaluation, such as applications running on heterogeneous VMs, which could have been done. But the evaluation of reclaiming idle pages based on the idle memory tax and active memory sampling, and of proportional sharing by allotting shares in the described setting, clearly justifies the premise of the paper.

Issues:

No evaluation of the impact of page replication and locality has been made, as there was in the previous paper (Disco), even though the changes in memory management techniques suggested in this paper might affect it. The paper mentions dynamically plugging memory but provides no empirical analysis to back it up; this advantage could have been underscored a little more.

1. Summary

The paper provides details about the memory management techniques employed in the VMware ESX Server for efficiently multiplexing hardware resources among virtual machines running unmodified commodity OSes.

2. Problem

The paper explores techniques to run existing operating systems without any modification; previous research efforts involved at least minor modifications to the kernel sources. The authors develop novel mechanisms to support high-level resource management policies.

3. Contribution

The VMM employs an extra level of address translation, the pmap, to translate PPNs to MPNs, providing the guest OS with a zero-based "physical" address space. It uses a ballooning technique, where a small balloon module loaded into the guest OS acts as a pseudo-device driver: the balloon is inflated to reclaim memory and deflated to give it back. Demand paging is used if ballooning is disabled or insufficient. It employs content-based page sharing, thereby eliminating the need to modify the guest OS code and identifying more opportunities for sharing; hash values summarizing a page's contents are used to detect content equality. It introduces an idle memory tax to reclaim idle memory from clients: a client is charged more for an idle page than for an actively used page, so pages are preferentially reclaimed from clients not actively using their full allocations. The ESX Server computes a target memory allocation for each VM based on both its share-based entitlement and an estimate of its working set. Min and max sizes bound a VM's allocation, and memory shares determine its fraction of physical memory. I/O page remapping is done to reduce I/O copying overheads in large-memory systems.
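The extra translation level can be pictured with a tiny sketch (names are mine; the real structures are per-VM tables maintained by the monitor, with shadow page tables caching the composed mapping for the hardware MMU):

class PmapVM:
    def __init__(self):
        self.guest_page_table = {}   # VPN -> PPN, owned by the guest OS
        self.pmap = {}               # PPN -> MPN, owned by ESX

    def translate(self, vpn):
        ppn = self.guest_page_table[vpn]   # guest-level translation
        return self.pmap[ppn]              # monitor-level translation

vm = PmapVM()
vm.guest_page_table[0x10] = 0x3    # guest believes PPN 0x3 backs VPN 0x10
vm.pmap[0x3] = 0x7f2               # ESX actually backs PPN 0x3 with MPN 0x7f2
assert vm.translate(0x10) == 0x7f2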

4. Evaluation

The author uses the dbench benchmark to demonstrate the effectiveness of ballooning; the ballooned system's performance is only slightly lower than that of a non-ballooned system of the same size. To evaluate content-based sharing, concurrent VMs running the SPEC95 benchmarks were used, and sharing levels approached 67%, demonstrating that the ESX Server is able to exploit sharing opportunities effectively. Memory sampling is used to estimate the fraction of memory actively used by each VM, and the idle memory tax experiment demonstrates a 30% increase in dbench throughput after the tax rate change. Representative workloads were used in the evaluations, and their overheads were studied carefully.

5. Confusion

1. Active memory sampling

Summary
The paper explains ESX Server, a virtual machine monitor which uses general (guest OS-agnostic) techniques for performing virtualization and resource management without requiring modifications or extensive support from guest Operating Systems. ESX Server's novel memory management techniques such as ballooning, content-based page sharing, idle memory tax, and hot I/O remapping help in memory reclamation, transparent page remapping and efficient memory utilization while maintaining performance isolation guarantees.

Problem
Cost-effectiveness of servers running on shared-memory multiprocessors could be achieved by consolidating workloads as virtual machines, which allows both performance isolation and efficient resource management. While earlier virtual machine monitor approaches such as Disco attempted to solve this problem, they required making modifications to the guest operating systems to achieve virtualization, which was tedious, sometimes impossible (for closed-source OSes), and introduced potential security risks. Implementing resource management policies also required guest OS knowledge and support.

Contribution
The authors introduce ESX server, a virtual machine monitor which did not require knowledge of guest Operating Systems to perform virtualization and resource management. They also implemented various guest OS agnostic memory management strategies on the ESX server.

The ballooning driver technique was used by the ESX server to coax a guest operating system into swapping out pages (by balloon inflation), whose backing machine memory could then be reclaimed for other use. In situations where ballooning was not available or insufficient, randomized paging at the VMM level was used to reclaim memory.
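As a rough illustration of the inflate/deflate protocol from the driver's point of view (the names, including guest_alloc_pinned_page and tell_vmm, are stand-ins for guest kernel allocation calls and the private driver-to-VMM channel, not the real driver code):

```python
# Sketch of balloon inflate/deflate; helper functions are hypothetical.

def inflate(balloon, target_pages, guest_alloc_pinned_page, tell_vmm):
    # Allocating pinned pages forces the guest OS to free or page out
    # other memory using its own replacement policy.
    while len(balloon) < target_pages:
        ppn = guest_alloc_pinned_page()
        if ppn is None:                # guest is under too much pressure
            break
        balloon.append(ppn)
        tell_vmm("reclaim", ppn)       # VMM may reuse the backing machine page

def deflate(balloon, count, guest_free_page, tell_vmm):
    # Returning pages relieves pressure; the VMM must back them again on use.
    for _ in range(min(count, len(balloon))):
        ppn = balloon.pop()
        tell_vmm("release", ppn)
        guest_free_page(ppn)
```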

Transparent page sharing was achieved through content-based page sharing, which hashes page contents to avoid expensive O(n^2) pairwise comparisons; a full content comparison is conducted only on a hash match. Shared pages are marked Copy-On-Write. ESX uses the concept of hint frames to avoid the Copy-On-Write overhead for pages that are not yet shared.
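A simplified sketch of this hash-then-compare flow with hint frames (the helper names are illustrative, and the real implementation uses a 64-bit hash with per-frame metadata rather than a Python dict):

```python
# Simplified content-based page sharing: hash first, full compare on a match.
# page_contents maps MPN -> bytes; names are illustrative only.

import hashlib

def page_hash(data):
    return hashlib.sha1(data).digest()[:8]   # stand-in for a 64-bit content hash

def try_share(mpn, page_contents, hash_table):
    h = page_hash(page_contents[mpn])
    entry = hash_table.get(h)
    if entry is None:
        # No match yet: record a "hint" frame; its hash is revalidated later.
        hash_table[h] = {"mpn": mpn, "shared": False}
        return None
    if page_contents[entry["mpn"]] == page_contents[mpn]:   # full comparison
        entry["shared"] = True        # both pages now mapped copy-on-write
        return entry["mpn"]           # caller remaps this PPN to the shared MPN
    return None                       # hash collision, contents differ

pages = {1: b"A" * 4096, 2: b"A" * 4096, 3: b"B" * 4096}
table = {}
print(try_share(1, pages, table))  # None: first copy becomes a hint frame
print(try_share(2, pages, table))  # 1: identical content, shared with MPN 1
print(try_share(3, pages, table))  # None: different content
```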

ESX also introduced the idle memory tax to penalize VMs holding a large fraction of inactive allocated pages, a fraction estimated statistically by sampling pages of the VM's working set and invalidating their mappings to detect accesses. Configurable parameters such as min VM memory, max VM memory, VM shares and the idle memory tax rate help ESX strike a balance between proportional-share allocation and effective utilization of memory. ESX also dynamically reallocates memory based on free-memory states labelled high, soft, hard and low.
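The idle memory tax enters the proportional-share policy through an adjusted shares-per-page ratio. A small sketch of that computation, following the formula in the paper but with variable names of my choosing:

```python
# Adjusted shares-per-page ratio with an idle memory tax (paper's formula,
# variable names are mine). Pages are reclaimed from the client with the
# lowest ratio, so idle memory makes a client a cheaper reclamation victim.

def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    k = 1.0 / (1.0 - tax_rate)            # idle page cost multiplier
    idle_fraction = 1.0 - active_fraction
    return shares / (pages * (active_fraction + k * idle_fraction))

# Two VMs with equal shares and allocations; the idle one is taxed.
print(shares_per_page(shares=1000, pages=256, active_fraction=0.9))  # busy VM
print(shares_per_page(shares=1000, pages=256, active_fraction=0.2))  # idle VM, lower ratio
```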

Evaluation
The performance overhead introduced by the ballooning mechanism was evaluated by running the dbench benchmark, which showed that the overhead is negligible. The effectiveness of the content-based page sharing mechanism was evaluated by running the SPEC95 benchmarks on an ESX server hosting homogeneous copies of Red Hat Linux 7.2; this demonstrates that the ESX server can reclaim a large fraction of memory whenever such sharing potential exists. The sharing mechanism was also successfully demonstrated on real-life production deployments of the ESX server. The memory sampling technique for estimating VM memory usage was verified through the toucher application, while a demonstration of the idle memory tax showed its role in guiding memory reclamation from an idle VM to an active VM. Finally, dynamic reallocation was tested on a workload of five virtual machines with differing memory requirements over their execution timeline. The test confirmed that ESX server's memory management techniques can dynamically respond and adapt to rapidly changing memory requirements of its virtual machines.

While the above evaluations confirm the viability of ESX Server's memory management techniques, I believe two additions could have improved them. First, the content-based page sharing test was not performed on an ESX server hosting multiple heterogeneous operating systems, which would likely have shown weaker sharing because most OS code segments would no longer be sharable. Second, the sharing test does not break down the sharing profile of code segments versus data segments; the paper only reports the proportion of zero pages.

Questions / Confusion
The paper did not discuss or compare the performance effect of its single-page sharing mechanism against the migration/replication strategies used in Disco. On a NUMA architecture, this comparison may have yielded interesting results.
Also, why did the authors not compare ESX server against an OS running natively on the hardware, as was done in Disco?

1. Summary
This paper describes and evaluates several novel schemes for memory management and sharing in VMWare’s ESX server, a Type I hypervisor. This paper talks about transparent mechanisms for memory reclamation, identifying opportunities for page sharing and a resource allocation policy that factors in idle memory.

2. Problem
To some extent the problem being solved is how a VMM can make better resource management decisions and cooperate with a guest OS without directly communicating with it. Making an OS kernel aware of the VMM, as was done in Disco, may not always be practical. A few key problems addressed include the following: (1) memory reclamation is important when a virtualized system overcommits memory to guest OSes, and simply swapping out guest OS pages could conflict with guest OS page replacement; (2) detecting common pages without directly modifying the OS; and (3) estimating the fraction of idle memory in guest OSes.

3. Contributions
The paper discusses several innovative schemes, each of which addresses an individual problem well; these ideas stood out in the paper -
1) Ballooning increases the pressure on the guest OS to release physical memory when the VMM requires it. This method performs better than VMM-level page swapping because it maintains predictable performance and does not need a complex VMM-level page replacement policy. In effect, it forces the OS to do the VMM's job of reclaiming memory.
2) The ESX server's content-based page sharing mechanism is brilliant and quite lightweight. A few key ideas include fast hashing for comparisons, and the idea of retaining hint frames so that pages can be compared against them in the future.
3) The paper introduces the idea of an idle memory tax, which penalizes virtual machines that hoard idle memory. This effectively improves the throughput of the system without hurting the taxed VM.
4) The process of determining the idle memory ratio is also interesting, as it uses random samples and actively induces page faults for counting. It also uses the maximum of a slow and a fast estimate, which responds quickly to memory scarcity but more slowly otherwise (see the sketch after this list).
5) ESX layers these multiple mechanisms and policies to implement an agile system. For example, on changing the idle tax rate parameter, the VMM is able to quickly reclaim resources using ballooning.
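A sketch of how the idle-fraction estimate in item 4 might combine slow and fast exponentially weighted moving averages, taking their maximum so the estimate rises quickly but decays slowly. The constants and class name are illustrative, not the paper's exact parameters, and the paper additionally keeps a third version of the fast average that incorporates samples from the current period.

```python
# Illustrative working-set estimator: sample n pages per period, count touches,
# and combine slow/fast moving averages by taking their maximum.

class ActiveEstimator:
    def __init__(self, slow_alpha=0.1, fast_alpha=0.5):
        self.slow = 0.0
        self.fast = 0.0
        self.slow_alpha = slow_alpha
        self.fast_alpha = fast_alpha

    def update(self, touched, sampled):
        f = touched / sampled                 # fraction of sampled pages touched
        self.slow += self.slow_alpha * (f - self.slow)
        self.fast += self.fast_alpha * (f - self.fast)
        # Max responds quickly to increased activity, slowly to idleness.
        return max(self.slow, self.fast)

est = ActiveEstimator()
for touched in [5, 40, 90, 90, 10, 10]:      # touches out of 100 sampled pages
    print(round(est.update(touched, 100), 3))
```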

4. Evaluation
The authors use micro-benchmarks wherever possible to highlight how the mechanisms perform. To evaluate page sharing, they break down the memory footprints while running multiple instances of the same OS, which is a best-case scenario. It is surprising that they are able to share 67% of a VM's guest physical pages with low CPU overheads. They use an interesting synthetic benchmark (touching pages) to show how their idle page estimates closely track idle pages in a guest OS. They also provide an evaluation of idle memory reclamation in terms of memory used and allocated; it would have been great had they also shown the performance impact for this case.

5. Confusion
How can it detect such high amounts of common pages using random sampling? When are hint frames allocated - is it for every scanned page?

1. Summary
The author of the paper discusses the memory management techniques implemented in the VMware ESX server. Mechanisms such as ballooning, share based allocation, idle memory tax and content-based page sharing are used to enforce high-level policies to allow efficient VM operation in a memory overcommitment and varying client priority scenario.

2. Problem
Server consolidation helps improve the utilization of servers and reduce costs, but it has the potential to violate the resource guarantees of an OS. The biggest issue with this method is the difficulty of maintaining the QoS guarantees previously provided by individual servers. Thus effective policies are required to balance utilization with QoS and resource management. For such a system to be acceptable, these policies and mechanisms should be able to manage a commodity OS without any modification.

3. Contribution
This paper manages to solve the above problem using very innovative techniques. Ballooning is an interesting idea in which a device driver generates artificial memory demand within a guest OS, forcing the guest OS to swap out pages and helping the VMM reclaim machine memory. The major advantage of this method of reclamation is that the guest OS is aware of, and participates in, the process. Another unique mechanism introduced in this paper is the way they detect sharing: by using content-based hashing to identify candidate pages, they provide a fast yet simple solution. However, the most important aspect of this paper is the concept of quality of service and the mechanisms to address it. A VMM can host multiple clients, each with a different priority, and maintaining an overcommitted system in such a scenario can lead to breaches of QoS contracts. At the same time, enforcing QoS when high-priority clients are not using their allocated memory can lead to unnecessary performance degradation of other OSes in need of memory. Share-based allocation and the idle memory tax avoid such scenarios. Overall, none of these mechanisms requires any change in the guest OS, providing a solution with a potential for mass adoption.

4. Evaluation

The paper provides detailed evaluation for each feature or mechanism suggested. The effectiveness of the ballooning technique is shown by comparing the performance of the dbench workload under equal amounts of real and balloon-reduced memory. The effectiveness of the page sharing technique is shown by a reclamation of roughly 60% of allocated memory in a controlled virtualized setup, and 7-33% in real-world environments. The performance of active memory sampling and the dynamic memory reallocation technique is also evaluated.
However, the author does not talk about the performance impact of these features in great detail.

5. Questions
Is the method of content-based sharing scalable? How is it done in current systems?

1. Summary
This paper presents the mechanisms that are part of VMware ESX server, a type 1 VMM, for efficiently managing memory across VMs. The motivation behind the mechanisms discussed is to support efficient server consolidation by overcommitting memory and to allow commodity operating systems to run on the VMM without any modification.
2. Problem
VMMs aim to minimize machine idle time by multiplexing resources, including memory, between multiple VMs. However, the VMMs that existed before this study did not fully achieve this aim. A prime example is pure share-based memory allocation, which can lead to a situation where some VMs hog memory without actively using it while other VMs are under severe memory pressure. Further, many existing VMMs required OS modifications to support their optimizations. ESX targeted both of these problems.
3. Contributions
First and foremost, the design philosophy of ESX maintained the invariant that the commodity OS should not be changed at all to support the VMM. An equally important goal of the work was to ensure maximum utilization of hardware resources, like memory, which manifested in the concept of memory overcommitment. Instead of using meta-level page replacement policies for memory reclamation, the idea of ballooning is quite impressive since it cooperates with the guest OS to reclaim memory when required. This is well augmented by higher-level reclamation policies based on an idle memory tax. They also implemented a reasonably accurate mechanism for tracking memory idleness, with minimal overhead. The second major contribution was the idea of content-based page sharing: by using fast hashing and maintaining special data structures, ESX enabled transparent sharing of pages with minimal impact on performance. Overall, these mechanisms and policies facilitate overcommitment of memory by reducing memory pressure under the hood, while ensuring that VMs can ramp up their memory usage quickly if required. Another smaller addition was support for the hardware-managed TLB.
4. Evaluation
The efficacy of ballooning has been demonstrated using dbench. Though it conveys the idea well, an evaluation using a more diverse set of benchmarks is lacking. The experiment showing the optimistic page sharing capabilities of ESX, accompanied by a discussion of memory sharing achieved on real deployments of ESX serve well to establish how effective the page sharing in ESX is. The experiments showing active memory sampling and idle memory taxing techniques in action are good for understanding the idea. But as mentioned previously, an analysis based on more workloads is needed for the complete picture. Towards the end, the experiments used Microsoft Exchange benchmark, Citrix MetaFrame benchmark and Microsoft SQL based workloads to demonstrate how all the policies designed work together to release memory pressure in the system. However, a more direct comparison using performance statistics against systems that already existed, systems without memory reclamation and sharing optimizations etc. would be more interesting to look into.
5. Confusion
The paper does not discuss in detail the policies that control the scanning of pages for memory sharing. Also, while discussing hashing of scanned pages, the authors argue that there would be very few collisions, yet they maintain chaining for the stored frames; I am not sure what the expected gains (or potential harms) of chaining are. What is the significance of maintaining a 16-bit reference count for shared pages, and why is it compared to Disco's backmaps?

1. Summary
The paper introduces the memory management techniques, such as content-based page sharing, ballooning and idle memory taxing, employed in the VMware ESX server without modifying commodity OSes. The methodology differs from existing solutions in its ability to overcommit memory, its lower execution overhead, and its ability to scale to different VM configurations.
2. Problems
VMMs like Disco require modifications to the commodity OS for effective hardware utilization, and the lack of information on how the OS handles memory leads to memory management issues (e.g., double paging). VMMs also need to overcommit memory in order to extract maximum utilization from the hardware, but overcommitting memory is difficult without support from the operating system. Finally, these systems need to scale resources proportionally with VM configurations, and VMMs need to reclaim idle memory from VMs transparently to aid scaling.
3. Contributions
The contribution of this paper is to solve memory management issues in VMMs efficiently, with low execution overhead. These methodologies improve upon techniques introduced in the Disco paper. The contributions are as follows:

  1. Ballooning: Since the guest OS's page replacement policy is not known to the VMM, adopting an arbitrary VMM-level replacement policy might degrade performance. To tackle this problem, VMware ESX server employs a technique called ballooning, in which a virtual device driver requests pinned memory, and to satisfy this request the OS swaps out unused pages. The size of the balloon is determined by the VMM.

  2. Content-based Sharing: To avoid making changes to the guest OS, as Disco did, for identifying pages to share, ESX server employs content-based sharing. In this technique, hashes of page contents are stored in a table, and candidate pages are compared against these hashes to determine which pages can be shared. Matching hashes merely increase the probability of a match and do not guarantee matching pages; a full comparison follows. The hashing technique produces minimal aliasing. To reduce execution overhead, not every page is hashed at once; pages are selected randomly for scanning. Even this simple technique identifies a significant number of shared pages.

  3. Idle Memory Tax: ESX server tracks active pages using a low overhead sampling technique and reclaims idle memory using a taxing mechanism. This technique also guarantees resource allocation corresponding to VM configuration.

4. Evaluation
To evaluate ballooning, the author compares the throughput of a VM configured with a given memory size against a larger VM whose memory has been ballooned down to that size. Page sharing improvements are shown by running SPEC95 benchmarks on various VM configurations, along with data from real-world deployments. Idle memory tracking and reclamation are evaluated by running applications on top of commodity OSes such as Windows 2000.
5. Problems
The paper does not take the underlying hardware architecture into consideration while explaining the methodologies. On a CC-NUMA multiprocessor, maintaining a consistent hash table of page-content hashes would be more difficult than on a simpler system. Also, as hardware evolves, system administrators will want to consolidate more VMs onto a single machine; in such a scenario, will there be the same amount of memory sharing, and can the cost of computing the hashes still be amortized? How many of these techniques are still being used?

1. summary
This paper summarizes the memory management principles behind the VMware ESX Server architecture, which virtualizes hardware in order to enable multiple unmodified commodity operating systems to run concurrently on the same machine. The server achieves its goals by using innovative techniques like ballooning, idle memory tax, content-based page sharing and hot I/O page remapping.

2. Problem
The major problems addressed in the paper are as follows. Through overcommitment, server consolidation can lead to significant improvement in system resource utilization, but it also introduces challenges in memory reclamation design, such as preventing phenomena like double paging. Additionally, existing systems like Disco require modification of the guest OS to enable sharing of pages using copy-on-write principles, which is not always feasible or desirable. Lastly, another problem with existing VMMs is the lack of consideration of quality-of-service guarantees when dealing with memory management.

3. Contributions
The main contributions of the paper can be summarised as follows. Firstly, a device driver or balloon module is inflated and deflated to maintain pressure on the guest operating system's memory management algorithms, coaxing it to cooperate with the ESX Server's page reclamation mechanism. This is argued to be better than traditional meta-level reclamation mechanisms, which can lead to double paging.

Secondly, the introduction of a table of scanned pages (both copy-on-write frames and hint frames), indexed by a hash of each page's contents, allows efficient content-based page sharing and is a significant improvement over earlier implementations (like Disco's) that required guest OS modifications. Additionally, by implementing an idle memory tax, the ESX server ensures that quality-of-service guarantees to clients are upheld while effectively improving system-wide performance.

Lastly, a set of well-thought-out admission control policies has been introduced to ensure that sufficient resources are reserved for each new VM. Statistics are also maintained to track hot pages and remap them between high and low memory.

4. Evaluation
The paper provides well-justified and detailed evaluation with graphs and statistics for its major contributions. It is shown that the throughput of the system with ballooning is only slightly lower than the throughput without it. Further, it is shown that for multiple VMs running the same benchmark, the amount of shared memory reaches 67%. This highlights the upper bound on the advantage of their memory sharing principles very well, and they further provide statistics on memory sharing in a more diverse production environment. The justification and correctness of the active memory sampling design is well explained; in particular, they point out anomalies in their graphs (due to the Windows zero page thread) that cause an unexpected spike in memory usage.

5. Confusion
I would like more clarity on the active memory sampling mechanism, in which the maximum of the slow moving average, the fast moving average and a modified fast average is used to estimate idle memory.

1.Summary:
This paper focuses on design and implementation of ESX server, a software layer for hardware resource multiplexing among the virtual machines running commodity operating systems on top of it. It uses techniques such as ballooning and content based sharing for efficient memory management.

2.Problem:
1) Server consolidation was necessary for organizations, with features such as isolation and efficient resource management for virtual machines of varying importance. Virtual machine monitors like Disco that did such memory management still required changes to the commodity operating systems running on top of them.
2) Transparent meta-level paging can result in the double paging problem, in which the guest OS chooses to page out the very page that the VMM has just reclaimed, forcing it to be paged back in only to be written out again.

3.Contributions:
1) The authors introduce a new layer between the virtual machines and the hardware, which performs resource management and efficient sharing without requiring any change to the operating systems.
2) The ballooning technique in ESX server performs cooperative memory management with the guest OS, avoiding the problem of double paging. This is achieved by a kernel module (the balloon driver) loaded into the guest OS; inflating it causes the guest to page out and release memory, while deflating it returns memory to the guest.
3) Rather than depending on the VMs and applications to indicate sharable data, ESX does content-based sharing, where pages with identical contents are shared between VMs, found via hashing techniques.
4) Share-based resource allocation for VMs of varying importance has greatly helped effective resource utilization and is still widely used today.
5) Memory overcommitment and dynamic reallocation policies adjust per-VM allocations as VMs are added or removed, driven by free-memory states such as high, soft, hard and low.

4.Evaluations:
The authors have extensively evaluated the efficiency of all the features introduced as part of the ESX server, such as balloon performance using dbench, page sharing on identical Linux VMs with roughly 60% memory reclamation, and dynamic reclamation and allocation using the idle memory tax, which boosts an active VM's performance by about 30%. Though the new virtualization layer introduces slight overhead, overall system performance remains good. Experiments on varied virtual machines rather than identical ones would have been great.

5.Confusion:
How does ESX handle the NUMA characteristics of the underlying hardware?

1. Summary
This paper mainly talks about memory management policies in VMware ESX server, a bare-metal hypervisor running unmodified commodity OSes. The ballooning technique helps persuade the guest OS to release a configurable amount of memory; content-based page sharing with an efficient hashing mechanism eliminates the need to modify the OS; and dynamic reallocation algorithms borrowed from economics, such as shares and taxes, are employed to achieve efficient memory utilization with performance isolation.
2. Problem
Several problems are being addressed: current meta-level page replacement policies in VMMs make uninformed resource management decisions, and the problem worsens with diverse and undocumented OS policies; earlier bare-metal hypervisors like Disco required several OS modifications to the HAL to detect opportunities for sharing; traditional systems did not provide a "pay as you go" service to clients of varying importance, and pure proportional-share frameworks do not incorporate active memory usage; and redundant copying imposes overheads for I/O in modern large-memory systems.
3. Contributions
Techniques like ballooning allow the memory visible to a guest to be reconfigured on the fly, anticipating future hot-pluggable memory cards without the need for a reboot; with such mechanisms, more research into page replacement policies will be required. With the introduction of content-based page sharing, standard data analysis and indexing approaches could be employed to analyze OS memory footprints for sharing across unmodified OSes. VMs could be extended to commercial systems with high-end GPUs or memory, unaffordable to individual clients, while still providing guaranteed quality of service. The approach of dynamic share-based allocation incorporating active memory usage helps build a powerful, adaptive and stable system guaranteeing temporal isolation and efficient memory utilization; it could also help contain failures to a few VMs. Since ESX Server targets adaptiveness, advances in feedback-driven workload management and higher-level grouping could lead to further progress.
4. Evaluation
The system is evaluated across different configurations. With one Linux VM running on a dual-processor machine to demonstrate ballooning effectiveness, throughput scaled linearly as a function of VM size. With identically configured VMs running a uniform workload, the level of sharing increased with the number of VMs because redundant code and read-only data pages are shared; it would have been a better evaluation to also consolidate diverse workloads, as in real-world scenarios. As expected, ESX Server responded rapidly to increases in memory usage and more gradually to decreases; in an experiment with a Windows VM and a Linux VM, adjusting the tax rate shifted memory between them, demonstrating the effectiveness of the memory allocation algorithms. In the dynamic reallocation experiment, some instabilities, such as balloon drivers starting late and the zero page thread behavior of Windows, muddied the evaluation, although the policy of attempting to share before swapping was shown to be effective in reclaiming pages.
5. Confusion
I'm a bit confused by the use of statistical sampling to measure idle memory and to decide when to remap pages between low and high memory, since these rely on randomized algorithms whose effectiveness is argued largely in theory rather than demonstrated in practice.

1. Summary
This paper focuses on the design and implementation of a software layer (ESX server) to multiplex hardware resources efficiently among various virtual machines. Their work differs from previous attempts in that it does not require any modification to the guest OS for efficient resource utilization. The server uses various techniques such as ballooning, idle memory taxing and content based sharing to ensure efficient resource management without relying on any explicit information from the guest operating systems.
2. Problem
The lack of operating-system-agnostic virtualization technologies was holding back server consolidation. The advantages of such a technology include simplified management and reduced hardware costs, as well as the ability to run various applications with modest resource requirements on a single machine while ensuring protection and isolation. The technology needed to be operating-system agnostic, since modifying each OS would not be a practical solution.
3. Contribution
The authors introduce a type 1 virtual machine monitor that performs memory management without needing changes to the guest OS and without noticeable overhead. The ESX server adds a balloon module to each VM, which inflates its memory consumption to force the guest OS to invoke its memory management algorithms while the server frees the machine memory underlying the balloon. Memory sharing has been a common theme for VMMs; ESX introduced content-based sharing, using hashes to identify identical pages and share them with copy-on-write. In my mind, however, the biggest contribution of this paper is the proportional resource sharing model, which honours the minimum share of each client while not letting clients waste resources. This has led to various modern virtualized and cloud environments that strive to meet customer service level agreements while also maximizing resource usage. ESX works towards this by assigning and reclaiming resources based on each client's shares and minimum memory value, as well as a tax imposed on wasted (idle) memory. The overall system is kept within reasonable limits by setting high, soft, hard and low thresholds on free memory, which dictate how aggressively the ESX server reclaims resources from virtual machines.
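A tiny sketch of how the four free-memory states might map to reclamation actions; the threshold percentages follow the defaults reported in the paper, while the dispatch code itself is my own illustration rather than ESX's implementation.

```python
# Free-memory states and the reclamation action each one triggers.
# Thresholds are the defaults reported in the paper; the dispatcher is illustrative.

STATES = [            # (name, free-memory threshold, action)
    ("high", 0.06, "no reclamation"),
    ("soft", 0.04, "reclaim via ballooning (fall back to paging)"),
    ("hard", 0.02, "reclaim via forcible paging"),
    ("low",  0.01, "also block VMs above their target allocations"),
]

def choose_action(free_fraction):
    for name, threshold, action in STATES:
        if free_fraction >= threshold:
            return name, action
    return STATES[-1][0], STATES[-1][2]

print(choose_action(0.08))   # ('high', 'no reclamation')
print(choose_action(0.03))   # ('hard', 'reclaim via forcible paging')
```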
4. Evaluation
The authors evaluated each of the core components individually to establish its individual performance gains and overheads, and then ran a real-world use case to see the overall system behaviour. They measured the overhead introduced by inserting the balloon module into guest operating systems to reclaim memory, and then measured the amount of shared memory as the number of VMs was increased. To reclaim idle memory the server needs to estimate the amount of idle memory in a guest OS, so the paper tests the accuracy of a statistical approach for gauging the amount of active memory in each VM. In the real-world test the server ran five VMs with different workloads to demonstrate how ESX can manage an overcommitted configuration and still keep sufficient free memory. While thorough, their evaluation strategy did not seem wide enough to me. They should have tested memory sharing while using different OSes, as this would reduce the amount of shared OS memory and is a very realistic real-world use case. Secondly, they could have compared the cost of running five VMs on one server versus the cost of running them on dedicated computers.
5. Confusion
Why did they not address NUMA architectures, where page sharing would affect performance? They only talk about the percentage of shared zero pages; a more granular breakdown of the pages shared (code vs. data, etc.) would have provided better insights.

Summary:
The paper describes memory management mechanisms like content-based page sharing, active page management with ballooning, dynamic page reallocation across VMs and I/O page remapping in the VMware ESX server. These mechanisms are primarily used when the Virtual Machine Monitor over-commits memory to the VMs.

Problem:
Memory management in servers running multiple virtual machines has been sub-optimal in the sense that dynamic sharing of memory resources amongst VMs is lacking. Although there are solutions like Disco which address this problem, they require (albeit minimal) changes to the OSes that run in the virtual machines, which is not a viable solution. This paper tries to address that shortcoming without compromising on the extent of dynamic sharing.

Contributions:
The paper describes memory management mechanisms implemented in the VMware ESX server that allow the VMM to overcommit memory while still seamlessly running commodity operating systems in the virtual machines. The paper describes the following mechanisms:
Ballooning: To avoid the performance penalty of double paging, a pseudo-device driver is loaded into every VM and directed by the VMM to allocate or deallocate memory inside the guest OS as necessary. This technique capitalizes on the guest OS's own workload-aware page replacement policies so that machine memory can be redirected to other guest VMs.
Content-based sharing: Instead of modifying OS code to give hints about sharing, the paper describes a non-intrusive content-based sharing mechanism in which sharing candidates are identified by hashes of the actual contents of the pages.
Idle memory reclamation: Idle memory in VMs is identified with fairly precise heuristics and is reclaimed using the ballooning technique to supply memory-hungry VMs while still maintaining quality-of-service guarantees.
I/O page remapping: Pages that are frequently accessed by I/O are transparently remapped to the lower region of memory to avoid copying pages multiple times.
Evaluation:
The paper evaluates each of the mechanisms described with fairly varied workloads. However, in the case of content-based sharing, which is seemingly resource intensive (processor and memory), the paper evaluates the overheads only for the best-case scenario where there is significant sharing among VMs. Significant analysis of the coordination of all these mechanisms has also been done with real-world workloads.

Problem:
The mechanisms are not NUMA-aware; Disco recognizes this issue by replicating and migrating shared pages. The paper is silent on the architecture of the underlying hardware.

Summary
VMware ESX Server is a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified commodity operating systems. The paper discusses a few novel memory management techniques and policies used by it, and their effectiveness is shown through a set of extensive experiments.
The problem
With the industry trending towards server consolidation and the use of shared-memory multiprocessors, there has been growing interest in virtual machines. This proliferation of VMs created a need for better resource management techniques that support overcommitment. Another persistent problem plaguing virtualization is how to run existing commodity OSes without modification on these VMMs.
Contributions
1. One of the main attractions of ESX Server is that proprietary OSes can run without any change. Also, unlike VMware Workstation, which uses a hosted virtual machine architecture, ESX Server manages system hardware directly, providing significantly higher I/O performance and complete control over resource management.

2. This paper introduces several novel techniques for allocating memory across the virtual machines. One such mechanism is ballooning for memory reclamation, which lets the guest OS apply its own native page replacement policies.

3. Another novel idea is the idle memory tax, which addresses an open problem in share-based management of space-shared resources, enabling both performance isolation and efficient memory utilization. The basic idea is to charge a client more for an idle page than for one it is actively using. Idleness is measured via a statistical working-set estimator, whose output feeds a modified proportional-share algorithm.

4. The authors use shadow page tables, which contain direct virtual-to-machine address mappings and are kept consistent with the physical-to-machine mappings. As a result, the extra level of translation is avoided on ordinary memory accesses, which can yield significant performance gains.

5. Content-based sharing is another new idea used by ESX, which uses hashing to identify pages with potentially identical contents that can be shared. This approach has a twofold advantage: firstly, there is no need to modify, hook into or understand guest OS code; secondly, it identifies more opportunities for sharing.

6. ESX uses hot I/O page remapping, exploiting transparent page remapping to eliminate redundancy and reduce copying overheads for I/O transfers.
7. ESX Server supports dynamic reallocation of memory in response to various events.
8. All of the aforementioned techniques allow overcommitment of memory, which facilitates a higher degree of server consolidation than would be possible with simple static partitioning.

Evaluation
The paper presents a very comprehensive evaluation, using for each aspect of VMware ESX a specific workload that best demonstrates the respective advantage. For example, the low overhead of ballooning is shown on dbench, a benchmark that benefits significantly from additional memory. To evaluate ESX Server's page sharing implementation, experiments were conducted to quantify its effectiveness at reclaiming memory and its overhead on system performance, first analyzing a "best case" workload and then additional data collected from production deployments serving real users. Experiments with a user-level toucher application show the expected results for the idle memory taxation scheme. Dynamic reallocation was demonstrated by running a workload consisting of five virtual machines with machine memory deliberately limited to 1 GB to better expose the effects of memory pressure. However, an evaluation of the performance of shadow page tables and I/O page remapping is missing.
Confusions
If I/O transfers are by default directed at low memory only, under what circumstances do we get "hot" pages in high memory?
For double paging, how can the OS select for its own virtual paging device a page that has already been paged out by the meta-level policy?

Summary
The paper introduces new, efficient ways of managing memory for virtual machines running unmodified commodity operating systems. These techniques are part of VMware ESX Server, a software layer designed to multiplex hardware resources among virtual machines. The paper further evaluates the newly introduced memory management techniques against traditional ones using various benchmarks.

Problem
With the rise of server consolidation, efficient use of hardware resources gained importance in the field of virtualization. The authors target the problem of managing memory efficiently for a number of virtual machines whose total memory requirement exceeds the actual hardware memory, creating an over-commitment of memory. The aim was to provide efficient solutions to such memory requirements without modifying the commodity operating systems.

Contributions
1. The paper presents new memory management techniques in VMware ESX Server that allow commodity operating systems to run unmodified as virtual machines.
2. For memory reclamation, a technique named 'ballooning' is introduced. It helps reclaim memory from virtual machines so they can run with less memory, and is implemented by a driver loaded into the guest OS. The driver reclaims or returns memory to the virtual machine as decided by the ESX Server.
3. The problem of effective memory sharing is solved with content-based sharing. The technique involves scanning pages from different virtual machines and sharing them on a copy-on-write basis if their contents are identical. Hashing is used to make the comparison of page contents fast. This does not require any modification to the commodity operating systems, unlike systems such as Disco.
4. A share-based allocation technique is used to provide memory to the VMs: a VM receives memory in proportion to the number of shares it possesses. It also involves looking for idle memory in a VM's allocation and reclaiming it when physical memory on the actual hardware is scarce, by levying a tax on such VMs proportional to their idle memory (a sketch of a simplified target-allocation step follows this list).
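A small sketch of a simplified target-allocation step for item 4: distribute machine memory in proportion to shares, then clamp by each VM's configured min and max sizes. This is a simplification under my own assumptions (names and numbers are illustrative), not ESX's actual allocation algorithm, which also folds in the idle memory tax and working-set estimates.

```python
# Illustrative target-allocation step: proportional to shares, clamped to [min, max].
# Note: clamping may not conserve the total; real policies redistribute the remainder.

def compute_targets(vms, total_memory):
    total_shares = sum(vm["shares"] for vm in vms)
    targets = {}
    for vm in vms:
        proportional = total_memory * vm["shares"] / total_shares
        targets[vm["name"]] = max(vm["min"], min(vm["max"], proportional))
    return targets

vms = [
    {"name": "web", "shares": 2000, "min": 128, "max": 1024},
    {"name": "db",  "shares": 1000, "min": 256, "max": 2048},
]
print(compute_targets(vms, total_memory=1024))   # sizes in MB
```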

Evaluation
I found the benchmarks to be good enough, with a sufficient amount of testing and varied workloads. The benchmarks showed improvements in memory efficiency and throughput relative to traditional virtual machine monitor software. Though the paper hints at minimal or no compute overhead for the newly introduced memory management techniques, a compute benchmark showing only a small variance from traditional techniques would have strengthened the assertion.

Confusion
The author mentions a private channel used by the ESX Server to communicate with the balloon driver. What is it?
Also, is content-based sharing really not that compute-intensive? If it is indeed cheap, is it still being used now?

1. Summary: This paper explains the memory management schemes used in VMware ESX Server, a type-1 VMM designed for commercial servers. ESX aims to provide better performance and resource management by exercising direct control over the hardware. The principles suggested and evaluated in this paper present a great example of system-level software designed around the economics of the situation.
2. Problem: Previous attempts at a VMM (Disco, for example) ended up making changes to the guest OS, whereas ideally VMMs should be transparent to the guest OS. This leaves the authors to solve problems like how to provide sharing, which Disco solved by changing the HAL of the guest OS. In order to be commercially viable, the system also needs a way to guarantee performance to customers in proportion to what they pay, while providing reliability and maintaining system performance. The authors also plan to overcommit resources for better utilization, and need to figure out how to make policy decisions for that.
3. Contribution: The author presents many innovative techniques in this paper: ballooning, content-based page sharing (which is like a content-addressable memory in software, except that it runs occasionally in the background), share-based allocation, idle-tax based reclamation, etc. One big contribution, though, is showing that almost all the parameters of the system can be made configurable while maintaining a clean organization.
The VMs are allocated sizes based on their shares, and these sizes can vary at run time. There is a mechanism to switch between ballooning and demand paging depending on the system state. The rate at which memory is sampled for sharing, reclamation and activity estimation can also be controlled per client. They also anticipate future support for hot-pluggable memory cards to allow a dynamic physical address range. Another big contribution is how ESX moves beyond purely static, share-based multiplexing of memory and introduces a 'tax'. This allows efficient over-commitment, while lending an economic, market-based approach to system design. In the end, they show how VMMs can be designed innovatively to avoid any changes to existing OSes.
4. Evaluation: The authors present a crisp evaluation of almost all the policies described. They show that ballooning does not reduce throughput much, that page sharing effectively reclaims most of the shareable pages, and that the memory tax effectively allows overcommitment and increases the throughput of one VM when another VM is hogging resources. However, most of these evaluations were done with homogeneous OSes running in the VMs (either all Windows or all Linux); for example, how would page sharing and dynamic reallocation perform for a mix of Windows and Linux VMs? The performance of demand paging, which is used in the system-critical 'hard' and 'low' states, is also not shown. Also, a graph comparing workload performance with and without over-commitment would have been useful.
5. Confusion: Where do virtual disks fit in the system? For every virtual disk write, does the VMM write to the hard disk? With heterogeneous OSes running in different VMs, how does the VMM handle the different file systems? How does VMM-level paging take a particular page away from a guest OS; would an interface on the guest OS be needed for this? How does paravirtualization change this 'strictly no modification to the guest OS' philosophy? How does a hint frame help as an optimization?

1. Summary
In this paper, the authors introduce VMware ESX Server, a thin software layer designed to multiplex hardware resources efficiently among virtual machines running *unmodified* commodity operating systems. This paper explains several novel policies and mechanisms employed by the ESX Server for memory management, namely ballooning, idle memory tax, content-based page sharing and hot I/O page remapping.

2. Problem
Memory is usually underutilized when a group of virtual machines runs on a commercial server. It is difficult to address this without overcommitting memory and without making any modifications to the commodity operating systems. This paper mainly addresses the problem of how to overcommit and manage memory while still providing resource guarantees to all the running virtual machines.

3. Contributions
The major contributions of this paper are:
* Ballooning technique that implicitly coaxes a guest OS into reclaiming memory using its own native page replacement algorithms.
* Content-based page sharing technique that identifies redundant pages and thus maps multiple "physical" pages to a single machine page. Although this concept was originally introduced in the Disco paper, Disco required guest OS modifications; this paper avoids them by hashing page contents and scanning pages randomly.
* Idle memory taxation technique that charges a client more for an idle page than for one it is actively using. It leads to efficient memory utilization while maintaining memory performance isolation guarantees.
* A higher-level dynamic reallocation policy that effectively coordinates the above techniques.
* Perhaps the most important of all: implementation of above techniques without ANY modification to the guest OS.

4. Evaluation
This is a very well written paper that builds on the prior work of Disco and addresses many of its limitations. They back all of their novel memory management techniques with well-designed experiments and reasonable performance numbers. The results of the balloon performance experiment showed that the technique is indeed effective, albeit with some overhead.
Especially impressive were the page sharing results, which showed that up to 60% of memory could be reclaimed. Further, the experimental results on active memory sampling and the idle memory tax solidify the authors' proposal and validate their faith in statistical sampling.

However, the authors do not make adequate arguments for random sampling; it looks like a flaw when certain kinds of workloads are not appropriate for random sampling. Further, the authors also do not provide data for CPU overhead and just mention that it is negligible.

5. Confusion
What will happen when a certain kind of workload is not suitable for random sampling? Also, the details of the randomized sampling are not provided.

1. Summary
This paper introduces various techniques that VMware ESX Server applies to maximize memory utilization while guaranteeing quality of service, including reclaiming memory with a balloon driver, estimating idle memory with sampling and penalizing it with a tax, identifying sharable pages with hashing, and avoiding copying by remapping I/O pages.

2. Problem
Server consolidation is a recent trend, as many individual servers are underutilized. Virtualization techniques that support overcommitment of memory can address this issue, but previous works rely on modifications to the guest operating system to some extent.

3. Contributions
ESX Server introduces a new, graceful way to reclaim memory from guest operating systems. Instead of paging without the guest's cooperation, it places a balloon driver in the guest OS to request and release memory like an ordinary driver. This passes the paging decision to the guest, which has better knowledge of its own running state, so the balloon method is preferred over paging in the VMM.
ESX Server combines the ideas of static proportional shares and dynamically changing working sets by extending the proportional-share algorithm with an idle memory tax. A client's shares-per-page ratio drops if it is not actively using part of the memory allocated to it. The amount of idle memory is estimated by inducing page faults on sampled pages.
Page sharing across virtual machines is achieved by scanning page contents and matching by hash. This does not rely on hints from the guest OS and has the potential to discover all duplicated pages. To reduce the trap overhead of copy-on-write, unshared pages are added to the hash table as hint frames, and their hashes are revalidated once there is a match.
I/O page remapping is just an application of this general remapping ability; the management of low pages is not discussed in detail.

4. Evaluation
For balloon reclamation, the paper compares the performance of a ballooned machine with a natively sized one for various memory sizes, and concludes that the overhead is small. But no experiment is done to show the advantage of using ballooning over paging.
The accuracy of idle memory estimation is shown by running an application that accesses a varying amount of memory. The curves of the estimated and actual amounts of active memory track each other closely. The overhead introduced by sampling is not measured.
To show the effectiveness of idle memory tax, two VMs, one idle and another memory-intensive, run under different tax rates. The improvement is shown clearly.
Experiments on content-based page sharing are done both in an ideal environment and in real-world production. In the ideal experiment, all VMs run identical workloads and both the effectiveness and the overhead are good. The data from the real world do not include CPU overhead, and the differences in results between workloads B and C lack an explanation.
The overall behaviour of these techniques is demonstrated in a rather complex setting. The effects can be observed, but there is no comparison with other systems, designs, or even stated expectations.

5. Confusion
At the beginning of Section 4.3, the authors say collisions are handled by chaining, while later in the same section shared pages are assumed to have unique hash values.

1. Summary
The paper discusses ESX Server, which multiplexes hardware (memory) for virtual machines running commodity OSes. It also provides novel, dynamic memory management techniques such as ballooning and the idle memory tax. Through experiments the authors show that their memory sharing techniques are effective without imposing high CPU overhead.

2. Problem
There was no mechanism that could host VMs running commodity OSes without changes to the existing OS code while still managing memory efficiently. The authors also address server consolidation, in which memory is over-committed across VMs.

3. Contributions
The authors introduced a Type 1 virtualization technique to handle memory management for VMs running commodity OSes without having to make changes in the OS. They introduced 'ballooning' to reclaim memory from a VM: the balloon driver pins guest pages, whose backing machine pages the ESX Server can then reclaim. The balloon can be dynamically inflated (claiming more memory from a VM) or deflated without imposing significant overhead on the OS. By hashing page contents instead of comparing every pair of pages, the sharing algorithm improves from O(n^2) to roughly O(n). Also, by keeping hint frames for page contents, they avoid marking every not-yet-shared page copy-on-write, which would otherwise create a copy of each page and cause another performance overhead. By introducing the concept of an idle tax, they can dynamically set the fraction of resources available to a VM, rather than using a static, purely share-based approach; this improves on 'min-funding revocation' and strikes a balance between proportional-share allocation and effective utilization of memory. By providing min/max parameters and high/soft/hard/low free-memory thresholds, they can decide when to admit new VMs and how to handle extreme memory conditions: based on the state, ESX deploys ballooning or paging to reclaim memory conservatively or aggressively.

4. Evaluation
The authors provided a step-by-step evaluation of the various techniques. They first showed that the overhead of ballooning is negligible, which reinforces the feasibility of the technique. They then showed that shared memory actually increases as more VMs are introduced. In the last set of experiments they studied the behaviour of realistic applications (Exchange, Citrix server and dbench) and explained the empirical observations. They also conducted experiments to show how their idle memory management technique can provide VMs with memory dynamically. Overall, they touched on every new technique they introduced and demonstrated it experimentally.

5. Confusion
They did not discuss the performance of their solution on NUMA hardware.
They also did not compare running a commodity OS natively against running it on top of the ESX layer.

1. Summary
This paper describes the various mechanisms and policies used by VMWare's ESX Server to manage memory. It introduces several techniques like ballooning, content based page sharing, idle memory tax, I/O page remapping, etc and evaluates them with supporting data.

2. Problem
Memory is a premium resource for type-1 hypervisors, especially when it is overcommitted. Previous attempts by VMMs like Disco to manage memory required guest OS modifications and were based on proportional sharing, which is not very effective on its own. ESX Server attempts to solve the problems of memory allocation, reclamation and sharing without modifying the guest OS.

3. Contributions
The primary contributions of this paper are:

Ballooning - A pseudo-kernel device driver is installed on the guest OS. This acts as a balloon which can be inflated or deflated based upon resource availability.

Content-Based Page Sharing - Pages residing in machine memory are shared based upon a high-quality hash computed over their contents. Pages are picked randomly and hashed; the hash is looked up in a hash table, and whenever a match is identified (and a full comparison confirms identical contents), the page is marked CoW.

Idle Memory Tax - ESX charges a VM more for an idle page than one which it is actively using. This is done via a statistical sampling of VM's pages to determine the fraction of actively used VM memory.

Allocation Policies - The parameters min, max and memory shares are configurable by the administrator. Admission control admits a VM only if min plus overhead memory can be reserved. If free memory becomes very scarce, paging is used to reclaim memory, and VMs above their target allocations may be blocked momentarily until sufficient memory becomes available.
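A minimal sketch of this admission-control check; the memory figures are illustrative, and the actual reservation also covers disk space for the VM's swap area.

```python
# Admission control sketch: power on a VM only if its min size plus
# virtualization overhead can be reserved. Figures are illustrative.

def can_admit(vm_min_mb, overhead_mb, unreserved_mb):
    return vm_min_mb + overhead_mb <= unreserved_mb

unreserved = 900
for vm in [{"min": 512, "overhead": 54}, {"min": 512, "overhead": 54}]:
    if can_admit(vm["min"], vm["overhead"], unreserved):
        unreserved -= vm["min"] + vm["overhead"]
        print("admit, unreserved now", unreserved)
    else:
        print("reject: not enough unreserved memory")
```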

I/O Page Remapping - This is used to remap frequently used high-memory pages to low memory. It is implemented by counting I/O references to a page; once the count exceeds a threshold, the PPN-to-MPN mapping is changed.
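A rough sketch of this hot-page counting idea; the threshold value and helper names are hypothetical.

```python
# Count I/O references to "high" pages; once a page crosses a threshold,
# remap its PPN to a page in low memory so future transfers avoid copying.
# Names and the threshold value are illustrative.

HOT_THRESHOLD = 16

def record_io(ppn, io_counts, remap_to_low):
    io_counts[ppn] = io_counts.get(ppn, 0) + 1
    if io_counts[ppn] == HOT_THRESHOLD:
        remap_to_low(ppn)          # change the PPN->MPN mapping once, copy once

counts = {}
remapped = []
for _ in range(20):
    record_io(7, counts, remapped.append)
print(remapped)   # [7]: page 7 was identified as hot and remapped
```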

4. Evaluation
This paper is extremely well written and gradually introduces the concept (mechanism/policy), explains its implications and provides data to support the claims.
Concepts like ballooning are extremely useful; ballooning is used by Citrix XenServer even today to manage memory. The daemon responsible is called squeezed (http://xapi-project.github.io/squeezed/architecture/architecture.html).
The low overhead associated with ballooning is explained via the dbench benchmark. The paper also has graphs and empirical data to support the claims associated with memory reclamation and dynamic memory reallocation but the lack of legends on most of them makes it hard to process them.


5. Confusion
The whole idea of active memory sampling and selecting the max amongst the multiple moving averages is confusing and not justified sufficiently. More details on the private channel between the balloon driver and ESX would be interesting. Typically ESX Servers work in clusters and this isn't discussed in the paper at all.

Summary

The paper introduces a commercially available VMware ESX Server that helps in efficiently multiplexing hardware resources among virtual machines running unmodified commodity operating systems. This paper describes the various mechanisms and policies employed for effective memory management used to build ESX Server.

Problem

In order to support virtual machines, the guest OS traditionally needed certain modifications, and it is very hard to influence the design of the guest operating systems running within virtual machines. Partitioning is normally done statically and in a mostly ad hoc manner, raising questions about fairness, performance isolation and efficient utilization of resources. The VMM is oblivious to the internals of the guest OS and can make decisions that hurt guest OS performance. Thus there is a need for cleaner, simpler mechanisms that support dynamic partitioning of resources based on usage, in which the guest OS participates in resource management decisions. Moreover, individual servers tend to be underutilized, so consolidating them as virtual machines on fewer physical servers seems to be the way to achieve cost reduction.

Contribution

The biggest contribution here is to demonstrate that dynamic partitioning is achievable without compromising performance or fairness. This idea can be used to fulfill SLOs in multi-tenant environments, in both private and cloud-based systems. It is achieved by having meta-level mechanisms decide on target resource allocations and lower-level mechanisms achieve those targets. The paper proposes a proportional-share allocation algorithm that computes a target memory allocation for each VM based on its working-set estimate. Ballooning is employed as one such lower-level mechanism to achieve the targets, by ensuring cooperation from the guest OS for memory reclamation when memory is scarce.

Another contribution is the concept of memory sharing to conserve memory so that over-commitment can be supported. The same sharing concept can be used for deduplication in the storage layer to achieve storage efficiency. Sharing is decided based on page content, by scanning for duplicate copies according to sharing policies that determine when and where to scan.

Evaluation

The author tries to justify every single choice by running experiments with synthetic benchmarks and real-world workloads. However, I feel more evaluation is needed for page sharing, describing the time taken to completely scan the pages under the various sharing policies. How the scan behaves as memory size increases also needs to be evaluated. In the absence of a formal proof to back the memory sampling and the idle memory tax, a broader evaluation is required to justify those choices. How dynamic reallocation impacts performance is also not shown.

Confusion

I couldn't exactly understand I/O page remapping and how remapping pages between high and low memory solves the addressing issue.

1. summary
The paper discusses the mechanisms and policies that go into the VMware ESX Server, a VMM similar to Disco. Through these improvements over Disco, ESX Server is able to efficiently multiplex hardware resources among unmodified VMs.
2. Problem
The problem was that, at that time, there were only two options for computing: (1) you run one workload (VM) per individual server, which has almost no performance penalty, or (2) you consolidate the workloads (VMs) onto fewer physical servers, which makes management easier and utilizes hardware better. The downside of one VM per server is that hardware can be underutilized, and the downside of multiple VMs on fewer physical servers is that performance is degraded to some extent.
3. Contributions
The goal of ESX Server was to provide an ideal system where it is easy to multiplex the hardware between VMs (and as a result utilize it fully) while making sure every VM gets the right amount of resources based on its configuration. Here are the mechanisms and policies that set ESX apart:
1. A shadow page table in the VMM that keeps track of guest virtual-to-machine address translations to speed up handling of TLB misses.
2. A balloon module that is loaded into guest OSes to put pressure on the guest OS to free pages; the module communicates the pages it has pinned to the VMM, which reclaims the underlying machine pages. Deflating the balloon gives the memory back to the guest OS.
3. The paper introduces transparent page sharing, where the guest OS does not need to be modified to detect identical pages. Instead, pages are selected and tested against each other based on content, and if the contents match, they are collapsed into a single copy.
4. An idle memory tax is introduced to reclaim idle pages from clients, in order to allow others to use those pages.
5. The ESX Server defines certain levels of free memory that trigger different reclamation policies; for example, once free memory drops below the soft threshold (4%), the server uses ballooning to free up memory (see the sketch after this list).
6. I/O page remapping is introduced to move pages around, since many devices can perform DMA only to the first 4 GB of memory and would otherwise need bounce-buffer copies.
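A rough sketch of how those free-memory thresholds could drive the choice of mechanism (the percentages are the defaults reported in the paper; the code structure is my own illustration and ignores the hysteresis between states):

    # Default thresholds: high 6%, soft 4%, hard 2%, low 1% of system memory.
    # Returns the action taken once free memory has dropped below a threshold;
    # in each state ESX keeps reclaiming until free memory climbs back above
    # the next higher threshold (not modeled here).
    def reclamation_action(free_fraction):
        if free_fraction < 0.01:
            return "low: keep paging and block VMs above their target allocations"
        if free_fraction < 0.02:
            return "hard: forcibly reclaim via paging"
        if free_fraction < 0.04:
            return "soft: reclaim via ballooning (fall back to paging if needed)"
        return "high: sufficient free memory, no reclamation"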
4. Evaluation
The paper evaluated each mechanism separately, including ballooning, which only had 1.4%-4.4% overhead while running dbench. For transparent page sharing, the authors showed that 7%-32% of memory was sharable while running several VMs, but they do not really test the CPU overhead of share detection. They also did not test share detection across VMs running different guest OSes. They tested the idle memory tax by running a VM with idle memory alongside a VM with a memory-intensive workload (dbench); eventually the idle memory from the first VM was reclaimed and given to the other VM, which increased its throughput by 30%. This was a great way to show how effective the idle memory tax can be. They did not evaluate the shadow page tables or I/O page remapping to show the performance improvement and the overhead, respectively.

5. Confusion
What technique is used today to share read-only pages between VMs?

1. Summary
VMware's ESX Server is a Type-1 hypervisor that multiplexes the underlying hardware resources. The authors employ clever, well-quantified memory management techniques: resizing VM memory through ballooning, making fair, share-weighted allocation decisions via an idle memory tax, analyzing page contents to find identical pages and share them intelligently, and remapping hot pages into low memory. Many other virtualization techniques, such as the pmap structure, are taken directly from Disco.

2. Problem
Trends such as scalable multiprocessors, server consolidation, and the need for efficient utilization of expensive resources led to a re-emergence of virtualization. Disco set this trend with smart page-mapping techniques (a second-level TLB, page sharing, migration, and replication) implemented in an intermediate layer, the VMM. It still could not efficiently manage memory among the VMs, leading to overheads, and, contrary to its claim, it had to modify the guest OS.

3. Contributions
Efficient resource (memory) allocation techniques should be fast, fair, flexible, and priority-aware. Shadow page tables that keep virtual-to-machine mappings make accesses faster. Ballooning provides a flexible, guest-directed way to move pages in and out of a VM, since not all VMs fully utilize their memory while others fall short. A novel way of doing transparent page sharing was to inspect the contents of randomly chosen pages and hash them into a shared pool, which saves the VMs from having to be modified or to communicate in order to figure out the common pages. The authors then adopt a share-based allocation and reclamation policy, with a solid sampling-based metric to determine idle memory, thus achieving full utilization while also being fair. A bunch of other policies, like admission control, dynamic reallocation, and I/O page remapping, achieve better management, flexibility, and reduced overhead, respectively.
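To visualize the extra level of indirection behind those shadow page tables, here is a purely conceptual sketch (real shadow page tables are maintained by trapping guest page-table updates; the function is my own illustration):

    # Conceptual sketch: the VMM keeps a pmap (guest "physical" PPN -> machine MPN)
    # per VM, and the shadow page table maps guest virtual pages directly to MPNs
    # so that a hardware TLB fill needs only one lookup.
    def shadow_entry(guest_page_table, pmap, vpn):
        ppn = guest_page_table[vpn]   # guest OS view: virtual page -> "physical" page
        mpn = pmap[ppn]               # VMM view: "physical" page -> machine page
        return mpn                    # value cached in the shadow page table / TLB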

4. Evaluation
ESX Server is a Type-1 hypervisor; the authors test it both on synthetic workloads that exercise each of the memory management techniques and on production VMs. Page sharing was mostly due to redundant data and code, so hashing the contents of randomly selected pages proves apt, and the savings were about 20% (an achievement?). There is a noteworthy improvement in throughput from reclaiming memory using the dynamic tax-rate calculation. The dynamic reallocation experiments show memory being overcommitted and then reclaimed through ballooning, with sharing eventually increasing the savings.

5. Comments/Confusion
Many of the figures and their explanations for evaluating the techniques were not clear. Many of the heuristics/policies they employed could have better versions, like the choice of hash function, the sampling scheme, and the allocation policy, so this layer would be subject to constant changes for such optimizations. What is the reasoning behind taking the maximum of the three radically different moving averages?

1. Summary
The paper describes and evaluates novel techniques of memory resource management in VMware ESX Server.

2. Problem
Earlier Virtual Machine Monitors (VMMs) had problems in their page replacement and memory sharing policies. These VMMs resorted to introducing another level of paging and developed page replacement algorithms over this hierarchy of pages. These algorithms were required to pinpoint particular pages from a particular VM to evict, and since the VMMs do not have the fine-grained information about pages that the guest OS has, they had to make largely uninformed resource management decisions. Even the cleverest of algorithms could not avoid double paging, since the OS is unaware of page evictions done by the VMM.
The transparent page sharing algorithms used in earlier implementations like Disco required OS changes.

3. Contributions
The authors' motivation to run commodity OSes efficiently on a VMM led to many innovative mechanisms being introduced in ESX Server. Ballooning was a really elegant solution for page reclamation; it eliminated the side effects caused by the VMM interfering with the operating system's memory management policies. Another breakthrough was the introduction of content-based page sharing, which eliminated the need to modify the OS for page sharing and could potentially identify more candidates for sharing by comparing page contents. The mechanism of adding an idle memory tax on top of min-funding revocation was also a great idea, as it benefits active clients with fewer shares over inactive clients that hold many shares but leave their memory idle. I/O page remapping was another contribution, which proved effective in reducing I/O copying overhead on 32-bit processors with PAE support.
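A toy sketch of the content-based sharing idea (my own simplification; ESX uses a fast 64-bit hash of the page contents and keeps unshared "hint" frames, which I gloss over here):

    import hashlib

    # Toy sketch of content-based page sharing: hash a candidate page's contents,
    # look it up in a table of known pages, and on a verified byte-for-byte match
    # collapse the candidate into a single copy-on-write shared page.
    shared = {}   # content hash -> (page contents, reference count)

    def try_share(page_bytes):
        key = hashlib.sha1(page_bytes).digest()
        entry = shared.get(key)
        if entry is not None and entry[0] == page_bytes:    # verify, don't trust the hash alone
            shared[key] = (entry[0], entry[1] + 1)          # map this copy read-only / copy-on-write
            return True                                     # redundant machine page can be reclaimed
        if entry is None:
            shared[key] = (page_bytes, 1)                   # remember as a future sharing candidate
        return False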

4. Evaluation
The paper tries to justify each of the new policies via benchmarks run on an experimental setup or by analyzing real-world data whenever possible, and the analysis shows significant improvements due to the new policies developed by the authors.

5. Confusion
- Can we talk about the randomized page replacement policy implemented in ESX? I don't see any details about that in the paper.

1. Summary
This paper talks about VMware's ESX Server, which is used to multiplex hardware resources among VMs without modifying the operating systems running on it, and about some of the associated memory management problems. The basic features mentioned in the paper for effective memory utilization are ballooning to reclaim pages, I/O remapping of hot pages, page sharing (using hashes to identify pages with identical content), and using "shares" and an "idle tax" to allocate memory. The authors justify all of their features/policies using results from various experiments.

2. Problem
This paper, like Disco, proposes a software layer to virtualize hardware for VMs. The problem with Disco was that it required modifications to the guest OS (the IRIX kernel sources). Other memory management problems mentioned in the paper are memory reclamation (the need to revoke memory from other VMs), efficiently finding common pages (for sharing), providing efficient memory utilization while maintaining performance isolation guarantees, and using efficient allocation policies.

3. Contributions
The feature I found most interesting in this paper was ballooning to reclaim pages. It can reclaim pages while leaving the guest OS in control of which pages to evict (the guest can make better decisions using its own policy). The idea of sharing pages based on content hashes also seems interesting, and the chance of collisions with today's hash algorithms is even lower than at the time the paper was released. They use an idle memory tax to reclaim pages from VMs that are not using them. I agree with their statistical sampling method, but I don't understand why they need three moving averages for active memory sampling; isn't there a more intelligent way to do this? They propose a dynamic reallocation policy with four states (high, soft, hard, low), which is similar in principle to a low/high-water-mark policy. They control memory allocation using parameters like min, max, and shares, which provides some performance isolation guarantees. Finally, they identify hot I/O pages that should be remapped to low memory addresses.

4. Evaluation
This paper tries to justify all of its policy choices with corresponding experiments. They justify the low overhead of ballooning using the dbench benchmark. Then they show a high percentage of page sharing and memory reclamation using their design on the SPEC95 benchmarks. This graph comes out as expected, i.e., the amount of sharing increases as more VMs have pages in common. For active memory sampling, I felt it would have been better if they had compared the memory utilization of their sampling method against some other sampling method. I did not understand the graphs used to evaluate dynamic reallocation (Fig. 8): why did they need both the SQL and the Citrix server? The correlation between the different lines in these graphs seems to vary.

5. Confusion
My main confusions are the use of multiple moving averages and selecting the max value among them in active memory sampling (there should be a better method), and the graphs related to dynamic reallocation on the SQL and Citrix servers (shouldn't the correlation among the lines be the same in both graphs?). Do I/O pages still need to be moved to lower address space for better performance?

1. Summary
The paper describes VMware ESX, a software layer responsible for multiplexing hardware resources amongst multiple virtual machines. Several memory reclamation techniques are listed in the paper, ranging from "ballooning" to page-replacement and sharing policies. Separate implementations of these techniques are also discussed and evaluated.

2. Problem
In response to the underutilization of individual servers, virtual machines are used instead to run different operating systems on a single hardware platform. However, virtual machines need a certain level of resource guarantees to run efficiently, and there must be some mechanism in place for high-level resource management. If memory is overcommitted to too many virtual machines, there may be excessive memory pressure on the system as a whole.

3. Contributions
The authors borrow Disco's extra level of address translation (virtual address → "physical" address → machine address). This allows the server to monitor and/or interpose on guest memory. In addition, the authors introduce several techniques:
(1) Ballooning: a module is loaded into the guest OS and communicates directly with the server. The server can then "inflate" or "deflate" this balloon by having the module allocate or deallocate pinned physical pages inside the VM. This creates artificial memory pressure inside the guest OS, which can then deal with memory conservation as it sees fit.
(2) Memory sharing: building off of Disco's idea of transparent page sharing, ESX adopts "content-based page sharing" instead. Candidate pages are hashed and stored in a hash table, and other pages that may potentially be shared are evaluated against this table. If a match is confirmed, the redundant copy is reclaimed and the shared page is marked copy-on-write.
(3) Shares & taxes: Clients (VMs) are given a certain number of shares that represent their "resource rights" within the system. When memory needs to be reclaimed, the system starts with the VM that has the lowest shares-per-allocated-page ratio. To counterbalance hoarding clients, there is also the concept of an "idle memory tax". Pages are statistically sampled and tracked per VM to estimate the fraction of pages touched per unit of time. This way, the system can guesstimate how much memory a VM is actively using, and VMs with too many idle pages can have a large fraction of them reclaimed by the system.
(4) Admission control: each VM has three parameters: min, max, and shares. Min represents the lowest guaranteed memory a VM can have, while max is the most it can be allocated if the system is not overcommitted. The amount of "free" memory at any given point determines the current state: high, soft, hard, or low. Starting with the soft state, the system begins to use the reclamation techniques above to try to transition back into the "high" state.
(5) I/O page remapping: there is a division between "high" and "low" memory for I/O devices. When a page has been copied more than a certain threshold number of times, it is remapped into low memory, where it is directly accessible to devices. However, if low memory itself becomes scarce, pages may have to be remapped back into "high" memory.

4. Evaluation
There are several different evaluations done for page sharing, shares & tax, and dynamic reallocation.
(1) Page sharing: over a homogeneous workload, a max of 67% sharing can be reached. Over a nonhomogeneous workload, anywhere from 7-20% can be freed. One interesting thing, though, is that their tests all run on identical VMs (all Windows, all Linux, etc.). I'd be interested to see how their system copes with mixed ones.
(2) The tax rate effectively reallocates pages from the Windows VM to the Linux one, resulting in a performance gain in dbench (which improves with additional memory).
(3) Except for brief transitions on startup, ESX spends nearly all its time in either high or soft states. Memory is effectively reallocated under this scheme.

5. Confusion
How are shares allocated? Can they be reallocated as part of the tax scheme? Also, there are no allowances made for VM-to-VM communication via shared read-write pages. Is that part of ESX at all?

1. summary
ESX Server introduced several concepts for using memory efficiently: reclamation methods, sharing via content scanning, a share-based memory allocation method, and hot page remapping.
Pages are reclaimed by ballooning and paging. Ballooning dynamically adjusts each VM's memory allocation by inflating and deflating a balloon driver, which respectively decreases and increases the memory available to the guest. If ballooning is not practical, paging takes over to reclaim pages from the virtual machines forcibly.
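Here is what the balloon driver's control loop might look like, purely as an illustration (the real driver uses the guest's native allocation/pinning interfaces and a private channel to the server; the object and method names here are mine):

    # Illustrative balloon driver loop: the server publishes a target balloon size;
    # the driver pins guest pages to grow toward it (forcing the guest to free other
    # memory) and releases them to shrink, telling the server which pages it holds
    # so the backing machine memory can be reclaimed or restored.
    class BalloonDriver:
        def __init__(self, guest, server):
            self.guest, self.server, self.pinned = guest, server, []

        def adjust(self):
            target = self.server.poll_target()        # pages the server wants ballooned
            while len(self.pinned) < target:          # inflate
                ppn = self.guest.alloc_pinned_page()
                self.server.notify_ballooned(ppn)     # backing machine page can be reclaimed
                self.pinned.append(ppn)
            while len(self.pinned) > target:          # deflate
                ppn = self.pinned.pop()
                self.server.notify_released(ppn)
                self.guest.free_pinned_page(ppn)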
Pages can be shared across all virtual machines via a scanning method. Candidate pages (for example, read-only data or code) are compared across the virtual machines' pages, and when they match, the redundant copies are reclaimed. The scan uses a hash of the page contents as a key, which reduces the cost of the comparisons.
Memory is also reclaimed under an idle memory tax, which relies on access counts over sampled pages. These access counts are the basis for judging how much of each VM's memory is actively used. Three values are tracked (a slow moving average, a fast one, and an estimate from the current sampling period), and the maximum of these is used.
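As I understand the sampling, each period a small random set of a VM's pages has its mappings invalidated, and the fraction that gets re-touched estimates the active fraction; keeping several averages and taking the max lets the estimate react quickly to increased activity but decay slowly. A sketch under those assumptions (the smoothing gains are made up):

    # Sketch of sampling-based active-memory estimation (structure assumed).
    class ActiveMemoryEstimator:
        def __init__(self, slow_gain=0.2, fast_gain=0.6):
            self.slow = self.fast = 0.0
            self.slow_gain, self.fast_gain = slow_gain, fast_gain

        def end_period(self, touched, sampled):
            f = touched / sampled                           # fraction of sampled pages re-touched
            self.slow += self.slow_gain * (f - self.slow)   # slow EWMA: long-term behavior
            self.fast += self.fast_gain * (f - self.fast)   # fast EWMA: recent behavior
            # Taking the max responds quickly to increases in activity while
            # adapting only slowly when activity drops off.
            return max(self.slow, self.fast, f)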
There are four reclamation states: high, soft, hard, and low. The soft state uses ballooning, the hard state uses paging, and the low state uses paging while also blocking the execution of VMs that are above their target allocations. Hot page remapping moves frequently accessed pages from high memory addresses to low memory addresses to avoid repeated copy operations.

2. Problem
Limited resources, especially memory, are the bottleneck when running many virtual machines on a system, because the VMs must share the resources fairly and efficiently. ESX therefore moves pages between VMs using a meta-level replacement policy, but determining which pages or data are valuable is difficult because that information resides in each guest OS. Forcibly evicting a page from a VM's page tables can cause a side effect, double paging, when the guest OS pages out the same page right after it has been chosen for reclamation.

3. Contributions
ESX Server focuses on memory management policy across VMs. In particular, the idle memory tax and the access counts on sampled pages are a great example of checking whether a VM is currently doing useful work. After estimating each VM's workload, my guess is that ESX uses ballooning or paging to reclaim pages from the less active VMs.
Changing memory allocations dynamically is the key point in this paper, because memory resource reallocation is impossible without that support. In terms of memory reallocation, if a system supported hot-pluggable memory and found a usage model for it, that would be a big contribution.
Ballooning is one of the key points in this paper because it is a soft method to move memory between VMs; pages are moved through inflate and deflate operations.

4. Evaluation
Content-based sharing showed that the amount of shared memory grows as more VMs are added, with the sharing level reaching 67% as the number of VMs increases. The CPU overhead with page sharing enabled is almost identical to that with it disabled, and the memory reclaimed ranges from 4.2% to 32.9%.
Based on the memory sampling results, memory is reallocated between VMs, boosting performance by over 30% in the paper's experiment.

5. Confusion
Does the pmap in Disco provide a reverse mapping from physical to virtual addresses, while the pmap in ESX provides the mapping from "physical" to machine addresses?
What is the bcopy in Disco?
Is it possible to make hard pluggable memory?
