
Memory Resource Management in VMware ESX Server

Carl Waldspurger. Memory Resource Management in VMware ESX Server in Proceedings of the 5th Symposium on Operating Systems Design and Implementation, 2002.

Reviews due Tuesday, 2/24.

Comments

1. summary
This paper, "Memory Resource Management in VMware ESX Server", introduces the mechanisms and policies used to manage memory in ESX Server. ESX Server is a thin software layer that virtualizes physical hardware so that unmodified commodity OSes can run in VMs. It introduces several novel techniques: ballooning, content-based page sharing, an idle memory tax for reclaiming idle memory, I/O page remapping, and dynamic reallocation. A good evaluation is also presented, showing significant performance improvements.
2. Problem
Previous VMMs like Disco also virtualized hardware to run commodity OSes in VMs, but they required modifications to the guest OS. Memory management between VMs was also ineffective: they used simple proportional sharing, which suffered from problems such as double paging and memory wasted by idle VMs. ESX Server tries to address these issues.
3. Contributions
1. Ballooning - A technique by which ESX Server reclaims memory from the guest OS. A balloon driver calls the guest OS's memory management routines to allocate pinned pages in guest physical memory, and ESX Server then reclaims those pinned pages. The idea is that the guest operating system knows best which pages to evict.
2. Content-Based Transparent Page Sharing - Unlike Disco, which required specific interfaces for creating shared pages, ESX Server finds identical pages across VMs by scanning them and using a hash table to compare them quickly. A low-overhead background task scans pages and computes hash values, which are looked up in the hash table to find matching pages. If a match is found, the pages are marked COW and the duplicate memory is reclaimed.
3. Idle memory tax - This solves an open problem in share-based resource management. The basic idea is to charge a VM more for an idle page than for one it actively uses. The ballooning technique above is used to reclaim the idle pages.
4. I/O Page remapping is also leveraged to reduce I/O copying overheads in large-memory systems.
4. Evaluation
This paper provides a precise evaluation of the various techniques it presents. At various points the performance gain is compared with Disco. Some important results: the idle memory tax improves overall throughput; content-based page sharing frees an amount of memory that grows linearly with the number of VMs; ballooning adds a very small overhead (1.4% to 4.4%), and ballooned VMs perform almost the same as unballooned VMs with the same memory.
5. Confusion
What are zero pages? Where and why are they used, and what is their significance?

Summary
The paper introduces novel techniques to manage virtual memory efficiently among guest OSes on VMware ESX Server. The major difference between ESX Server and hosted VMMs is that ESX manages the system hardware directly rather than going through a host OS. The paper also presents a terse evaluation of all the new techniques, demonstrating their correctness and efficiency.

Problem
The main objective of this paper is to multiplex the hardware and utilize the memory available and thus allow for sharing server workloads evenly across all VMs. The author neatly identifies 3 key problems in memory management for ESX server and introduces a new algorithm to avoid each of those.
a) Memory Reclamation – ESX should be able to handle overcommitted memory on a VM. It should be able to revoke memory from other VMs if available.
b) Sharing Memory – Running the same OS across VMs may lead to redundant code and data in memory. Sharing reduces this redundancy, making the memory available for other purposes.
c) Memory allocation – Allocate memory across VMs using a fair share policy.

Contribution
The author describes unique ways to tackle the above mentioned problems for managing virtualized memory.
a) Ballooning – Traditional meta-level page replacement algorithms for reclaiming memory need some kinds of information from guest OS. The new technique called Ballooning achieves this. A balloon (like a driver) works on pinned physical pages in VM. Inflating balloon claims memory and deflating the balloon releases allocated pages. The server ‘coaxes’ the guest OS into releasing memory.
b) Content based page sharing – ESX compares page contents in order to share memory. Again, this requires no modification to the commodity OS. Hashing is used to identify candidate pages: a hash value summarizes the content of a page, and if a page with the same hash is found, a full comparison is done to verify that the contents are the same. Such pages are marked copy-on-write (COW) in the hash table, which prevents multiple copies of the same page from existing unless required.
c) Idle memory tax – When memory is scarce, VMs or processes with abundant memory will be penalized compared to ones who are under memory pressure. A measure of active vs idle pages is obtained using statistical sampling with minimum overhead.
High level allocation policy – ESX uses threshold to dynamically allocate memory to VMs (high, low, soft, hard). ESX blocks a VM when levels are low. ESX also allows I/O page remapping and can map hot pages in case required.

Evaluation
The evaluation of all the novel techniques is done clearly, with at least one example for each. For ballooning, a Linux VM running dbench with 40 clients showed almost the same performance with ballooning as a server with no ballooning. For content-based sharing, 3 instances of real-world guest OSes were run, and the page sharing mechanism could reclaim more than one-third of VM memory. To check the efficiency of the idle memory tax, the percentage share was observed on 2 VMs while varying the tax rate.

Confusion
I found this paper was very well written with each concept explained in good detail with examples. I would want to understand the technique of ballooning in further detail. To me it was not as clear as content-based-sharing or taxing.

1.Summary
This paper discusses the memory management mechanisms and policies implemented in ESX Server, a thin type-1 hypervisor, to efficiently support server consolidation and virtualization. Techniques like ballooning and paging reclaim memory to support memory overcommitment with unmodified guest OSes. Content-based page sharing, hot I/O page remapping, and higher-level allocation policies are also discussed. ESX Server's performance and memory utilization are tested for a variety of workloads and guest OSes, and the detailed results back the claims.

2.Problem
Server consolidation and virtualization are achieved by VMMs running multiple guest OSes with memory overcommitment. But this can conflict with a VMM's goal of providing performance and resource isolation along with efficient memory utilization among VMs. Many VMMs modify the guest OSes or rely on traditional paging mechanisms to support memory virtualization, which carries a lot of overhead. In this paper, the author addresses the problem of managing system memory in a way that still provides the essential isolation while satisfying the dynamic memory needs of each VM.

3. Contributions
The biggest contribution of the paper is the ballooning technique, in which a pseudo-driver reclaims memory from a VM by creating memory pressure, implicitly making the guest OS use its native memory management techniques. When ballooning is not possible, the VMM falls back to demand paging. Content-based page sharing efficiently uses the raw contents of pages to identify identical pages and share them. Because the hashing is implemented in the VMM, the guest OS need not be modified to support this. The idle memory tax is another noteworthy contribution: it reclaims idle memory from VMs even under share-based allocation, while still supporting performance isolation. A working-set estimate tracks the memory needs of each VM, so that system-wide memory is managed efficiently. Higher-level policies take into account the administrator's allocation parameters and the system load, and reclaim memory accordingly to keep each VM's memory need to a higher threshold. Hot I/O page remapping allows network memory to be scaled along with system memory. All the techniques have been tested in different scenarios, but further research on scalable NUMA servers is needed to back these contributions.

4.Evaluation
The concepts in the paper are backed by detailed evaluation and empirical data. The throughput of a ballooned VM is compared with that of a VM configured with the same amount of actual memory, and the overhead is not more than 4.4%. Content-based page sharing is shown across 10 VMs: up to 67% of VM memory is shared, of which 60% is reclaimed. However, the overhead of accessing and updating the global hash table for each page scan should have been evaluated. The memory sampling technique, which estimates the active working set statistically, is presented along with an example of idle memory tax usage. Dynamic allocation policies for varying system load and allocation parameters are evaluated across 4 different VMs, with detailed experimental results.

5.Confusion
What is the concept of chaining in the hash table (a link chain in a shared frame hash entry)? In Figure 4, I did not completely understand what Shared-Reclaimed memory is. Some discussion of Figure 8 would be helpful (why is ballooning low when SQL Server starts querying again, taking more memory than the min size?).

1. Summary

The VMware ESX Server hypervisor implements a combination of techniques to allow efficient use of memory in overcommitted situations: balloon drivers to implement page reclamation, transparent page sharing via hashing, and an idle memory tax to make intelligent reclamation decisions. All of these are implemented so that unmodified guest operating systems can run without any performance issues.

2. Problem

When a hypervisor runs unmodified operating systems, it lacks any specific knowledge of the page utilization of a guest operating system. As such it is difficult to implement intelligent paging algorithms.

3. Contributions

The most prominent thing introduced by this paper is the balloon driver. Balloon drivers allow the hypervisor to make intelligent decisions about which pages to revoke without requiring modification of the operating system itself. Instead, they use a driver whose sole purpose is to reclaim memory from the guest operating system. When requested by the hypervisor, the driver attempts to allocate some guest physical memory from the guest operating system. These pages are then reclaimed by the hypervisor.
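
As a rough illustration of the inflate/deflate cycle described above, here is a toy simulation. It is only a sketch: the `Guest` and `BalloonDriver` classes, their method names, and the arbitrary eviction choice are all made up for illustration and are not ESX Server's or any guest OS's actual interface.

```python
class Guest:
    """Toy guest OS managing a pool of guest-physical page numbers (PPNs)."""

    def __init__(self, num_pages):
        self.free = set(range(num_pages))
        self.in_use = set()  # pages held by guest applications and caches

    def touch(self, n):
        """Guest workloads take n pages from the free pool."""
        for _ in range(n):
            self.in_use.add(self.free.pop())

    def alloc_pinned(self, n):
        """Hand out up to n pinned pages, evicting guest pages if the free pool is short."""
        while len(self.free) < n and self.in_use:
            # Stand-in for the guest's own replacement policy (here: arbitrary choice).
            self.free.add(self.in_use.pop())
        return [self.free.pop() for _ in range(min(n, len(self.free)))]

    def release(self, pages):
        self.free.update(pages)


class BalloonDriver:
    """Hypothetical in-guest driver that pins pages on the hypervisor's request."""

    def __init__(self, guest):
        self.guest = guest
        self.pinned = []

    def set_target(self, target):
        """Inflate or deflate until the balloon holds `target` pages; return the pages moved."""
        delta = target - len(self.pinned)
        if delta > 0:  # inflate: the hypervisor may reclaim the memory behind these PPNs
            got = self.guest.alloc_pinned(delta)
            self.pinned.extend(got)
            return got
        released = [self.pinned.pop() for _ in range(-delta)]  # deflate
        self.guest.release(released)
        return released
```

The point of the design is visible in `alloc_pinned`: the hypervisor never chooses a victim page itself; it only raises pressure and lets the guest decide what to give up.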

Second is their implementation of page sharing. Again, since they are running unmodified operating systems, they cannot directly obtain data on which pages may be shared. Instead, they compare pages indirectly via hashes to find candidates for page sharing and to index shared pages. The hash itself is only used for the initial comparison, and is stored along with other data in a hash table for efficient lookup. Before pages are actually shared, their full contents are compared.
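
The hash-then-compare flow can be sketched as follows. This is a simplified model under stated assumptions: `PageSharer` and its fields are hypothetical names, a truncated SHA-1 stands in for whatever 64-bit hash the real system uses, each guest page is assumed to be scanned once, and scanning happens on demand here rather than in a background task.

```python
import hashlib


class PageSharer:
    """Toy content-based sharing: hash to find a candidate, full compare to confirm."""

    def __init__(self):
        self.frames = {}   # MPN -> page contents
        self.refs = {}     # MPN -> number of guest pages mapped to it
        self.by_hash = {}  # content hash -> MPN of a sharing candidate
        self.pmap = {}     # (vm, ppn) -> MPN
        self.next_mpn = 0

    @staticmethod
    def _hash(data):
        # Assumption: 8 bytes of SHA-1 as a stand-in for a 64-bit page hash.
        return hashlib.sha1(data).digest()[:8]

    def _new_frame(self, data):
        mpn, self.next_mpn = self.next_mpn, self.next_mpn + 1
        self.frames[mpn], self.refs[mpn] = data, 0
        return mpn

    def _map(self, key, mpn):
        self.pmap[key] = mpn
        self.refs[mpn] += 1

    def scan(self, vm, ppn, data):
        """Share the page if an identical frame exists, else give it its own frame."""
        h = self._hash(data)
        mpn = self.by_hash.get(h)
        if mpn is None or self.frames[mpn] != data:  # no candidate, or a false match
            mpn = self._new_frame(data)
            self.by_hash.setdefault(h, mpn)          # a colliding page is simply not indexed
        self._map((vm, ppn), mpn)

    def write(self, vm, ppn, data):
        """Copy-on-write: a write to a shared frame goes to a fresh private copy."""
        mpn = self.pmap[(vm, ppn)]
        if self.refs[mpn] > 1:
            self.refs[mpn] -= 1
            self._map((vm, ppn), self._new_frame(data))
        else:
            old = self._hash(self.frames[mpn])
            if self.by_hash.get(old) == mpn:
                del self.by_hash[old]                # stale index entry
            self.frames[mpn] = data
```

Because the full comparison always runs before two pages are mapped to one frame, a hash collision in this sketch can only cost a missed sharing opportunity, never incorrect contents.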

Finally, to answer the question of whose pages to remove in overcommit situations, they implement an idle memory tax on top of a proportional share system. To detect how many pages are idle, they select a random set of guest physical pages from each VM and watch their activity. Based on these pages, they estimate how many pages are active, via an algorithm that responds quickly to an increase in usage and lags actual usage on a decrease. Then, given a tunable percentage, they allow the hypervisor to reclaim up to that percentage of a VM's idle memory when needed.
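
One way to get the "fast up, slow down" behaviour described above is to keep two moving averages and report the larger. The sketch below assumes that approach; the function and class names and the smoothing weights (0.8 and 0.1) are illustrative choices, not values taken from the paper.

```python
import random


def sample_active_fraction(touched, num_pages, n, rng):
    """Estimate the active fraction from n randomly chosen guest-physical pages."""
    sample = rng.sample(range(num_pages), n)
    return sum(1 for page in sample if page in touched) / n


class ActiveEstimator:
    """Tracks a fast and a slow moving average and reports the larger one."""

    def __init__(self, fast_weight=0.8, slow_weight=0.1):  # hypothetical weights
        self.fast_weight = fast_weight
        self.slow_weight = slow_weight
        self.fast_avg = 0.0
        self.slow_avg = 0.0

    def update(self, f):
        """Fold in one sampling period's active fraction f and return the estimate."""
        self.fast_avg += self.fast_weight * (f - self.fast_avg)
        self.slow_avg += self.slow_weight * (f - self.slow_avg)
        # The max follows the fast average on a rise and the slow one on a fall.
        return max(self.fast_avg, self.slow_avg)
```

Overestimating after a drop is the conservative side to err on: a VM that briefly goes quiet keeps its memory for a while instead of being taxed immediately.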

4. Evaluation

For the balloon driver, they show that it has low overhead, by comparing a VM which has been ballooned to a particular physical memory size with a VM which was statically configured to such a memory size. And this is certainly important, but it may have been also useful to see how it affects performance when repeatedly scaling the

For page sharing, they showed that when running identical Linux instances, as expected, after the first VM roughly 60% of the pages in any given VM were shared. Even in the case of one VM, nearly 20% of the pages were shared, half of which came from shared zero pages. Furthermore, they show some data obtained from a real-world system, showing that in at least some cases there is significant sharing: although the values are greatly reduced from the ideal system, the amounts shared are still large. It may have been useful to also consider the overhead of sharing in both memory usage and CPU usage, in particular in diverse systems where the sharing percentage is low.

For the idle memory tax, they first validate that their page usage detection works by running a program that repeatedly touches a number of pages. Then, by varying the number of pages touched over time, they validate that they can reliably detect how many pages are used. Next they validate the effectiveness of the idle memory tax by first setting the tax to 0 on a system with one VM running a memory-intensive task and one VM remaining idle. With an idle memory tax of 0, this is entirely equivalent to a proportional share system, and the memory percentages for each VM reflect this. Then, halfway through, they change the idle memory tax to 75%, and nearly instantaneously the VM running the memory-intensive task is given a much larger share of the memory.

5. Confusion

Why have the balloon driver poll the number of pages to allocate, is it not possible to have the hypervisor somehow notify the balloon driver of a change?

Summary:

This paper describes how the policies in a VM memory management system could be designed to improve the efficiency of memory use without having any influence over guest Operating systems. This paper describes the memory management system implemented in VMware's ESX Server, which runs as a hypervisor and directly handles resource management.

Problem:

The big issue that this paper tries to solve is resource management for a VMM which has no direct influence over the guest operating systems. While this paper focuses on the memory management aspects of resource control, this issue is still relevant and important for software VMs.

Contributions:

The main contributions of this paper are the mechanisms that they implement to achieve better memory utilization. A pseudo device driver was used to implement a memory balloon that could vary the memory pressure on the guest OS. This allows the VMM to use the guest OS's page replacement policies to determine which pages could be reallocated.

A mechanism which allows memory to be shared between VMs that uses a two-level memory comparison avoids having multiple copies of the same data in the system. The benefits of this were demonstrated even for VMs running disparate applications.

A ticket based memory allocation policy that tries to reduce the amount of idle memory by taxing systems with poor utilization is another contribution of this paper. They implement a sampling based method to keep track of the actual memory utilization as well.

Evaluation:

This paper presents a number of well thought out experiments that demonstrate the effectiveness of the high level policies of ESX Server as well as the low level mechanisms that implement these policies. These results show that the ESX server memory management system is capable of handling at least a 60% over commitment with only a very small amount of memory needing to be paged out.

Confusion:

This comes up quite a bit in the paper but what is the point of a zero page and why does Windows set all physical memory to zero as part of its boot process?

Summary:
This article talks about VMware ESX Server which is a thin layer designed to multiplex underlying hardware resources efficiently among virtual machines running stock commodity operating systems. It talks specifically about memory management and how memory is allocated and reclaimed when administrators have overcommitted memory for the VMs.

Problem:
The problems addressed in this paper are efficient reclamation of memory, content-based page sharing, reclamation of idle memory, allocation policies among VMs, and I/O page remapping.

Contributions:
The major topics discussed in this paper are as follows:

Ballooning - A balloon driver is installed in the guest kernels. When instructed, this driver occupies pinned physical pages, creating memory demand in the guest. The guest reclaims memory when the balloon is inflated, and the monitor can then reclaim the physical pages held by the balloon. To release memory, the balloon can be deflated.

Content-Based Page Sharing - Physical pages in machine memory are shared based on their raw contents. This is done using hashing. Pages are usually picked up randomly and their contents are hashed and matched with the already hashed table. If a match is found the pages are marked copy-on-write.

Idle memory tax - The server, while doing proportional-share memory allocation, also imposes an idle memory tax: min-funding revocation is extended to use an adjusted shares-per-page ratio. The idea is to charge a VM more for an idle page than for one it is actively using. This is done using statistical sampling of a VM's pages to find the fraction f of memory actively accessed by the VM.

Allocation Policies - Parameters such as the min, max and memory shares can be configured by the administrator. Admission control is used to either allow or not allow a VM if min + overhead (due to the frame buffer and book keeping memory) amount of memory is available. Dynamic reallocation of memory is constantly happening depending on various states of occupancy of memory. Especially if memory occupancy is very high, paging is resorted to to reclaim memory and overcommitted VMs are stopped momentarily till more memory is available.
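
The admission check and the escalation from ballooning to paging to blocking can be sketched as below. This is a deliberately simplified, stateless model: the function names are hypothetical, the threshold values are assumptions chosen for illustration, and hysteresis between states is omitted.

```python
# Free-memory thresholds as fractions of total memory (illustrative assumptions).
SOFT, HARD, LOW = 0.04, 0.02, 0.01


def admit(vm_min_mb, overhead_mb, unreserved_mb):
    """Admission control: power on a VM only if min + overhead can be reserved."""
    return vm_min_mb + overhead_mb <= unreserved_mb


def reclamation_actions(free_fraction):
    """Return the reclamation steps taken at a given level of free memory."""
    if free_fraction < LOW:
        return ["page", "block VMs above target"]
    if free_fraction < HARD:
        return ["page"]
    if free_fraction < SOFT:
        return ["balloon"]
    return []  # plenty of free memory: no reclamation


for free in (0.10, 0.03, 0.015, 0.005):
    print(f"{free:.3f} free -> {reclamation_actions(free) or 'nothing'}")
```

The ordering reflects cost: ballooning lets the guest choose what to give up, paging is a forcible fallback, and blocking a VM is the last resort.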

I/O Page Remapping - This is done so that device drivers can access memory for DMA. Pages are remapped to lower addresses in this case.

Evaluation:
Ballooning performance measurements show that the overhead due to ballooning is only from 4.4% down to 1.4%. Page content sharing results in almost 67% sharing level when the same operating system is run on many instances. The authors also provide extensive graphs and evaluation data for the cases of idle memory reclamation and dynamic memory reallocation.

Confusions:
Today, we have servers with terabytes of memory. How effective is the hashing technique for page content based sharing today?

1. Summary
This paper presents the memory resource management techniques used in VMware ESX virtual machine monitor. It aims to enable sharing of resources like memory across virtual machines with no changes to the guest OS thereby increasing performance while maintaining isolation.

2. Problem
Virtual machines aim at efficient use of resources while maintaining the needed isolation between guests. This is especially hard for memory, because the total memory requirement of all the instances can exceed the amount of memory available. The objective of ESX was to solve these problems without requiring any modifications to the guest operating systems. It allows overcommitting of memory by growing and shrinking the amount of memory dedicated to each VM.

3. Contributions
It allows the sharing of memory across virtual machines without requiring any modifications to the guest operating systems. OSes don't have a facility for changing the amount of physical memory at runtime, so this has to be done without requiring any changes in the OS. This is achieved by adding "balloon" device drivers, which allow the total memory to be overallocated between many machines. Thus it introduces a standard mechanism without requiring specific changes to each guest operating system. The VMM tells the device driver to inflate the balloon to get the OS to free (page out) memory.
The same data is mapped to the same physical memory location across virtual machines, achieving sharing while also effectively reducing the total amount of memory needed while preserving the speed of access to the common memory for all parties.
A method was developed for measurement of idle memory on a virtual machine and then employing certain techniques to reclaim it from the idle instance and reallocate that memory to another instance where it may be better used.

4. Evaluation
In general, the goals of performance isolation and efficient memory utilization often conflict with each other, but the overheads of the ESX system are negligible. The experiments conducted by the authors also suggest that ~2/3 of memory is reclaimed from idle pages by mapping to shared. The results of dynamic reallocation of memory among the VMs show that ESX memory management is capable of handling overcommitment of memory efficiently.

5. Confusion
Is the method of sharing memory better in DISCO in any other way except for requiring changes in the guest OS?

Summary:
This paper talks about the important features implemented in VMWare's ESX Server, for memory management.

Problem:
The problems whose solutions are discussed in this paper range from efficient page reclamation from virtual machines and efficient page sharing to increasing memory utilization.

Contributions:
The important ideas in the paper are:
- Ballooning to reclaim memory from guest OS: A balloon driver inside the guest OS pins physical pages to increase memory pressure on the guest OS (and deallocates them to remove the pressure), thus forcing it to pick the best pages to page out to disk.
- Content based sharing: When the hash value of a scanned page matches one already in memory, ESX Server performs a full comparison, and if the pages have the same data, the page in memory is marked COW (copy on write) and shared among the virtual machines.
- Memory allocation is partly done in a 'share' based manner: i.e. each client can consume resources proportional to its 'shares'. So when reclaiming pages, the VM with the minimum shares per page is chosen.
- Idle memory tax: Is used to adjust the share value of virtual machines which hoard a lot of inactive pages in main memory. To compute the idle memory tax for each vm, we get the fraction of inactive pages in memory through statistical sampling. This serves as a method for estimating the working set model of memory allocation and so the ESX uses a combination of working set and share-based memory allocation/reclamation policies.
- I/O remapping: to map pages with lots of I/O directly into the lower machine addresses so that I/O devices can access those locations directly during DMA.
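
To make the interaction between shares and the idle memory tax concrete, the sketch below picks a reclamation victim using an adjusted shares-per-page ratio, rho = S / (P * (f + k * (1 - f))) with k = 1 / (1 - tax_rate), which is the form of the formula as I read it in the paper. The function names and the two example VMs are made up for illustration.

```python
def adjusted_shares_per_page(shares, pages, active_fraction, tax_rate):
    """Shares per page, with each idle page counted k = 1 / (1 - tax_rate) times."""
    k = 1.0 / (1.0 - tax_rate)  # requires tax_rate < 1
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


def pick_victim(vms, tax_rate):
    """Reclaim from the VM with the lowest adjusted shares-per-page ratio."""
    return min(vms, key=lambda name: adjusted_shares_per_page(*vms[name], tax_rate))


# name -> (shares, allocated pages, estimated active fraction); made-up numbers
vms = {"busy": (1000, 250, 1.0), "idle": (1000, 200, 0.1)}

print(pick_victim(vms, tax_rate=0.0))   # pure proportional share: "busy" holds more pages per share
print(pick_victim(vms, tax_rate=0.75))  # idle pages cost 4x: "idle" becomes the victim
```

With a tax rate of 0 the idle VM is protected by its shares no matter how little it uses; raising the tax makes hoarded idle pages the cheapest ones to take.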

Evaluation:
The authors provide a lot of empirical data to demonstrate the effectiveness of their mechanisms, such as the low overhead of ballooning, the increasing amount of memory shared (nearly 67%) due to the content-based sharing mechanism, and the effectiveness of idle memory reclamation (through the idle memory tax).

Confusions:
How exactly does the communication between the esx server and the balloon driver happen?

Summary

This paper introduces several mechanisms and policies for managing memory in the VMware ESX Server, which is a non-hosted virtual machine monitor. It is a thin software layer that manages system hardware directly and provides a virtual machine interface to commodity OSs. It allows memory overcommitment for each VM and hence high-level allocation policies that compute target memory allocation for each VM based on both its share-based entitlement and an estimate of working set are required. These policies are achieved via the ballooning and paging mechanisms. Additionally, page sharing within and between VMs reduces overall memory pressure on the system.

Problem

Consolidating individual servers and running them on a single physical server improves utilization of hardware resources. To gain from statistical multiplexing, memory can be overcommitted across the virtual machines while still providing each a minimum guarantee. Achieving this requires policies and mechanisms that manage physical memory effectively.

Contributions

A novel page reclamation mechanism, ballooning, which is a pseudo-device driver that increases or decreases memory pressure in the guest OS, causing the guest OS to invoke its own native memory management algorithms for reclamation.

Content-Based page sharing via the use of a hash-table to find matching pages and using copy-on-write.

Idle memory tax is the idea to charge a VM more for an idle page than for one it is actively using. Through this both goals of performance isolation and efficient memory utilization can be simultaneously achieved.

Lastly, high-level policies for allocation of memory to each VM by using the above mechanisms have been discussed.

Evaluation

Each mechanism and the high-level dynamic allocation policy have been separately evaluated in the paper. Ballooned VM performance is almost similar to non-Ballooned VM performance, with only 1.4 % to 4.4% overhead for ballooned. In page sharing, they find that there is overlap of approximately two-thirds of all memory between the VMs. The CPU overhead due to page sharing was negligible. The results of the evaluation for idle memory tax shows its effectiveness by increasing throughput by over 30 % after imposing an idle tax. They also found satisfactory performance of the dynamic allocation policy.

Confusion

The Guest OS is not aware of the fact that it is running on a virtual machine, then how is the guest OS instructed that the balloon driver should be invoked?

1. Summary
This paper introduces several memory management mechanisms and policies for VMware ESX Server, including ballooning, content-based page sharing, idle memory tax, dynamic allocation and I/O page remapping.

2. Problem
Each technique corresponds to an existing problem. Ballooning solves the problem that a traditional meta-level page replacement policy is not aware of page usage in the guest OS. Content-based page sharing avoids the guest OS modifications that Disco needed to support transparent page sharing. Share-based allocation and the idle memory tax aim to allocate resources according to the importance and usage of each guest OS. I/O remapping makes access to “hot” pages more efficient.

3. Contributions
The ballooning technique is to place a driver called balloon into each guest OS. When server memory is overcommitted, server notifies the balloon to inflate and guest OS would automatically release pages according to its replacement policy. Server can reclaim memory from guest OSes in this way. The reverse also holds when balloon deflates it releases free pages to guest OS.
Content-based sharing identifies each page by a hash value calculated from the page's contents. Identical pages with the same hash value can be mapped to the same machine page, which is marked copy-on-write. The first copy of a page is recorded only as a “hint”; it is marked copy-on-write and shared when a later page matches the hint and the hint's hash value has not changed.
Share-based allocation is to allocate memory pages among clients such as processors, I/O devices. The share value of a page indicates references from clients. The page with fewest shares may be revoked from and allocated to memory intensive clients.
Idle memory tax is to charge when a client underutilizes its pages. Idle pages from these clients are revoked and allocated to those with few idle pages. The idleness/activeness of a page is dynamically measured using sampling.
Several metrics, such as min, max, high, soft, hard, low, shares are measured to provide support for dynamic memory allocation and implementations for all previous mechanisms.
I/O page remapping remaps frequently accessed pages from high memory to low memory. This is done by counting the number of references to a page and, once the count exceeds a threshold, changing the PPN-to-MPN mapping. This enables faster access to pages in high memory that lies beyond the 32-bit address range.
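
The count-then-remap idea can be sketched as below. Everything here is a stand-in: `HotPageRemapper`, the 4 GB boundary expressed as `LOW_LIMIT`, the threshold of 3, and the returned action strings are illustrative assumptions, and the actual copying of page contents is left out.

```python
LOW_LIMIT = 1 << 20  # assumption: MPNs below this are reachable with 32-bit addresses (4 KB pages)


class HotPageRemapper:
    """Toy model: count I/O on high pages and remap a page once it turns hot."""

    def __init__(self, pmap, free_low_mpns, threshold=3):
        self.pmap = pmap                     # PPN -> MPN
        self.free_low = list(free_low_mpns)  # unused machine pages in low memory
        self.threshold = threshold
        self.copies = {}                     # PPN -> I/O operations that needed a copy

    def dma(self, ppn):
        """Return (MPN, action) for a device access to the given guest page."""
        mpn = self.pmap[ppn]
        if mpn < LOW_LIMIT:
            return mpn, "direct"
        self.copies[ppn] = self.copies.get(ppn, 0) + 1
        if self.copies[ppn] >= self.threshold and self.free_low:
            self.pmap[ppn] = self.free_low.pop()  # change the PPN-to-MPN mapping
            return self.pmap[ppn], "remapped"
        return mpn, "copied"                      # bounce through a low-memory buffer
```

After the remap, every later I/O to that guest page takes the "direct" path, which is where the saving comes from for pages involved in repeated I/O.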

4. Evaluation
This paper provides several evaluations to claim the feasibility of their mechanism and policy. For each result, detailed explanations are provided. Balloon mechanism has an overhead of 1.4%-4.4% and most are due to larger guest OS structure in larger memory settings.
Content-based shared memory observes a proportional share and reclamation pattern. The fraction of share and reclamation is workload-dependent. The memory allocation according to idle page sampling shows rapid increase and slow decrease with growing and shrinking of active pages. Higher idle memory tax rate would deprive more pages from idle client and allocate them to busy client, which results in higher throughput.

5. Confusion
I am confused about the adjusted shares-per-page ratio formula. Intuitively, I don’t understand why the formula takes this form and why this form helps identify the shares of a page. Also, in Section 5.3 I’m not clear why n is randomly chosen instead of being a fixed parameter.
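
For what it is worth, one way to read the formula (as I understand it from the paper, with S the VM's shares, P its allocated pages, f its active fraction, and τ the tax rate) is to split the pages into active and idle ones:

```latex
\rho = \frac{S}{P\,\bigl(f + k\,(1 - f)\bigr)}
     = \frac{S}{P_{\text{active}} + k\,P_{\text{idle}}},
\qquad k = \frac{1}{1 - \tau},
\quad P_{\text{active}} = fP,\; P_{\text{idle}} = (1 - f)P
```

Read this way, the denominator is just a page count in which every idle page is billed as k pages. With τ = 0, k = 1 and ρ reduces to the plain shares-per-page ratio S/P; with τ = 0.75, k = 4, so a VM pays for each idle page as if it held four active ones, which lowers its ρ and makes it the preferred target for reclamation.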

1. Summary
The paper presents a range of memory management mechanisms employed in VMware ESX server, a commercial virtual machine monitor. The paper describes an easily configurable memory management policy mechanism for administrators, and how ESX server effectively allocates memory to VMs based on the configured policy and memory load of VMs.

2. Problem
To effectively utilize and allocate memory to virtual machines in a virtual machine monitor by overcommitting memory, thereby reaping the benefits of statistical multiplexing.

3. Contributions
The ballooning technique provides an ingenious low-level mechanism for reclaiming memory from virtual machines by using standard OS interfaces to allocate and pin dummy physical pages effectively removing them from the purview of OS and applications running in the virtual machines. Content-based sharing of memory pages across VMs using copy-on-write technique provides an effective mechanism for eliminating redundancy in memory usage. The sampling technique of invalidating and tracking a random subset of pages provides a mechanism to estimate the working set of virtual machines at the cost of a small amount of additional page faults. The estimated working set of virtual machines in conjunction with idle memory taxing mechanism provides an effective mechanism for reclaiming idle memory from virtual machines.

4. Evaluation
The authors show that the ballooning technique introduces minimal additional overhead by comparing the performance of synthetic fileserver benchmark. The authors show that page sharing technique provides reasonable gains in both synthetic and real world setting. The authors show that the working set estimation technique tracks a synthetic application at the time scale of minutes. It is not very clear how these dynamic reallocation measures affect the performance of real world applications, specifically latency critical applications.

5. Confusion
The authors, in the page sharing technique, ignore collisions in hashes of pages claiming the probability of collision is very low (0.01%). However, this does not seem very low. If hash collisions do occur, they could adversely affect VMs. Moreover, malicious VMs could construct hash collision attacks to deny service to other VMs.
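
A quick birthday-bound estimate helps put the collision figure in context. The sketch below assumes a uniformly distributed 64-bit hash and a 64 GB machine with 4 KB pages; these parameters are my assumptions about the setting behind the quoted number, and the function name is made up.

```python
import math


def collision_probability(num_pages, hash_bits):
    """Birthday-bound approximation of P(at least one collision) among random hashes."""
    pairs = num_pages * (num_pages - 1) / 2
    return -math.expm1(-pairs / 2.0 ** hash_bits)  # 1 - exp(-pairs / 2**bits)


pages = (64 * 2**30) // 4096           # 64 GB of 4 KB pages = 2**24 pages
p = collision_probability(pages, 64)
print(f"{pages} pages -> P(any collision) ~ {p:.2e}")
```

Under these assumptions the figure is the chance that any two pages in the whole machine collide, not a per-page rate. Other reviews here also note that a full comparison is done before pages are shared, so an accidental collision would cost a missed sharing opportunity rather than wrong data; deliberately constructed collisions are a separate question that this estimate does not address.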

Summary:

This paper introduces ESX Server, a layer of software above the hardware that multiplexes hardware resources. ESX Server introduces mechanisms and policies for efficient memory management without changes to the guest OSes. The paper discusses in detail the concept, implementation and evaluation of ballooning, content based sharing, active memory sampling, idle memory tax and dynamic reallocation.

Problem:

The paper introduces mechanisms and policies to manage memory. The ESX Server is designed to manage the hardware resources directly instead of trapping into an underlying OS. Another problem is running the guest OS without any modification on top of the VMM. The paper addresses the problem of reclaiming memory from a VM when another VM needs memory. Also, a mechanism to share pages across VMs without guest OS modification is discussed.

Contribution:

The ESX Server tries to obtain maximum utilization of the server resources by overcommitting them. The primary discussion is about the memory management techniques. To reclaim memory from a VM that does not fully utilize its allocated memory, a balloon module is installed in each guest OS; inflating it increases memory pressure in the guest OS, and the pages it allocates are claimed by the ESX Server. When ballooning fails, the server falls back to randomized page replacement. A content-based sharing mechanism is used in which pages with identical content are shared. The novelty lies in hashing the page and then comparing the full contents only when the hash values match. By charging more for an idle page than for an active page, both performance isolation and efficient memory utilization are achieved. To measure idle memory, a statistical sampling approach is used. By reserving disk swap space, VM memory is preserved. Using four reclamation states (high, soft, hard, and low), memory is reclaimed through ballooning and paging mechanisms. By keeping track of hot pages involved in repeated I/O, the number of pages copied is reduced for I/O-intensive workloads.

Evaluation:

The proposed system has been tested across different scenarios. Running dbench with 40 clients in a VM on ESX Server shows that ballooning does not have significant overhead on performance, though some overhead is felt because guest data structures are sized according to the physical memory size. The effectiveness of the content-based sharing mechanism is demonstrated when the sharing level approaches 67% with a large number of VMs. Only the best-case workload was analyzed using homogeneous VMs. The performance with heterogeneous VMs would have been more interesting. The idle memory tax mechanism also produces satisfying results, as the throughput increases by 30% after a tax change on idle memory.

Confused about:

I am interested to know how ballooning is implemented. How does the ESX Server handle the case when there is memory pressure across all the VMs with a high level of sharing?

Summary:
The paper describes mechanisms and policies for implementing efficient memory management on the VMware ESX server when running unmodified operating systems. The primary techniques described in this paper are ballooning, idle memory taxing, and memory reclamation, which are used to over-commit memory when required and achieve virtualization.

Problem:
How can server utilization be improved when servers are running unmodified operating systems? Since modifying the operating systems involves great overhead in time and cost, how can the compute power of servers be used by providing virtualization? The paper describes research results that give system administrators more flexibility for over-committing resources when running virtual machines.


Contribution:

Ballooning: This technique is the key to over-commit memory by increasing or decreasing the amount of memory for each VM dynamically. In this technique, the virtual machine monitor inflates a pseudo balloon by asking the device driver to allocate pinned memory.

Content Based page sharing: In this technique, identical pages are shared across multiple VMs by using a hash table to identify and eliminate copies. Once a copy has been found, it is marked copy-on-write and combined into a shared page.

Idle memory tax helps in reclaiming pages from the VMs which have the most idle pages while keeping a system-wide proportional sharing policy. Each VM is sampled at regular time intervals to estimate which VMs have the most idle pages.


Evaluation:
Each policy described by the authors has been substantiated with detailed evaluation results. For ballooning, the throughput of dbench is shown to be nearly equivalent at different memory sizes with and without ballooning. For content-based page sharing, the authors evaluate with a mix of real-world and synthetic workloads. For synthetic workloads, sharing was able to reclaim about 60% of the memory. For real workloads, sharing enabled reclaiming 7 - 33% of memory.


Confusion:
Discussion on overhead in content-based page sharing. Also in ballooning, how is the pseudo device driver plugged into the OS without modifying OS code?

1. Summary
This paper introduces mechanisms and policies in ESX Server that are used to manage memory and evaluates their performance.

2. Problems
This paper tries to solve several earlier problems. 1. To deal with overcommitted memory, the previous approach is to introduce another level of paging, which needs a meta-level page replacement policy. Since the correct information about which pages are valuable is known only by the guest OS, such a policy has to be sophisticated. The transparency of this paging to the guest OS also results in a double paging problem. 2. On memory sharing, earlier transparent page sharing requires several guest OS modifications to identify redundant copies, and some sharing also requires the use of non-standard or restricted interfaces. 3. Idle and active clients conflict over memory use. 4. The amount of memory that can be addressed is limited; though hardware support is provided, copying can impose significant overhead. Problems also include the difficulty of modifying guest operating systems.

3. Contribution
On memory virtualization, ESX Server gives each VM the illusion of a zero-based physical address space. It maintains a pmap data structure for each VM to translate PPNs to MPNs. On reclamation mechanisms, ESX Server supports overcommitment of memory to facilitate a higher degree of server consolidation than would otherwise be possible. To deal with the problems of previous approaches, ESX Server uses the ballooning mechanism. When the server wants to reclaim memory, it “inflates” a balloon; deflating the balloon frees up memory for general use within the guest OS. On sharing memory, VMs on ESX Server consume less memory, so higher levels of overcommitment are supported efficiently. Pages are identified by their contents, which eliminates the need to modify the guest OS and makes sharing easier. On reclaiming idle memory, ESX Server resolves the problem with an “idle memory tax”, using a statistical sampling approach to obtain aggregate VM working set estimates. On allocation, ESX Server ensures that sufficient unreserved memory and server swap space are available before a VM is allowed to power on, and it allocates memory dynamically in response to various events. On I/O remapping, it uses statistics to decide which pages to remap into low memory.

4. Evaluation
The paper has a separate evaluation section for each aspect. The effectiveness of ballooning is evaluated with the synthetic dbench benchmark. The result shows that ballooning has only slight overhead. However, it does have limitations: the balloon driver may be uninstalled, disabled, or unavailable while the guest OS is booting. The performance of memory sharing is evaluated with a series of identically configured virtual machines running SPEC95 benchmarks; the results show that the amount of memory shared increases smoothly with the number of concurrent VMs. For memory sampling, the estimates generally follow actual memory usage, with some inaccuracy due to the Windows “zero page thread”, and the idle memory tax increases throughput.

5. Confusions
The details of the “hint page” are not very clear to me.

Summary
The paper discusses the core mechanisms and policies used to manage memory resources in VMware ESX Server. It describes in detail the memory reclamation mechanisms (ballooning, page replacement); memory sharing mechanisms (transparent, content-based page sharing); memory allocation algorithms (share-based allocation) and allocation policies.
Problem
There is a need for consolidation of servers, so as to improve the memory utilization, simplify management and reduce costs. Hypervisors with memory over-commit multiplex the hardware resources efficiently among virtual machines. There is also a need for hypervisors on which the existing operating systems can run unmodified.
Contributions
A novel technique called 'ballooning' is used to reclaim pages. The balloon module is loaded as a pseudo driver in the guest OS; it pins pages in physical memory (inflate) and thereby forces the guest OS to comply with VMware's memory management decisions using the guest OS's own management mechanisms. The technique for sharing memory between VMs relies on checking the content of pages on multiple VMs and allowing sharing of pages with identical contents. To achieve this, hashing of page contents is used to identify possibly identical pages, which are then subject to a full comparison. Once a match is found, the copy-on-write technique is used to share pages. If dynamic page revocation algorithms based on shares-per-page are used, idle clients with many shares can hoard memory unproductively, while active clients with few shares will suffer. The idea of an 'idle memory tax' resolves this problem by charging a client more for an idle page and thereby achieves efficient memory utilization while maintaining performance isolation guarantees.
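The hash-then-compare idea above can be sketched roughly as follows. This is only a toy model under stated assumptions, not ESX Server's implementation: the class `SharedFrameTable`, its fields, and the use of SHA-1 as the page hash are all hypothetical choices made for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the toy model


class SharedFrameTable:
    """Toy content-based page sharing: hash lookup, full compare, copy-on-write."""

    def __init__(self):
        self.machine = {}   # machine page number (MPN) -> page contents (bytes)
        self.by_hash = {}   # content digest -> MPN of a candidate shared page
        self.refs = {}      # MPN -> number of guest pages mapped to it
        self.next_mpn = 0

    def _alloc(self, data):
        mpn = self.next_mpn
        self.next_mpn += 1
        self.machine[mpn] = data
        self.refs[mpn] = 1
        return mpn

    def share(self, data):
        """Return the MPN backing `data`, reusing an identical page if one exists."""
        digest = hashlib.sha1(data).digest()
        mpn = self.by_hash.get(digest)
        # A hash match is only a hint; the full comparison rules out collisions.
        if mpn is not None and self.machine[mpn] == data:
            self.refs[mpn] += 1
            return mpn
        mpn = self._alloc(data)
        self.by_hash.setdefault(digest, mpn)
        return mpn

    def write(self, mpn, offset, payload):
        """Write to a page; a shared page is copied first (copy-on-write)."""
        old = self.machine[mpn]
        new = old[:offset] + payload + old[offset + len(payload):]
        if self.refs[mpn] > 1:
            self.refs[mpn] -= 1          # writer leaves the shared page
            return self._alloc(new)      # and gets a private copy
        # Sole owner: update in place and drop the now-stale hash entry.
        digest = hashlib.sha1(old).digest()
        if self.by_hash.get(digest) == mpn:
            del self.by_hash[digest]
        self.machine[mpn] = new
        return mpn
```

For example, two guests that both hold a zero-filled page end up mapped to one machine page, and the first write by either guest gives that guest a private copy while the other keeps the original.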
Evaluation
The authors have done an in-depth evaluation of all the mechanisms and policies on a number of guest operating systems. The effectiveness of the ballooning technique was demonstrated by measuring the throughput of a dbench workload on the VM. The percentage of shared VM memory increased with the number of VMs, thereby demonstrating the benefits of the page sharing mechanism. The effect of the idle memory tax was also evaluated.
Confusion
What page reclamation technique will ESX follow if only a few guest OSes uninstall the balloon driver while others do not? Will ESX use a combination of ballooning and other page reclamation techniques? Also, I did not completely understand the I/O remapping.

Summary:
The author describes various mechanisms and policies for sharing and multiplexing memory efficiently among virtual machines running unmodified commodity operating systems. Various novel mechanisms are introduced: ballooning to reclaim pages, an idle memory tax to achieve efficient memory utilization, and content-based page sharing to eliminate redundancy.

Problem:
The VMM is unaware of the least valuable pages in a guest operating system. It may use sophisticated policies for page replacement, but they are not effective when diverse operating systems run on the VMM. The double paging problem could not be avoided in VMMs before ESX. Prior to VMware ESX, VMMs like Disco used para-virtualization for effective page sharing. There were no explicit interfaces to detect memory idleness in earlier VMMs. Policies like LRU tracking were used to detect idle memory, but they failed to flexibly combine idleness with priority. In the case of I/O remapping, if many guest operating systems are running, it is not possible to map every guest's low physical memory to low machine memory.

Contribution:
1. A cooperating balloon driver can be loaded into the guest operating system. When the server wants to reclaim memory from a particular VM, it instructs the driver to allocate pinned physical memory in the guest OS, which causes the prior owner to free it. When pressure reduces, the VMM can ask the driver to free the pinned memory.
2. An efficient page sharing mechanism was introduced that does not require modifying the guest OS. By calculating a hash value for each page based on its content, many physical pages can be remapped to a single machine page.
3. Using the idle memory tax, a guest OS which doesn't use all the memory allocated to it is the first one from which memory is reclaimed in case of memory scarcity.
4. If a guest operating system frequently uses a high machine page for device communication, the VMM can transparently remap it to low machine memory, thereby reducing copying overhead.
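Item 4 can be illustrated with a small sketch of hot-page tracking. Everything here is an assumption made for illustration (the class `IoRemapper`, the threshold of 3 copies, the 4 KB page size, and the 4 GB boundary expressed as a page number); the paper's actual data structures and threshold are not reproduced.

```python
from collections import Counter

PAGE_SIZE = 4096
LOW_LIMIT_MPN = (4 << 30) // PAGE_SIZE  # first machine page at or above 4 GB
HOT_THRESHOLD = 3                       # hypothetical number of copies before remapping


class IoRemapper:
    """Toy model: count bounce-buffer copies per guest page, remap hot pages low."""

    def __init__(self, free_low_mpns):
        self.free_low = list(free_low_mpns)  # free machine pages below 4 GB
        self.copy_counts = Counter()         # guest PPN -> copies so far
        self.pmap = {}                       # guest PPN -> machine page (MPN)

    def dma(self, ppn):
        """Handle one DMA on a guest page; return how it was serviced."""
        if self.pmap[ppn] < LOW_LIMIT_MPN:
            return "direct"                  # device can address the page itself
        self.copy_counts[ppn] += 1           # page is "high": a copy is needed
        if self.copy_counts[ppn] >= HOT_THRESHOLD and self.free_low:
            self.pmap[ppn] = self.free_low.pop()  # transparently remap to low memory
            del self.copy_counts[ppn]
            return "remapped"
        return "bounce"
```

In this model a page that is copied repeatedly pays the bounce-buffer cost only until it crosses the threshold; after that its I/O is serviced directly.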

Evaluation:
Evaluation is done for each newly introduced feature. The author compares performance with and without the ballooning technique and shows that with low memory the overhead is about 4.4%, which decreases substantially as VM memory size is increased. The author shows that with a large number of VMs running in parallel, around 67% of content is shared. The paper also reports experiments on production deployments of ESX Server and shows that page sharing was able to reclaim from 7% to 18.7% of memory. With idle memory taxing, throughput is increased by 30%.

Confusion:
Does the balloon driver free shared pages? To avoid freeing a shared page, how does the balloon driver know which pages are shared by the VMM?

1. Summary
This paper describes VMWare ESX server, a hypervisor designed to share hardware resources among virtual machines. It uses several techniques to reduce inefficient memory utilization and overheads while still remaining transparent to the guest OS.
2. Problem
To provide transparent virtualized memory to guests on an overprovisioned system, hypervisors add a layer of paging indirection allowing them to swap rarely used virtual machine memory to disk. This can cause conflict with the guest OS's paging policy though, and lead to an extra overhead in faults if done poorly.
3. Contributions
A native ballooning driver was introduced to run on the guest OS, which can either allocate memory to itself or deallocate it, increasing and decreasing memory pressure on the guest. This has the effect of causing pages to be swapped to disk, decreasing the amount of physical memory used by the guest.
To facilitate efficient sharing of physical memory resources, hashing is used to identify pages that can be mapped to more than one guest, saving memory used to store duplicate code or data. This allows commonly seen pages, like the zero-page (a zero filled region of memory) to be deduplicated very efficiently.
Additionally, ESX randomly samples a small number of guest pages over a given time period to determine how many of them are idle. It then charges an "idle tax", increasing the proportion of pages that are reclaimed for use by other more active guests.
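The sampling step can be sketched as a simple estimator. This is a minimal sketch, assuming the hypervisor can learn whether a sampled page was touched during the period (the paper does this by invalidating mappings and catching the next access); the function name, the `touched` callback, and the default sample size are illustrative assumptions.

```python
import random


def estimate_active_fraction(touched, num_pages, sample_size=100, rng=None):
    """Estimate the fraction of a guest's pages that were active in one period.

    touched:   hypothetical callback, touched(ppn) -> bool, reporting whether
               the guest accessed that page during the sampling period.
    num_pages: number of guest physical pages to sample from.
    """
    rng = rng or random.Random()
    sample = rng.sample(range(num_pages), sample_size)  # uniform, no repeats
    hits = sum(1 for ppn in sample if touched(ppn))
    return hits / sample_size
```

The cost is bounded by the sample size rather than by the size of guest memory, which is why only a small number of extra faults is incurred per period.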
Finally, ESX tracks pages in "high" memory (addresses above 4GB) that are involved in DMA operations that use a "bounce buffer" in "low" memory space and remaps them to reside directly in "low" memory space, reducing overhead.
4. Evaluation
With the exception of IO remapping, each technique described is benchmarked and has results shared and evaluated. In general, the implementation feels much more thorough (because they were creating a commercial product) and the benchmarks more real world than the papers we have read up until this point.
5. Confusion
It seems like the idle memory tax coefficient of 0.75 was kind of chosen by the seat of the pants. I would be interested to know what effects a higher tax rate can have.

Summary:
The paper describes various memory management techniques used by ESX server. A software layer is added for efficient hardware utilization. Since servers rarely use all of the memory allocated to them, memory is overcommitted across the guest OSes. Various techniques such as ballooning, idle memory taxing, content-based page sharing, and I/O remapping have been introduced for memory management.

Problem:
Server virtualization helps in the efficient utilization of physical machines, with little or no performance penalty and with cost benefits. It poses the challenge of running commodity operating systems without modifying them. Earlier solutions such as Disco required minor changes to the OS running on top of the virtualization layer. The paper presents ESX Server, developed at VMware, as a solution.
VMware proposed to add a software layer between the multiple commodity OSes and hardware to achieve server virtualization and consolidation. Additionally, it tackled the problem of wasting resources through memory overcommit.

Contributions:
1. Overcommitting and multiplexing to facilitate a higher degree of server consolidation and less wastage of resources.
2. Ballooning: ESX grabs memory pages from a guest OS by increasing page demand in its balloon module running inside the guest OS.
3. Content based page sharing: Pages are shared based on content, identical content is identified through hashing. Shared pages are marked as copy-on-write while all other potential pages are marked with hint entry. Any changes to shared page leads to creation of private copy for the writer.
4. Idle memory taxing for a performance boost and efficient memory utilization.
5. I/O page remapping into lower memory to reduce number of pages copied in IO-operations.

Evaluation:
The paper provides an extensive evaluation of the various memory management techniques used for efficient server virtualization and resource management. The authors compare their techniques with Disco at appropriate points. The ballooning technique incurs an overhead of 1.4%-4.4%, which is attributed to guest OS data structures sized according to memory. For a large number of VMs, sharing reaches 67%, reclaiming 60% of all VM memory. Idle memory taxing improves the overall throughput by 30% in the case of 2 VMs.

Confusions:
An interesting read with substantial performance evaluations to support the concepts introduced. I am confused about how a particular guest OS is selected for ballooning. What policies are used? How is fairness guaranteed? For page sharing, I am curious to know the details and implementation of the metrics used for performance evaluation.

Summary:
The paper introduces the policies and mechanisms adopted in the VMWare ESX server for memory management. The ESX server was meant to support virtual machines, with unmodified Operating Systems, running workloads that over-commit memory. The author mentions the techniques implemented, rationales behind them and evaluations adopted to test the policies.

Problems:
The designers wanted to enable users to over-commit resources such as the processor and the memory while still guaranteeing fair isolation. They were unable to modify the commodity OS to suit their needs. Meta-level page replacement policies could not be adopted due to inefficiency and other problems, like double paging. Since the guest OS could not be modified, the page sharing had to be done based on contents, which was very expensive. The memory had to be optimally shared in order to utilize the resources efficiently. Finally, the memory allocation had to be dynamic to adapt to conditions like the addition of a VM.

Contributions:
I thought that the concept of ballooning, which lets the guest OS decide which pages to yield to the ESX server, was very novel and clever. This enabled the server to use the guest OS's page replacement policies, thereby having to provide only a mechanism. A backup page replacement policy was also provided to serve during boot time, when ballooning wouldn't be running yet. The use of hashing to identify the pages to share, based on their content, and scanning randomly to find such pages seems rather inefficient to me, but the evaluations show that the performance was satisfactory. The evaluation presented for this case is SPEC95 based; I wonder how this mechanism will work with real-world workloads! The concept of share-based memory allocation seems very appropriate for an ESX-like system, where users tend to overestimate resources, causing underutilization of the resources or impacting the performance of other VMs. The ESX server allocates a minimum number of pages to each VM, and the rest of the pages are allocated based on the number of shares contending for a resource. The designer adopted a statistical sampling approach to measure idle memory, which enabled the system to respond quickly to increased memory requirements and slowly to decreased ones, as desired. The I/O page remapping was done based on a threshold: the number of times a page involved in I/O was copied.

Evaluation:
The author has provided evaluations for each of the techniques or mechanisms implemented, for example the dbench benchmark for ballooning and SPEC for page sharing. The author also provides the overheads for each of the mechanisms, both processor (time) and memory. Also, in most of the cases, appropriate workloads have been chosen to evaluate the mechanism being implemented; e.g. toucher, which allocates and repeatedly accesses a fixed amount of memory, was appropriate to show the effectiveness of the memory sampling technique. All the evaluations are backed by corresponding graphs.

Confusions:
The ballooning technique pins the VM's pages during inflation, when the ESX server requests memory through the driver, but the author mentions that during memory scarcity the guest OS pages out the 'pages' it wants to reclaim to its virtual disk. I couldn't make sense of this statement.

1. Summary
The paper describes the ESX Server, a lightweight VMM that relies on the page allocation/swapping mechanisms of the guest OS itself. Thus the VMM has no major part in swapping pages out of physical memory, yet it provides page sharing and I/O remapping options.

2. Problem
Before the ESX, the VMM used to swap pages out of physical memory randomly from a VM. Worst case, this could lead to the two following possibilities:
a) The VMM can swap out an active page of the guest OS.
b) Double paging: The VMM can swap out a page that the guest OS was about to swap out. This would mean that when the guest OS becomes active again, it would bring the page in and swap it out again.
The cause of this issue is that the VMM has no information about the pages in a guest OS. Instead of paravirtualizing the OS, the author chooses to resolve this issue by letting the guest OS do its native version of page replacement, thus reducing the workload of the VMM and also not swapping out active pages of the guest OS.

3. Contributions
a) Implementation of a balloon that acts as a driver in all the guest OSes, inserted during initialization. When guest OS-A is under memory pressure, the VMM inflates the balloon in guest OS-B, thereby causing it to experience memory pressure and swap pages out. The freed pages (pages given to the balloon) are pinned, and their addresses in physical memory are given to guest OS-A, thus relieving it of its memory pressure.
b) Since the VMM does not know what the guest pages contain or how the guest OSes use them, it has to provide its own mechanism to share pages among the OSes. ESX uses content-based sharing: pages are hashed based on their contents, and this ensures that only one copy of each piece of content is preserved at one physical address. This one copy is marked copy-on-write.
c) In cases of high memory pressure where the balloon cannot inflate more, the ESX has to fall back on the last resort of demand paging, where it swaps out pages randomly from any OS. An optimization is that, instead of randomly paging out from any OS, the OSes are given shares based on which they are selected to swap out their pages. This ensures a level of fairness in swapping out pages. Another method weighs the guest OSes based on the idle pages they have, as measured by statistical sampling.
d) Another optimization made in ESX is that network addresses (32-bit) are mapped directly onto lower machine addresses (36-bit), thus avoiding another level of indirection from the I/O MMU device driver.

4. Evaluation
The authors show various results to demonstrate that the throughput and performance of the VMs with and without the VMM are comparable. One of the most important results is that with content-based sharing, over-committing the resources does not become an issue, leading to efficient usage while using the same amount of memory for the hash data structure.

5. Confusions
Since the balloon inflates and the guest OS needs to become active to swap out its pages, isn’t there an extra context switch that makes it slower when compared to demand paging?

Summary:

The paper describes certain mechanisms which help to manage memory pressure in case of a machine which has many different VMs scheduled to run on it. Mechanisms such as ballooning to reclaim a page by inducing memory pressure on a VM, swapping idle memory, sharing pages among VMs based on comparison of their contents and I/O remapping of hot pages are introduced to help manage the memory pressure.

Problem:

When multiple VMs are scheduled on a machine it is very difficult to identify which pages can be swapped out. One might swap out a page which might be required by another guest OS running on another VM. A few mechanisms are available but they require the modification of the guest OS. The paper solves problem of memory management without having to modify the guest OS.

Contributions:

The paper introduces the following novel ideas:

1. Ballooning technique - First, an attempt was made to let the VMM decide which page of a VM needs to be swapped out under memory pressure. But this led to a problem called double paging, where the guest OS might later try to swap out the same page, resulting in a page fault. Hence, the ballooning technique was introduced, in which a pseudo-device driver was loaded into the guest OS and was made to inflate, i.e., increase memory pressure on the guest OS, causing it to swap out pages whenever there was a need to reclaim memory.

2. Content Based Sharing - A hash table is maintained which hashes values based on the contents of the page, and when another VM references a page, the hash table is checked and if an identical page is found, they are shared. Certain mechanisms are implemented to prevent false matching.

3. Reclaiming Idle memory - First, each VM was given a share of memory, and when there was a memory requirement, the VM with the least share had to give up its memory. But in certain cases this was unfair, as idle pages of a VM with a large share remained in memory while active pages of a VM with a low share were swapped out. Hence, a tax was introduced on idle pages, and the VM with the least share after applying the tax had its pages swapped out.

Evaluation:

The paper clearly gives solution of the problems faced by VMM in memory management. Fairness has been taken into consideration and so is the priority. Also, the experimental results strengthen the case of using these mechanisms for memory management.

Confusions:

Doesn’t content based sharing incur an overhead in spite of maintaining a hash value? Are there other mechanisms which would achieve the identification of pages being shared?

Feb 24th
Memory Resource Management in VMware ESX Server

1. summary
This paper introduced four major memory resource management techniques used in VMware ESX Server:


  1. Ballooning used for page reclaim.

  2. Idle memory tax used for identifying idle pages and reclaiming them when memory is short.

  3. Content-based page sharing mechanism that improves memory utilization without introducing too much overhead.

  4. Hot I/O page remapping that addresses efficiency issue for high memory I/O operations.


The paper explained how these four techniques work to achieve better memory management, and proved their effectiveness with quantitative experiments and analysis.


2. Problem
Resource management is particularly difficult and interesting in virtual machines such as VMware products. The paper mainly addressed the following issues:


  1. Page replacement becomes harder with virtual machines, because only the guest operating system knows which pages are least valuable and should be swapped out when machine memory is short. The meta-level page replacement policy should be in accordance with the guest OS’s policy. Otherwise the performance would be bad, and it may even result in a double paging problem.

  2. It is highly possible to have several VMs running instances of the same guest OS, or even the same applications. Having a different copy in memory for each of them seems a waste of memory resources.

  3. Memory sharing across virtual machines is difficult, because it is desired to fully utilize the idle resources as well as to provide some sort of guarantees for prioritized users. Share-based allocation alone is not appropriate, as it does not encourage active memory usage and may leave a large amount of memory idle in some clients.

  4. I/O transfer introduces limitations for memory addressing. In virtual machines, the problem stands out because physical pages are sometimes mapped to machine pages in high memory. It is not efficient to copy the data to a bounce buffer in low memory every time it is in use.


3. Contributions
This paper introduces mechanisms designed to address the above problems in VMware ESX Server. The mechanisms include:

  1. Ballooning. Use a small balloon module in each guest OS to produce memory pressure in the guest OS. It helps guest OSs to swap out their least valuable pages when the overall memory is not sufficient.

  2. Content-based page sharing. A hash value is stored for each page as a lookup key in the system, allowing OS-independent page sharing simply by the content. Two data structures - shared frame and hint frame - are designed in implementation.

  3. Idle memory tax. This idea is to adjust the shares-per-page ratio explicitly with a tax rate value, so that idle memory in the system is more likely to be reclaimed. In ESX server it is adopted to achieve better memory allocation that takes performance isolation into account.

  4. Hot I/O page remapping. This mechanism is used to boost I/O efficiency by optimizing the location of pages according to usage. A hot page is a page in frequent use, and should be remapped to a lower memory address.

4. Evaluation
Experiments are conducted to prove that content-based page sharing is effective with little overhead. Another set of experiments show that reclaiming idle memory is very beneficial to system performance.

5. Confusion


  1. I didn’t quite understand section 7 I/O page remapping. Does this issue come with the use of DMA for communication between guest OS and host OS?

  2. In section 3.3, what does it mean by “the ESX Server swap daemon receives information about target swap levels for each VM from a higher-level policy module”?


1. Summary
The current paper discusses memory management mechanisms and policies for a fully virtualized ESX Server software. This includes techniques like balloon drivers, idle memory taxing, content-based page sharing and hot I/O page remapping.
2. Problem
The paper primarily addresses the problem of efficient memory resizing and reclamation for a fully virtualized system with over-committed memory, without requiring any modifications to the OS software. This is a problem because the VMM has to make decisions based on insufficient information, and its decisions are often in contention with the guest OS reclamation policies. In addition, to support over-commitment of memory, page sharing among the VMs should be enabled. Achieving efficient resource utilization with performance isolation is the challenge here.
3. Contributions
To support dynamic resizing of the memory available to a guest OS, a balloon driver is implemented in the guest OS. On direction from the server, the driver varies its allocation rates and triggers the native reclamation policies thus freeing up pages to be reclaimed by the server.
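A toy simulation may make the balloon mechanism more concrete. It is a sketch under simplifying assumptions, not the real driver: the `ToyGuest` class, its LRU replacement policy, and the method names are hypothetical, and the simulation assumes the guest always has resident pages left to evict.

```python
from collections import OrderedDict


class ToyGuest:
    """Toy guest OS with an LRU replacement policy and a balloon driver."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # free guest physical pages (PPNs)
        self.resident = OrderedDict()       # PPN -> data, oldest entry first
        self.swapped = []                   # data the guest paged out to its own disk
        self.balloon = set()                # PPNs pinned by the balloon driver

    def _alloc_page(self):
        if not self.free:
            # Memory pressure: the guest's own policy picks the victim,
            # so the hypervisor never has to guess which page is cold.
            victim, data = self.resident.popitem(last=False)
            self.swapped.append(data)
            self.free.append(victim)
        return self.free.pop()

    def alloc(self, data):
        """An application allocates one page holding `data`."""
        ppn = self._alloc_page()
        self.resident[ppn] = data
        return ppn

    def inflate(self, n):
        """Balloon allocates and pins n pages; returns PPNs the server may reclaim."""
        taken = [self._alloc_page() for _ in range(n)]
        self.balloon.update(taken)
        return taken

    def deflate(self, n):
        """Balloon releases up to n pinned pages back to the guest."""
        released = [self.balloon.pop() for _ in range(min(n, len(self.balloon)))]
        self.free.extend(released)
        return released
```

Because the evicted pages are chosen and written out by the guest itself, the server only ever takes pages the guest has already given up, which is how ballooning sidesteps a second, conflicting replacement policy.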
A content based sharing detection is implemented. This uses hashing to compare page contents of COW pages. To avoid the overhead of COW, a special hint is used to tag the pages for comparison.
The min-funding revocation algorithm is extended to use an idle-taxed, shares per page ratio. The VM with the least shares per page is selected for reclamation. This technique prevents idling VMs with many shares from hoarding memory while active VMs with fewer shares suffer under memory pressure. A statistical sampling approach is used to track the idle pages in a VM.
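The adjusted ratio can be written out as a short calculation. The sketch below follows my reading of the paper's formula, ρ = S / (P · (f + k · (1 − f))) with idle page cost k = 1 / (1 − τ); the function names and the dictionary layout used for VMs are hypothetical.

```python
def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Idle-adjusted shares-per-page ratio for one VM.

    shares:          S, the VM's share allocation
    pages:           P, pages currently allocated to the VM
    active_fraction: f, estimated fraction of those pages in active use
    tax_rate:        tau in [0, 1); 0 disables the tax
    """
    if not 0.0 <= tax_rate < 1.0:
        raise ValueError("tax_rate must be in [0, 1)")
    k = 1.0 / (1.0 - tax_rate)  # an idle page costs k times an active page
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


def pick_victim(vms, tax_rate=0.75):
    """Min-funding revocation: reclaim from the VM with the lowest adjusted ratio."""
    return min(
        vms,
        key=lambda vm: shares_per_page(vm["shares"], vm["pages"], vm["active"], tax_rate),
    )["name"]
```

With a tax rate of 0 the ratio reduces to plain shares per page, so two VMs with equal shares and equal allocations tie; with a nonzero tax the mostly idle one has the lower ratio and loses pages first.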
4. Evaluation
The experiments are conducted on a Pentium III single and multi core machine running Linux and Windows operating systems. The performance of individual policies and the global dynamic allocation policies is measured. One of the interesting results was the shared content among the VMs, which the authors found to be around 67% for a large number of VMs. I think the higher-level dynamic allocation experiments also show how the various policies coordinate with each other as the applications progress. Overall, the experimental results highlighted the strengths of the policies, including the overheads.
5. Confusion
I could not understand how ballooning solves the double paging problem.

Summary :

The paper describes the mechanisms used by the VMware ESX Server hypervisor for memory virtualization and reclamation when multiple virtual machines run different guest operating systems. Specifically, it highlights the techniques of ballooning, transparent page sharing, proportional sharing, and I/O page remapping, and evaluates them.

Problem :

The problem in question is to design techniques for automatic allocation and reclamation of memory from virtual machines by the virtual machine monitor (VMM) in a pure virtualized environment without modifying the guest OSes.

Contributions :

1. It follows the policy of memory overcommitment (the total amount of guest physical memory for all VMs is greater than the amount of host memory available) and automatically manages the allocation to VMs based on configuration parameters and system load.

2. To reclaim memory from the virtual machines that are idle, the hypervisor uses the techniques of ballooning and content based page sharing.

3. Ballooning works by installing a balloon device driver in the guest OS, which is inflated by the hypervisor when memory is to be reclaimed. As the hypervisor does not know which pages are best to reclaim, it delegates that choice to the guest OS: the balloon allocates pinned pages, and the guest OS frees memory using its own policies. The driver passes the physical page numbers of the pinned pages to the hypervisor, which can then reclaim the machine pages backing them.

4. Makes effective use of the fact that many pages are identical across VMs, so the redundant copies can be reclaimed. To achieve this under pure virtualization, it uses content-based sharing: hashing identifies redundant pages, which are then reclaimed.

5. Uses a proportional share algorithm for allocation of memory. However, to tackle the problem of idle VMs with a high share, it employs an idle memory tax: a client is charged more for an idle page than for an active one, which adjusts its effective share.

6. Uses four states (high, soft, hard and low) to decide when to reclaim memory. In the high state, no reclamation is required; in the soft state, it uses ballooning; in the hard state, it uses swapping; and in the low state, it swaps and also blocks VMs that are above their target allocations.

7. Also uses hot page remapping from high to low memory so as to minimize the overhead of copying pages between high and low memory.
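The four reclamation states in item 6 can be sketched as a simple lookup from the fraction of free memory to a state and its actions. This is a simplification under stated assumptions: the threshold values below are hypothetical, and the sketch ignores the hysteresis a real system would want to avoid flapping between states.

```python
# Hypothetical boundaries, as fractions of system memory that is free.
THRESHOLDS = [("high", 0.06), ("soft", 0.04), ("hard", 0.02)]

# Reclamation actions per state, following the description in item 6.
ACTIONS = {
    "high": [],
    "soft": ["balloon"],
    "hard": ["swap"],
    "low": ["swap", "block VMs above target"],
}


def memory_state(free_fraction):
    """Map the fraction of free memory to a reclamation state."""
    for state, threshold in THRESHOLDS:
        if free_fraction >= threshold:
            return state
    return "low"  # below every boundary


# Example: 5% free memory falls in the soft state, so only ballooning runs.
state = memory_state(0.05)
actions = ACTIONS[state]
```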

Evaluation :

Comprehensive evaluation has been done to prove the effectiveness of the techniques. The evaluation of page sharing indicates a significant sharing level of 67% with increase in the number of VMs. The idle tax mechanism has been evaluated with a tax rate of 75% with an idle VM and a fully loaded VM and an increase in performance by 30% has been noted. Similarly, workloads have also significantly benefited from ballooning.

Confusion :

I didn’t quite get the technique of estimation of memory usage by weighted moving averages.

Summary:
The authors present a thin hypervisor implementation named ESX. They focus specifically on the memory management features of ESX, which allow guest operating systems to overcommit memory on a physical host. Additionally they describe how ESX allows the guest operating system to manage memory more directly.

Problem:
Because operating systems work on the assumption that the memory they are allocated is fixed, it is difficult to run many virtual machines on a single physical host without over committing memory. To allow for this, hypervisors typically add an additional level of paging, allowing them to swap virtual machine “physical” memory to disk themselves. However, this introduces several problems when the guest operating system and hypervisor have their own separate paging policies, sometimes causing unnecessary page faults.

Contribution:
The authors describe a ballooning technique in which a memory driver in the guest operating system lets the guest OS dictate the paging policy, rather than forcing the hypervisor and guest OS to both run possibly conflicting memory management techniques. When memory is scarce, the hypervisor forces the guest operating system, through the memory driver, to pin some pages in its “physical” memory. This allows the hypervisor to reclaim those pages itself and leaves the guest OS to deal with the memory scarcity on its own. If more memory becomes free in the hypervisor, it can deflate the guest OS’s memory balloon by allowing it to unpin some of its pages. They also implement a page sharing technique that finds identical pages by hashing them and comparing pages whose hashes match, reducing machine memory usage. Finally, they explain an idle memory tax technique in which idle memory is detected by randomly sampling a small number of pages within each guest OS and marking them as idle if they are not accessed. A tax parameter then defines how much of a guest OS’s idle memory may be reclaimed for other use at any given time.

Evaluation:
The paper contains brief evaluations of the memory sharing technique and the memory reclamation or tax technique. These evaluations mostly give a visual representation of how the techniques work, rather than comparing the supposed improvements over previous systems. Something like seeing a reduction in page faults when using ESX vs. another hypervisor could have been included.

Confusion:
I’m not entirely sure why the tax ratio defaults to .75 and not simply 1. It seems fair to allow reclamation of all idle pages from a VM. I don’t quite understand their brief note about why a tax ratio of 1 is not a good idea.

1. Summary
In the paper "Memory Resource Management in VMware ESX Server", the authors propose the VMware ESX server, a thin software layer which is native i.e. VMM has full privilege to system hardware, with no OS underneath. They discuss the challenges of memory management in virtual machines and propose three major policies: content based page sharing to eliminate redundant copies of pages, ballooning to reclaim pages from a VM and idle memory tax to utilize memory efficiently without affecting performance isolation.

2. Problem
- how to improve the performance of VM systems whose primary memory is overcommitted
- which guest OS to choose for revoking memory, and which of its pages to reclaim
- how to avoid keeping many redundant copies of the same page, e.g. when running multiple copies of the same OS
- how to overcome strict partitioning of memory between OS's

3. Contributions
Ballooning:
- reclaim the least valuable pages from a guest OS without explicit knowledge of which pages are least important to the OS
- load a balloon module as a pseudo device driver in each guest OS which interacts with the ESX server
- the balloon driver communicates the physical page number of each page it has allocated, and the ESX server reclaims the corresponding machine page

Content based page sharing:
- identifying shared page copies by using a hash value that summarizes the page contents
- implemented as a single global hash table containing frames for all scanned pages (pages are chosen for scanning at random)
- using standard copy on write technique to share pages in read only mode (similar to Disco)

Share based allocation using idle memory tax:
- incorporates information about the active memory usage of an OS into the min-funding revocation algorithm
- idle memory tax specifies the maximum fraction of idle pages that can be reclaimed from a client
- idleness is measured by sampling: n pages are sampled each sampling period and their mappings are invalidated so that the next access faults and re-establishes the mapping; the fraction f of sampled pages that fault estimates the memory actively used
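The sampling idea in the last bullet can be sketched as follows. This is only an illustration: `was_touched` is a hypothetical callback standing in for "the page faulted after its mapping was invalidated", and the page population and sample size are made-up numbers.

```python
import random


def estimate_active_fraction(pages, was_touched, n=100, rng=random):
    """Estimate the fraction of actively used pages from a random sample.

    `pages` must be a non-empty sequence of page identifiers.
    """
    sample = rng.sample(pages, min(n, len(pages)))
    # In a real VMM each sampled page would have its mapping invalidated;
    # here a callback simply reports whether the page was accessed.
    touched = sum(1 for page in sample if was_touched(page))
    return touched / len(sample)


# Hypothetical guest: 1000 pages, of which the first 250 are in active use.
pages = list(range(1000))
hot = set(range(250))
estimate = estimate_active_fraction(
    pages, lambda p: p in hot, n=100, rng=random.Random(0)
)
```

The appeal of the approach is that the cost depends on the sample size n rather than on the size of the VM, at the price of some statistical error in the estimate.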

4. Evaluation
The authors have carried out a detailed evaluation of their policies on varying guest operating systems and hardware. File server benchmarking shows that a ballooned VM performs nearly as well as a non-ballooned VM, with an acceptable overhead. With content-based page sharing, as the number of VMs increases, sharing reaches 67%, reclaiming approximately two thirds of the machine memory. The authors have demonstrated page sharing metrics not only for the best-case workload of running identical guest VMs but also for workloads running VMs for different servers. For VMs running Windows and Linux with identical share allocations, they show how changing the tax rate reclaims memory from the idle Windows VM.

5. Confusions
I don't understand the intuition behind this policy: "Always try sharing a page before paging it out to disk" and how useful it is today. Also, I don't understand what I/O page remapping is and why we need it.

1. Summary
This paper describes the virtual memory management techniques used in the VMware ESX server. These techniques include ballooning, taxing VMs which keep pages idle, content based memory sharing, dynamic reallocation and page remapping.

2. Problem
VMware ESX server enables full virtualization on native hardware. It has to manage memory efficiently to support a number of virtual machines running simultaneously. It has to do so without intimate knowledge of how the memory is being used by the guest OS.

3. Contributions
This paper makes a number of contributions that enable the ESX server to manage virtual memory effectively. The first problem addressed is that of page replacement. The author proposes the ballooning technique to solve this problem. This involves adding a balloon driver module to each VM that can allocate or deallocate pages when signaled by the ESX server. The pages that are shared among VMs follow the copy-on-write protocol. To detect which pages can be shared, the ESX server employs a content-based scanning technique that goes through memory and enables sharing for identical pages. Pages are scanned in random order and a hash table is used to search for identical pages efficiently. Each VM gets memory based on the shares that it has, where shares are equivalent to tickets in a lottery scheduler. The ESX server levies a tax on VMs that keep pages from their allocated set idle. These pages are then reclaimed whenever memory becomes scarce. The ESX server also provides support for page remapping between high and low addresses.

4. Evaluation
The author does a good job in evaluating each of the techniques he has proposed. The evaluation is mainly focused on demonstrating high memory throughput.

5. Confusion
The paper does not mention anything about how the shares of each VM are allocated. Also, is there any use of the hint frame generated in the scanning algorithm other than avoiding the need to compute the hash for subsequent times?

Summary:
The paper introduces the VMware ESX Server, which is a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified operating systems.

Problem:
Some virtual machine monitors use a pre-existing OS for hardware support, leading to lower performance and incomplete control over resource management. It's also challenging to run an existing OS without modification.

Contributions:
(1) ESX Server virtualizes physical memory by adding an address translation level. It uses pmap to do PPN-to-MPN mapping.
(2) The paper proposes the ballooning technique to reclaim pages. Inflating and deflating the balloon module controls the memory pressure in the guest OS, causing it to reclaim or free pages.
(3) It also uses content-based page sharing. Page contents are summarized by a hash value and a copy-on-write strategy is used.
(4) It proposes share-based allocation and an idle memory tax. The allocation is based on each VM's proportion of shares and also on its active memory usage. A statistical sampling method is used to estimate idle memory.
(5) It proposes I/O page remapping to reduce redundancy and copy overhead.

Evaluation:
The paper has a solid evaluation of every mechanism. The evaluation covers a benchmark for ballooning, the page sharing implementation, quantitative memory sampling and idle memory taxation, and dynamic allocation.

Confusion:
Why is the ballooning strategy special? It seems like a module that helps the server reclaim or free the memory of the guest OS. Why should it have "inflation" and "deflation"? The paper says inflating the balloon increases memory pressure. What are the details of this mechanism?

Summary:
The paper discusses the design of a software layer for multiplexing hardware resources to efficiently support VM workloads that overcommit memory. Server mechanisms and memory management techniques like ballooning, idle memory taxing, content-based page sharing and hot I/O page remapping are proposed in this paper to achieve the design goal.

Problem:
The problem is how to flexibly over commit memory, processor, and other system resources while providing resource guarantees to VMs of varying importance.

Solution:
Without any modification of the guest operating systems running in virtual machines, ESX server tries to provide high I/O performance and complete control over resource management through the implementation of following policies:
- min, max and memory shares size are maintained for each VM to calculate target memory allocations.
- Ballooning : A lower-level mechanism to reclaim memory from a virtual machine by inducing the guest OS to page out (or back in) pages provided to the VM.
- Content-based page sharing : share identical pages between VMs. These pages are efficiently identified through the use of hash mapped values generated based on the content of each page.
- idle memory taxing : A VM claiming a page and not using it will be charged more than it would be for an actively used page. This is to keep higher priority VMs from holding on to crucial page resources and unnecessarily starving the lower priority VMs despite the availability of memory resources.
- hot I/O page remapping : when the I/O count for a page in high memory crosses a threshold, the system remaps that page to a low-memory page that is within the device's addressable range.
- zero-based physical address space mapping of pages to VM to create an illusion of direct machine memory access.
- Maintaining shadow page tables that contain virtual-to-machine page mappings to avoid overhead, as the hardware TLB caches the direct mappings.
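The relationship between the guest page table, the physical-to-machine map and the shadow page table in the last two bullets can be sketched as a composition of two mappings. The dictionaries and page numbers below are hypothetical; a real VMM keeps these in hardware-defined formats and updates the shadow entries incrementally.

```python
def build_shadow(guest_page_table, pmap):
    """Compose VPN -> PPN with PPN -> MPN to get VPN -> MPN.

    Entries whose PPN has no machine page are left out, so an access to
    them would fault into the VMM instead of being served by the TLB.
    """
    return {
        vpn: pmap[ppn]
        for vpn, ppn in guest_page_table.items()
        if ppn in pmap
    }


# Hypothetical guest view: zero-based "physical" pages 0, 1 and 5.
guest_page_table = {0: 0, 1: 1, 2: 5}
# Hypothetical VMM view: only PPNs 0 and 1 are currently backed.
pmap = {0: 812, 1: 97}

shadow = build_shadow(guest_page_table, pmap)
```

Because the hardware walks the shadow table directly, ordinary memory accesses pay no extra translation cost; the extra level only matters when the VMM changes a PPN-to-MPN mapping and must invalidate the affected shadow entries.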

Evaluation:
The paper evaluates the impact of idle memory taxing on the overall performance of a more memory-intensive VM when unused resources are reclaimed from other VMs. Dynamic memory reallocation is also studied on different workloads to show how balloon size and active memory allocations vary over time.

Learning/Confusions:
I would like to know more about how inflating the balloon increases the memory pressure in the guest OS and causes the OS to invoke its native memory management policy.

Summary: This paper introduces the memory management strategy of VMware ESX server, including a ballooning technique to dynamically assign memory to different VMs, an idle memory tax to achieve efficient memory utilization, and content-based page sharing to reduce redundancy. For each technique, there are experiments with quantitative evaluation results.

Problem: Trends such as server consolidation and the proliferation of inexpensive shared-memory multiprocessors have made server virtualization popular. In server virtualization, multiple virtual machines (VMs) run in isolation on the same physical server. To make the server run efficiently, a dynamic, efficient memory management strategy is necessary.

Contribution:
1. Ballooning technique to do dynamic memory allocation. A ballooning module is loaded into each guest OS; it may ask for memory (inflate) so that the memory available to the guest OS decreases, or release memory (deflate) so that the memory available to the guest OS increases.

2. Content-based memory sharing to reduce memory redundancy and copy overhead. A hash table is used to detect identical pages; once pages are found to be identical, the physical page addresses of the different VMs are mapped to the same machine page address. This page is marked as copy-on-write: once one VM modifies it, a private copy is created for that VM.

3. Concept of shares and idle memory tax to make memory allocation more dynamic and efficient. Memory is assigned in proportion to the VMs' shares. A VM with a lot of idle memory is treated as having fewer shares (it is charged a tax).

Evaluation: For each technique, there is quantitative evaluation.
1. In the evaluation of the ballooning technique, a VM configured with 256 MB and ballooned down has very similar throughput to a VM configured with 128 MB without ballooning.
2. Content-based memory sharing can reduce total memory by 10%-60% depending on the workload.
3. With shares and the idle memory tax, when there are 2 VMs, performance can be boosted by 30%.

Confusion: How does the ballooning idea work in detail? When is the ballooning module initialized? Is it a system abstraction that cannot be disabled?

Summary:
This paper describes the various policies of memory management implemented in the VMM ESX Server and how they improve performance by efficiently overcommitting memory between VMs. Page reclamation, efficient memory utilization and improved page sharing are implemented by new mechanisms such as ballooning, idle memory tax, and content based sharing.

Problem:
The goal of the ESX Server, like other VMMs before it, was to multiplex hardware resources so they could be better utilized among multiple VMs. The ESX Server was particularly focused on server consolidation, overcommitting memory among multiple servers to get the most performance while still retaining enough control over memory management to provide some minimum guarantees to all the servers. The problem unique to ESX was that it intended to do the above without any modifications to conventional OSs.

Contributions:
The ballooning technique to reclaim pages from the guest OS was better than previous complicated hypervisor-level policies, which suffered from performance anomalies. Ballooning’s success comes from the fact that it lets the OS use its own memory management routines, which simplifies its behaviour and adapts to the pressure on the guest OS’s memory requirements. Content-based sharing allows ESX to go beyond the earlier attempts of Disco with increased sharing opportunities, without the need to modify or understand the guest OS code. ESX also managed to get the best of a share-based allocation scheme while maintaining efficient utilization by taxing or penalizing VMs that were hoarding pages without actively using them. Statistical sampling allowed for the accurate idle memory measurement needed to impose this tax while keeping the overhead of page faults low. ESX uses a transparent page remapping mechanism to remap frequently used IO pages down to the lower 4GB of memory, which removes the overhead of copying data through low memory for IO devices that cannot address high memory.

Evaluation:
The authors conducted experiments for each of the above mechanisms to evaluate their overheads and performance. A file server benchmark for ballooned and non-ballooned VMs showed that the ballooned VM only suffered a maximum of 4.4% overhead for memory sizes ranging from 128 to 256MB. Content-based sharing among identical VMs approached 67% of the total memory of all the VMs, with some amount of sharing even for a single VM because of zero pages. For an idle Windows VM and a Linux VM performing memory-intensive workloads, the effect of the idle tax led to a better allocation, which boosted the performance of the Linux VM by over 30%.

Confusions:
I understood how the IO page remapping was handled, but I am curious as to why that is a problem in the first place.

Summary: This paper introduces VMware's solutions to several problems related to the memory management of virtual machines. Ballooning is a technique that makes the guest OS reclaim pages when the host is short of memory. Content-based sharing saves memory by sharing identical pages across virtual machines. Idle taxes are applied to make efficient use of available resources. None of these techniques requires modifications to guest OSes.

Problem:
1. When the host machine is out of memory because of overcommitments made to guest OSes, the host needs to decide which pages to reclaim. However, the information available to the host is so limited that it cannot make a good decision. Moreover, since the guest OS may also decide to reclaim the same page the host has already swapped out, one page may be swapped in and out twice.

2. Certain policies may guarantee some guest OSes more memory than others. However, if a higher-priority guest OS underutilizes its memory when the host is short of memory, it is better to reclaim the idle pages owned by that OS in the interest of all virtual machines. Hence we need a way to identify idle pages, and a mechanism that will reclaim idle pages when memory is short.

3. It is desirable to make a page of the same content to be shared among multiple virtual machines. The first challenge is that we will need a way to efficiently identify duplicate pages. The second challenge is that we need to do this without modifying guest OS.

Contribution:
1. Ballooning is a technique to solve problem (1) above. A driver (a balloon) is installed in the guest OS. When the host needs to reclaim memory, the driver will allocate a block of memory (inflate). If the guest OS is also short of memory after the allocation, it will do the right thing itself: reclaim the least used page, swap it to secondary storage, etc. Then the host will make use of the memory allocated by the driver.

2. Idle tax is charged to idle pages. The host will measure the idleness of pages by sampling memory accesses made by the guest OS. If a page is found idle, it will be more likely to be reclaimed.

3. Content-based page sharing is made possible by computing a hash of each page. If multiple pages have the same hash, a complete comparison is then performed to make sure that they are indeed identical. Identical pages will share the same machine page and will be marked copy-on-write. As an optimization, when no match is found for a page, the page is not marked copy-on-write; its hash is recorded as a hint instead. Writes to a hinted page therefore do not trap, at the cost that its recorded hash may be stale, so the hash has to be rechecked when a later page matches the hint.
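The scan, hint and copy-on-write steps in item 3 can be sketched with a toy model. Everything here is an assumption made for illustration: the class and method names are invented, pages are Python `bytes` objects, SHA-1 stands in for whatever hash the real system uses, and a stale hint is detected by the full comparison rather than by rehashing.

```python
import hashlib


class PageSharer:
    """Toy model: machine pages are bytes objects keyed by MPN."""

    def __init__(self):
        self.mem = {}    # mpn -> page contents
        self.pmap = {}   # (vm, ppn) -> mpn
        self.table = {}  # digest -> mpn of a hinted or shared page
        self.refs = {}   # mpn -> reference count (shared pages only)
        self.next_mpn = 0

    def _alloc(self, data):
        mpn = self.next_mpn
        self.next_mpn += 1
        self.mem[mpn] = data
        return mpn

    def map_page(self, vm, ppn, data):
        self.pmap[(vm, ppn)] = self._alloc(data)

    def scan(self, vm, ppn):
        """Try to share one page; return True if a duplicate was reclaimed."""
        mpn = self.pmap[(vm, ppn)]
        data = self.mem[mpn]
        digest = hashlib.sha1(data).digest()
        other = self.table.get(digest)
        if other is None or other == mpn or other not in self.mem:
            self.table[digest] = mpn  # record a hint; page stays writable
            return False
        # The full comparison catches both stale hints and hash collisions.
        if self.mem[other] != data:
            self.table[digest] = mpn
            return False
        self.refs[other] = self.refs.get(other, 1) + 1
        self.pmap[(vm, ppn)] = other
        del self.mem[mpn]  # the duplicate machine page is reclaimed
        return True

    def write(self, vm, ppn, data):
        mpn = self.pmap[(vm, ppn)]
        if self.refs.get(mpn, 1) > 1:
            # Shared page: copy on write into a private machine page.
            self.refs[mpn] -= 1
            self.pmap[(vm, ppn)] = self._alloc(data)
        else:
            self.mem[mpn] = data  # private page; any hint is now stale


sharer = PageSharer()
sharer.map_page("vm1", 0, b"A" * 4096)
sharer.map_page("vm2", 0, b"A" * 4096)
first = sharer.scan("vm1", 0)   # no match yet, so only a hint is recorded
second = sharer.scan("vm2", 0)  # matches the hint, so the page is shared
```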

Evaluation: The author evaluated every part of the design with experiments. Experiment results show that these techniques are all effective.

Confusion: I'm confused about the I/O page remapping section. What is it for? How does it work?

1. Summary

The paper details several novel techniques used in VMware's ESX server to manage memory resources among virtual machines running commodity operating systems.

2. Problem

Many small underutilized servers can be consolidated on a single physical server to reduce costs. In such cases, instead of static partitioning, it should be possible to over-commit system resources (such as memory) in order to reap maximum benefits from the multiplexing. Furthermore, this has to be achieved without modifying the commodity operating systems (as VMware can't control them).

A standard early approach was to introduce another level of paging and reclaim memory from VMs. However, this is not an optimal solution as the VMM may not have the full information on which pages are "hot". Hence, there is a need to explore other approaches to manage memory across VMs.

3. Contributions

The paper introduces 4 new techniques to manage memory across VMs.

1) Ballooning

A balloon module is loaded into the guest OS as a pseudo device driver or kernel service. When there is a need to reclaim memory from the VM, the balloon "inflates" by allocating pinned physical pages in the VM. This causes memory to run low in the VM, to which the guest OS responds by invoking its own mechanisms to free memory. Similarly, when memory is plentiful, the balloon is deflated by de-allocating the pages.
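To make the inflate and deflate behaviour described above concrete, here is a toy guest whose own LRU policy decides what to page out when the balloon takes pinned pages. The `Guest` class, its LRU policy and the page counts are assumptions for illustration, not the behaviour of any particular guest OS.

```python
from collections import OrderedDict


class Guest:
    """Toy guest OS with a free list and LRU-ordered resident pages."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # free "physical" page numbers
        self.resident = OrderedDict()       # ppn -> label, LRU first
        self.paged_out = []                 # labels evicted to guest swap

    def _take_free_page(self):
        if not self.free:
            # Memory pressure: the guest's own policy picks the victim.
            # (Raises KeyError if nothing is left to evict.)
            ppn, label = self.resident.popitem(last=False)
            self.paged_out.append(label)
            self.free.append(ppn)
        return self.free.pop()

    def alloc(self, label):
        ppn = self._take_free_page()
        self.resident[ppn] = label
        return ppn

    def alloc_pinned(self):
        # Pinned pages never enter `resident`, so they cannot be evicted.
        return self._take_free_page()

    def release(self, ppn):
        self.free.append(ppn)


def inflate(guest, n):
    """Balloon driver allocates n pinned pages and reports their PPNs."""
    return [guest.alloc_pinned() for _ in range(n)]


def deflate(guest, ballooned):
    """Balloon driver hands its pages back to the guest."""
    for ppn in ballooned:
        guest.release(ppn)


guest = Guest(4)
for label in ("a", "b", "c"):
    guest.alloc(label)
# The VMM asks for two pages; it may reclaim the machine pages behind them.
ballooned = inflate(guest, 2)
```

In this run the first balloon page comes from the free list, while the second forces the guest to evict its least recently used page, which is the guest's decision rather than the VMM's.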

2) Content-based page sharing

Page sharing is an effective mechanism for eliminating duplicate copies of pages across VMs. ESX server introduces content-based page sharing where pages with identical contents are identified (through a hashing mechanism) and marked as copy-on-write to enable page sharing. Though more sophisticated policies may be used to determine when to scan pages for sharing, it is noted that a simple random policy is sufficient.

3) Idle memory tax

In pure proportional-share algorithms, we may run into a scenario where idle clients with many shares hoard resources. Thus, ESX server introduces an idle memory tax whereby a client which has more idle pages is charged a higher tax and, when memory is low, is a prime candidate to reclaim memory from. To know which pages in a VM are idle, a small number of pages in each VM are sampled at predefined intervals.

4) I/O page remapping

ESX server keeps track of hot pages in high memory and remaps them into low memory periodically. This reduces the number of pages copied in I/O operations.
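A minimal sketch of this bookkeeping, assuming 4 KB pages, a device that can only address the low 4 GB, and a hypothetical hotness threshold of three I/O operations. The class name and the way free low pages are supplied are invented for the example.

```python
LOW_LIMIT_PAGES = (4 << 30) // 4096  # number of 4 KB pages below 4 GB
HOT_THRESHOLD = 3                    # hypothetical I/O count for "hot"


class IoRemapper:
    def __init__(self, pmap, free_low_mpns, threshold=HOT_THRESHOLD):
        self.pmap = pmap                     # ppn -> mpn
        self.free_low = list(free_low_mpns)  # spare low-memory pages
        self.threshold = threshold
        self.counts = {}                     # ppn -> I/Os seen while high
        self.copies = 0                      # bounce copies performed

    def io(self, ppn):
        """Perform one I/O on a guest page and report how it was served."""
        if self.pmap[ppn] < LOW_LIMIT_PAGES:
            return "direct"  # device can address the page itself
        self.copies += 1     # data is copied through a low-memory buffer
        self.counts[ppn] = self.counts.get(ppn, 0) + 1
        if self.counts[ppn] >= self.threshold and self.free_low:
            # Hot page: remap it into low memory so later I/O is direct.
            self.pmap[ppn] = self.free_low.pop()
            del self.counts[ppn]
        return "copied"


pmap = {7: LOW_LIMIT_PAGES + 42}  # guest page 7 starts in high memory
remapper = IoRemapper(pmap, free_low_mpns=[100])
results = [remapper.io(7) for _ in range(5)]
```

After the third copy the page is moved below the limit, so the remaining operations avoid the copy entirely.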

4. Evaluation

The performance of a VM is shown not to be affected much even when its memory size is ballooned down. However, the limitations of ballooning are noted, as there may be cases when the balloon driver is disabled, uninstalled or unavailable. Page sharing is shown to reclaim around 60% of memory in the best-case scenario of identically configured VMs and around 7%-33% in real-world workloads, while the overhead of page sharing is minimal. Also, the idle memory tax is shown to improve throughput by around 30% in a system with 2 VMs.

5. Confusion

I do not understand the problem of I/O page remapping very well. It would be helpful if this can be covered in class. Also, in the ballooning technique, how does the vmm know which pages the guest OS has swapped out to its virtual disk so that it can do the same?

Summary

VMware ESX Server is a type-1 hypervisor that builds on the ideas from Disco, implementing a thin software layer to multiplex hardware resources between multiple virtual machines. This paper introduces some of the novel mechanisms used in ESX Server; notably, a ballooning technique to optimize page reclamation, a content-based page sharing mechanism which efficiently reduces memory footprint, and a memory tax to encourage proportional sharing of resources.

Problem

Individual physical servers are generally under-utilized. In the late 1990s enterprises would typically buy separate machines for their print server, file server, email server, etc. which rarely ran at full capacity but still needed sufficient resources to handle peak loads. This created a significant opportunity for consolidation of systems and sharing of compute resources to simplify management and reduce costs.

Contributions

ESX Server builds on the concepts from Disco, introducing several novel contributions which lead to significant performance benefits.

First, it adds a "balloon" module into guest operating systems, which it is able to expand through a private channel when it needs to reclaim memory. This technique triggers the native page reclamation policies in the guest OS and helps avoid the double-paging problem. Similarly, ESX can deflate the balloon to call pages back into memory when resources become more available.

Next, they add a page sharing technique which identifies pages by their contents. Page hashes are stored in a hash table for fast comparison; pages with no match are recorded as hint entries, and matching pages are shared using copy-on-write.

Another important contribution is the concept of an idle memory tax, which makes page reclamations happen on the systems which are using them the least. This encourages proportional sharing of resources between all systems while maintaining high efficiency. ESX also maintains an admission control policy and does not allow virtual machines to power on without sufficient memory.

Evaluation

All these new mechanisms are thoroughly evaluated and presented. Ballooning is highly efficient, with a maximum 4.4% overhead observed at a 50% reduction of system memory. Page sharing reclaims >60% of memory for large numbers of VMs. Idle memory tax provides a performance boost of 30% in the example given. These are all significant and impressive results. It's also worth noting that VMware ESX is still the market leader in x86 virtualization over a decade after this paper was published, which is a good external evaluation of the quality of these ideas.

Confusions

I'm struggling to understand why the hint entry is needed for page sharing. Why can't they use the same frame for everything, and only copy-on-write when refs > 0?

What "private channel" does the VMM use to communicate with the balloon?

Summary
This paper focuses on the techniques for managing memory for the VMs running on ESX. The authors introduce methods which are inspired by the DISCO paper but significantly improve on them, with the aim that the guest OS should undergo minimal changes. They also introduce new features for efficient memory utilization, idle memory identification and reclaiming overcommitted memory.

Problem
I feel that the VMware ESX Server was developed to improve over the techniques presented by DISCO for VMMs to improve performance and better utilization of memory. DISCO needed major modifications in the guest OS to enable page sharing and elimination of redundant pages which ESX tries to avoid. Share allocation also does not take into consideration active memory or working sets which was solved by implementing idle memory tax.

Contribution
The authors introduce a ballooning technique for reclaiming pages in which a pseudo device driver or kernel service is installed in the guest OS and is instructed by the ESX server to start allocating pages in the VM. This balloon driver tells the ESX server which pages were allocated to it, and the server in turn reclaims the related machine pages. This was achieved with very little modification to the guest OS code, but there is still a chance that this technique will not work if the guest OS uninstalls the balloon driver.
ESX server also introduces the concept that if the contents of pages are identical, they can be shared. Pages are compared using a hash. All clients are allowed to consume the memory assigned to them by their share allocation, which depends on the number of clients vying for the particular resource. This policy was improved by adding an idle memory tax, which identifies clients having idle pages and makes them prime candidates for page reclaiming. Statistical sampling methods are used to determine the amount of active memory in every VM. Each VM is also given minimum and maximum limits on memory allocation as well as its memory shares. VMs are allowed to power on only if there is enough unreserved memory and server swap space. ESX also maintains high, soft, hard and low marks for available memory. It reclaims pages by ballooning, paging and stopping the execution of VMs if there is a memory shortage.

Evaluation
The authors have conducted experiments to evaluate the performance of reclaiming memory, where they show that while running a large number of VMs, sharing approaches 67% while nearly 60% of memory is reclaimed. They also provide data showing that when the idle memory tax was raised to 75%, performance was boosted by 30%.

Confusions
I am not clear on the statistical sampling approach and the concept of IO remapping. I am also unclear on the statement: we maintain separate exponentially weighted moving averages with different gain parameters.
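One way to read the moving-average statement is that each sampling period produces a new estimate of the active fraction, which is folded into a slowly adapting average and a quickly adapting one, and the larger value is used. The sketch below assumes that reading; the gain values are hypothetical, and it leaves out the paper's third estimate that also incorporates counts from the current sampling period.

```python
def ewma(previous, sample, gain):
    """Exponentially weighted moving average; higher gain adapts faster."""
    return gain * sample + (1.0 - gain) * previous


class ActiveMemoryEstimator:
    def __init__(self, initial=0.0, slow_gain=0.1, fast_gain=0.5):
        # Hypothetical gains: slow gives a stable value, fast tracks change.
        self.slow_gain = slow_gain
        self.fast_gain = fast_gain
        self.slow = initial
        self.fast = initial

    def update(self, sample):
        """Fold in one sampling period's active fraction."""
        self.slow = ewma(self.slow, sample, self.slow_gain)
        self.fast = ewma(self.fast, sample, self.fast_gain)
        # Taking the maximum reacts quickly when usage rises and
        # backs off gradually when usage falls.
        return max(self.slow, self.fast)


# A VM that was mostly idle suddenly becomes active.
estimator = ActiveMemoryEstimator(initial=0.1)
after_spike = estimator.update(0.9)
```

Here the fast average dominates right after the spike, so the VM's estimated working set grows promptly; after a drop in usage the slow average would dominate instead and memory would be given up more cautiously.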

Summary
The paper discusses several novel core mechanisms and policies used to manage memory resources in VMware ESX server, a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified commodity OSs. Techniques such as ballooning, idle memory tax, content-based transparent page sharing and page remapping are coordinated by a higher-level dynamic reallocation policy to efficiently support virtual machine workloads that overcommit memory.

Problem
Server consolidation improves administration and reduces cost, but to multiplex resources efficiently the system must be able to overcommit resources such as memory. This is hard because commodity OSs do not yet support dynamic changes to physical memory size.
ESX Server addresses this with novel techniques that do not require modifying the guest operating systems.

Contributions
ESX Server introduces novel memory management techniques that allow sharing without any changes to the guest OSs. ESX adds an additional level of address translation by maintaining a pmap data structure for each VM.
Ballooning - reclaims memory from a VM by implicitly causing the guest OS to invoke its own memory management routines.
Idle Memory Tax - a technique that solves a problem with share-based management of space-shared resources, enabling both performance isolation and efficient memory utilization.
Content-based transparent page sharing exploits sharing opportunities within and between VMs without any guest OS involvement.
Page remapping - to reduce I/O copying overheads in large memory systems.

Evaluation
The paper presents a good amount of evaluation for each of the techniques and policies introduced in ESX Server. The effectiveness of ballooning is demonstrated by showing that the synthetic dbench benchmark performs similarly in a ballooned VM and in a VM configured with the same amount of memory. Page sharing performance is also evaluated: for a large number of VMs, sharing approaches 67% and nearly 60% of all VM memory is reclaimed. The authors also present experimental results for the idle memory tax and dynamic reallocation.

Confusions
I didn’t understand the I/O page remapping completely.

1. Summary
This paper presents novel techniques used in VMware ESX server. These techniques are specifically focused on resource management to achieve high performance while allowing for unmodified operating systems.

2. Problem
As computing power grew, machines became capable of virtualization techniques. Virtual machines provided isolation and the illusion of dedicated machines, but were not able to take full advantage of computing power. Virtual machine monitors at the time were not able to provide overcommitment and performance guarantees to VMs.
Most VMMs at the time also required modifications (however minor) to existing operating systems. This was undesirable, as OSs were continuously growing more complex.

3. Contributions
The major contribution of the paper was the overcommitment of memory and the techniques necessary to accomplish this successfully. In ESX Server, the memory allocated to the virtual machines sums to a total larger than the available machine memory of the system. This allows for statistical multiplexing, as VMs are not likely to use their entire allocations. In the case of overcommitment, ESX Server must choose a page to swap out.

Because meta-level policies (in which ESX Server itself attempts to choose a page to swap out) operate on little information, the concept of ballooning is introduced. Each VM is loaded with a specialized "balloon" driver which periodically communicates with ESX Server. As memory needs change, the balloon is inflated or deflated by allocating and pinning pages in the guest, or by releasing them. This lets the guest OS decide which pages to swap out, using more knowledge than ESX Server has. Under severe memory pressure, ESX Server falls back to demand paging and swaps pages to disk itself.
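To make the ballooning idea concrete, here is a toy simulation. It is only a sketch under stated assumptions: the `ToyGuest` class, its LRU replacement policy and the `set_balloon_target` method are hypothetical names for illustration, not the ESX Server or guest driver interface.

```python
from collections import OrderedDict


class ToyGuest:
    """Toy guest OS with a fixed number of 'physical' pages and LRU replacement."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.resident = OrderedDict()  # page id -> None, least recently used first
        self.swapped = set()           # pages the guest itself paged out
        self.balloon_pages = 0         # pinned pages held by the balloon driver

    def capacity(self):
        # Pages pinned by the balloon are unavailable to the rest of the guest.
        return self.total_pages - self.balloon_pages

    def _evict_to_fit(self):
        # The guest's own policy (LRU here) chooses the victims.
        while len(self.resident) > self.capacity():
            victim, _ = self.resident.popitem(last=False)
            self.swapped.add(victim)

    def touch(self, page):
        """Reference a page, bringing it back from swap if needed."""
        self.swapped.discard(page)
        self.resident[page] = None
        self.resident.move_to_end(page)
        self._evict_to_fit()

    def set_balloon_target(self, target):
        """Inflate or deflate the balloon; returns the change in pinned pages."""
        if not 0 <= target <= self.total_pages:
            raise ValueError("balloon target out of range")
        delta = target - self.balloon_pages
        self.balloon_pages = target
        self._evict_to_fit()
        return delta
```

Inflating the balloon shrinks what the guest can keep resident, so the guest's own replacement policy picks the pages to give up. The hypervisor never has to guess which guest pages are cold.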

The paper also contributed the concept of content-based sharing. Here, pages are periodically scanned, and hashes are computed. Pages which hash to the same bucket are compared bit-by-bit. In case of a match, ESX Server transparently changes page mappings. Copy-on-write is also implemented.
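The scan, compare and copy-on-write steps can be sketched roughly as follows. This is a toy model, not the ESX Server implementation: the `ToyPageSharer` class and its method names are hypothetical, SHA-256 stands in for whatever hash function the system actually uses, and a write is modeled as replacing the whole page.

```python
import hashlib


class ToyPageSharer:
    """Toy content-based page sharing with copy-on-write."""

    def __init__(self):
        self.frames = {}   # machine frame id -> [page contents, reference count]
        self.index = {}    # content hash -> ids of frames already scanned
        self.mapping = {}  # (vm, guest page number) -> machine frame id
        self._next_id = 0

    def _release(self, key):
        """Drop one mapping; free the frame when nobody references it."""
        frame = self.mapping.pop(key, None)
        if frame is None:
            return
        self.frames[frame][1] -= 1
        if self.frames[frame][1] == 0:
            data = self.frames.pop(frame)[0]
            bucket = self.index.get(hashlib.sha256(data).digest(), [])
            if frame in bucket:
                bucket.remove(frame)

    def write(self, vm, page, data):
        """Toy copy-on-write: a write always lands in a private frame."""
        self._release((vm, page))
        frame = self._next_id
        self._next_id += 1
        self.frames[frame] = [bytes(data), 1]
        self.mapping[(vm, page)] = frame

    def scan(self, vm, page):
        """Try to share one page; returns True if a duplicate frame was reclaimed."""
        frame = self.mapping[(vm, page)]
        data = self.frames[frame][0]
        bucket = self.index.setdefault(hashlib.sha256(data).digest(), [])
        for other in bucket:
            # A hash match is only a hint; compare the full contents.
            if other != frame and self.frames[other][0] == data:
                self._release((vm, page))
                self.mapping[(vm, page)] = other
                self.frames[other][1] += 1
                return True
        if frame not in bucket:
            bucket.append(frame)
        return False
```

The full comparison after a hash match guards against collisions, and giving the writer a private frame keeps the other VMs' view of the shared page unchanged.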

Another major contribution is share-based allocation. In this method, VMs can be configured to receive proportional shares of memory. To make better use of machine memory, idle memory is also taken into account. When memory must be reclaimed, the victim VM is chosen by combining its shares, its active pages, and its idle pages. To measure idle memory, the server periodically invalidates a random sample of pages and counts how many of them are referenced again.
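The sampling step can be sketched like this. It is a simplified model with assumed names and parameters: `touched_pages` stands for the set of pages the guest references during one sampling period, and the sample size is illustrative.

```python
import random


def estimate_active_fraction(num_pages, touched_pages, sample_size=100, rng=None):
    """Estimate the fraction of a VM's pages that are actively used.

    A random sample of pages is 'invalidated'; the pages in the sample that
    the guest touches again during the period are counted, and the ratio is
    taken as the estimate of the active fraction.
    """
    rng = rng or random.Random()
    sample = rng.sample(range(num_pages), sample_size)
    touched = sum(1 for page in sample if page in touched_pages)
    return touched / sample_size
```

Only the sampled pages incur the cost of a re-established mapping when touched, so the overhead is bounded by the sample size rather than by the size of the VM.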

4. Evaluation
The authors find that ballooning has only a small overhead ranging from 4.4% down to 1.4%. They also find tremendous success in page-sharing. As the number of VMs grows, the sharing level for benchmarks approaches 67%. On real-world systems, the sharing percentage is still near or above 10% due to code segments and zero pages. The authors also find that their idle memory system works very effectively when imposing taxes. Finally, an experiment shows that almost all time is spent in states where only ballooning (or no reclamation at all) is necessary.

5. Confusion
How exactly does the ballooning driver pin pages in memory? I imagine you cannot allow these pages to be swapped out to disk, but what are the mechanisms for this?

Summary
This paper discusses the innovative memory management techniques implemented in the VMware ESX server. Mechanisms such as ballooning, idle memory tax, content-based page sharing are used to enforce high-level policies to allow efficient VM operation under memory overcommitment by the hypervisor.

Problem
Server consolidation in conjunction with memory overcommitment helps improve the utilization of servers and thus reduce costs. However, this can potentially violate the resource guarantees and more importantly, QoS guarantees provided to individual servers. Thus, effective solutions are needed to preserve these QoS guarantees while effectively managing the overall system memory. This paper attempts to solve this problem while running unmodified guest operating systems.

Contributions
Ballooning is one of the major contributions of this paper, through which a pseudo-device driver is used to generate artificial memory pressure in a VM, thus letting the guest OS itself reclaim allocated space. Content-based page sharing is a nifty way of identifying identical pages by storing hashes of page contents in a hypervisor data structure. Idle memory tax is an interesting method of aligning the goals of performance isolation between virtual machines and efficient system-wide memory management. While these mechanisms are significant contributions in themselves, their implementations and the high-level policies based on them as described in this paper are also important as a starting point for stirring further discussion along these lines.

Evaluation
The paper provides very specific evaluation results for each of the mechanisms and policies introduced in the ESX server. The effectiveness of the ballooning technique is suitably demonstrated by showing similar performance of dbench workload under equal availabilities of real as well as artificially reduced memory. The benefits of page sharing are shown in the form of reclamation of 60% of allocated memory in a best-case virtualized setup, and from 7-33% in real-world environments. The performance of active memory sampling and the dynamic memory reallocation techniques is satisfactorily evaluated. However, the author should have compared the relative performance of workloads running in virtualized environments in the presence and absence of memory overcommitment. This is essential to understand the amount of performance degradation caused by the overcommitment, and verify that the QoS guarantees are actually serviced.

Confusions
I do not understand why the I/O remapping problem is exacerbated by virtualization.

Summary :
This paper introduces VMware ESX Server, a software layer designed to manage memory effectively when a system is hosting multiple guest OSes. The ideas of ballooning and content-based sharing are introduced.

Problem :
1. Generally, when multiple VMs are running on a single monitor, the monitor tends to commit more memory than is actually available in the system. So, when one of the guests actually needs memory, the monitor reclaims pages from other VMs to provide it. In doing so, the monitor does not consult the guest about which page to reclaim. This can result in double paging.
2. When multiple guests try to access the same page that is either code or read-only material, there is no need to map this as multiple pages, one in each VM’s memory. Instead, there is a chance of exploiting memory sharing which will improve memory usage.

Contributions :
1. Introducing the technique of ballooning, where ESX has its own driver, the balloon, in each guest OS. When the monitor needs to reclaim pages, it inflates the balloon, which puts the guest OS under memory pressure and thus causes it to swap out pages.
2. Randomized page replacement policy is used for demand paging when ballooning is not possible.
3. Introducing content-based sharing to map the same machine page into multiple VMs.
4. Implementing share based allocation, where each guest OS starts with a share of the memory for itself.
5. The concept of idle memory tax, where a guest OS client is charged more for the pages that are in its possession but are not currently active. This way, when another OS needs memory, it is reclaimed preferentially from the clients holding the most idle pages.
6. Assigning parameters as part of each guest OS like minimum, maximum memory, and memory share. This system also implements admission control based on memory and VMM swap space.
7. Designing different threshold levels for memory pressures and implementing different techniques for page reclamation at each level.

Evaluations :
The paper presents evaluations for both of the main ideas. The dbench benchmark is used to evaluate ballooning. There is some overhead because the guest OS sizes its data structures according to the physical memory it was booted with, but performance is still comparable to a system with no ballooning. Content-based sharing reduced memory usage considerably. Around 55% of the shared pages were zero pages, which held irrespective of the number of concurrent VMs running. As the number of VMs was scaled to 10, the sharing level approached 67%, reducing the overall VM memory footprint.

What I found confusing:
I was wondering how the use of content based sharing would compare to Disco’s pmap way. Would it be a huge performance difference?

1. Summary
This paper presents the memory management mechanisms and policies employed by ESX Server, a thin VMM designed to multiplex hardware across virtual machines running unmodified guest OSs. The memory management aims at a high degree of sharing across VMs, allowing a greater degree of memory overcommitment. ESX Server's methods are tested across a range of workloads and hardware configurations. The results suggest the practical viability of these methods.

2. Problem
All VMMs solve the problem of multiplexing HW across VMs. However, ESX aims to solve this problem amidst two constraints - (1) run unmodified guest OSs and (2) Allow a high degree of memory overcommitment. The latter will allow greater utilization of the hardware while still providing agreed-upon resource guarantees. These constraints influence all of ESX’s mechanisms and policies.

3. Contributions
When memory is overcommitted, the VMM needs a mechanism to reclaim space from a VM. Usually the VMM swaps pages from the 'physical' space of some VMs to disk. This may be ineffective, as the VMM may not have sufficient information about page usage, and the VMM and the guest OS can end up working at cross purposes. ESX introduces a balloon module in each guest OS which periodically communicates with ESX. When ESX needs space from a VM, it communicates this requirement to the balloon driver, which makes a memory request using native OS interfaces. This should trigger the OS replacement policies, which will automatically swap out less-used pages. The freed pages can then be used to relieve memory pressure. ESX uses page swapping as a backup when ballooning is not possible. The overall memory requirement can be reduced by sharing common pages across VMs. ESX performs content-based sharing using a 64-bit hash to represent the content of a page. A random subset of pages is scanned periodically to detect potential for sharing using a hash table. The sharing is done using COW, similar to Disco. Memory allocations are dynamically varied using a proportional-share policy. Additionally, an idle tax rate is used to penalize those VMs that have more idle pages. The idle page count is estimated periodically through sampling. The dynamic reallocation policy kicks in whenever the percentage of free pages in memory falls below certain thresholds and quickly tries to realign the memory allocations.

4. Evaluation

The paper presents considerable empirical evaluation across a range of HW and workloads. It is uniformly shown that the CPU overheads for ESX methods are negligible. The results also suggest the scope of sharing across VMs and how ESX is able to efficiently detect and share up to 67% of VM memory. The results of dynamic reallocation show the ESX memory management is capable of handling & relieving overcommitment.
5. Confusions
Why should ballooned pages be pinned in memory?
During Windows boot, all pages are zeroed out. How are these pages shared across VMs as each VM will try to write to these pages? (Page 11, Column 1, last paragraph)

Summary
In this paper, the authors present VMware ESX Server, which is a thin software layer just above the hardware that is designed to multiplex resources between commodity operating systems running in virtual machines. They introduce numerous new techniques for more efficient memory virtualization. For page replacement and reclamation, the Balloon Algorithm is described, which allows each guest operating system to select which page to reclaim. Shared memory is identified using content based page sharing, in which operating systems share pages that are identical. Additionally, a new algorithm for reclaiming idle memory is also described, which uses the concept of an “idle memory” tax to reclaim idle memory from VMs underutilizing their memory and give it to VMs needing more memory.
Problem
Running multiple virtual machines with commodity operating systems clearly has many advantages. Previously, however, efficient memory virtualization required modifying the guest operating system. In VMware ESX Server, a solution for efficiently virtualizing memory is presented without modifying the guest operating system to support it. Additionally, servers typically underutilize their physical resources, which results in wasted resources. VMware ESX Server is a solution that consolidates servers and overcommits memory and processor to more efficiently utilize resources.
Contributions
VMware ESX Server presented numerous new mechanisms and policies for more efficiently implementing virtual memory in virtual machines. Firstly, the balloon mechanism allows ESX Server to inflate or deflate a balloon inside a guest OS's memory, which makes the guest use its own policies for replacing pages. This gives the guest OS the illusion that the VM simply has less memory, instead of alerting it that one of its pages has been reclaimed. Another technique called demand paging is also used, but only when ballooning isn't possible. In demand paging, ESX Server simply swaps pages to disk without any guest involvement. Another technique for more efficient memory virtualization is content-based page sharing, which shares pages between VMs that have exactly identical contents. In order to support efficient page comparison, pages are first hashed and then only compared to the pages that hash to the same bucket. Copy-on-write is used when a VM needs to write to a shared page. A technique for reclaiming idle memory is also presented. ESX Server uses the concept of an "idle memory tax", whose rate specifies the maximum fraction of idle pages that may be reclaimed from a VM. When memory is scarce, it is reclaimed from VMs that aren't actively using all of their memory. Idle memory is estimated by sampling pages at a set rate.
Evaluations
The paper does discuss how the new techniques improved memory virtualization performance. Using page sharing, the authors found that nearly 5MB of memory could be reclaimed from a single VM. Of this 5MB, 55% was just from sharing the zero page. They also found that sharing approaches 67% of memory among VMs running the same workload, and the amount of memory that can be shared between VMs grows linearly as more VMs are added. Based on these results, page sharing can improve memory virtualization with multiple VMs.
Confusions
I was a bit confused about how I/O page remapping works and what its purpose is. While I believe I understand the overall idea behind reclaiming idle memory, the calculation of the shares-per-page ratio was also confusing.
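On the shares-per-page ratio: as I read the paper, the adjusted ratio is ρ = S / (P · (f + k · (1 − f))), where S is the VM's shares, P its allocated pages, f its active fraction, and k = 1 / (1 − τ) the idle page cost for tax rate τ. A minimal sketch, with function and parameter names chosen here for illustration:

```python
def shares_per_page(shares, pages, active_fraction, tax_rate):
    """Adjusted shares-per-page ratio; the VM with the lowest value loses memory first."""
    if not 0.0 <= tax_rate < 1.0:
        raise ValueError("tax_rate must be in [0, 1)")
    idle_cost = 1.0 / (1.0 - tax_rate)  # k: an idle page costs more than an active one
    weighted_pages = pages * (active_fraction + idle_cost * (1.0 - active_fraction))
    return shares / weighted_pages


# Two VMs with equal shares and allocations; only the first is fully active.
busy = shares_per_page(shares=1000, pages=1000, active_fraction=1.0, tax_rate=0.75)
idle = shares_per_page(shares=1000, pages=1000, active_fraction=0.0, tax_rate=0.75)
```

Here the idle VM ends up with the lower ratio, so it is the one memory is reclaimed from. With a tax rate of 0 the formula reduces to plain shares per page and idleness has no effect.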
