Memory Resource Management in VMware ESX Server
Memory Resource Management in VMware ESX Server Carl Waldspurger. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, 2002.
Reviews due Tuesday, 2/7 at 8:00 am
Comments
1. Summary
This paper introduces ESX Server, VMware's virtual machine layer for efficiently managing hardware resources. The focus of the paper is mainly memory management, including ballooning, the idle memory tax, content-based page sharing, and hot I/O page remapping.
2. Problem
This paper deals with the problem of how a VMM can efficiently manage memory: flexibly allocating, reclaiming, and overcommitting memory for multiple guest OSes. This is hard because the VMM knows little about the internals of each guest OS. Also, previous virtual machine monitors like Disco still needed to modify the guest OS running on the VMM for resource management purposes, which is not feasible for commercial VMM software. With ESX, this paper tries to make the VMM completely independent of guest OS modifications.
3. Contributions
This paper introduces the following techniques for efficiently managing memory resources.
Ballooning adds a pseudo-device driver to the guest OS. It leverages the memory management policy that already exists in the guest OS, letting the guest itself decide which pages are least important as the balloon inflates or deflates.
Content-based page sharing shares pages across different guest OSes. It uses a hash function to identify pages with identical content, and relies on a copy-on-write technique so that a private copy is created only when a shared page is written.
Idle memory (memory that is not frequently used or belongs to an idle process) is detected by a sampling algorithm run over each guest's memory. An idle memory tax then lets the server take memory from virtual machines with many idle pages and give it to virtual machines that need memory.
Other techniques include remapping of 'hot' (frequently involved in I/O) pages for data transfer. Such pages can be automatically remapped to low memory, so that devices such as NICs whose DMA can only address the lowest 4 GB of memory perform better.
Notably, none of these techniques requires modifying the guest OS source code, which is very important for using a VMM in the real world.
4. Evaluation
The experiment section designs separate experiments to test each technique mentioned in the contribution section, with results in separate charts. Different workloads (such as database applications) and different OSes (Linux and Windows) are used to test VMM performance. The results are good and show that the methods can be widely used across workloads and OSes without modifying the OS source code.
5. Confusion
How does content-based page sharing deal with hash collisions? The paper only claims that the collision rate is small, but I think that is not good enough for VMM-level code. The paper also mentions chaining; are there more details about this?
Posted by: TIanrun Li | February 7, 2017 07:06 AM
1. Summary
The paper presents mechanisms and policies for memory reclamation and management in a hypervisor running multiple unmodified guest VMs in an over-provisioned setting. The author introduces three techniques: page swapping, ballooning, and content-based page sharing.
2. Problem
When doing memory management with unmodified guests, only the guest is aware of which pages are not in use. This makes it difficult for the hypervisor to reclaim memory from guests without severely limiting guest performance, and it can even crash the guest if the wrong page (e.g., a kernel page) is accidentally reclaimed.
3. Contributions
The three mechanisms provided for memory reclamation are page swapping, ballooning, and content-based page sharing. They are transparent mechanisms, which in this context means that the guest is unaware of their existence. Page swapping has the hypervisor map the pages to be reclaimed to a swap file and move them out of physical memory onto disk. The ballooning method adds memory pressure inside a guest by provisioning pinned physical pages private to the hypervisor, forcing the guest OS's own memory management to kick in; this method is also made more efficient by the server's shadow page table support. The final mechanism finds common pages among guest VMs using the contents of their individual pages. This is done by hashing each page and periodically scanning to detect and combine identical pages in the guests' shadow page tables.
4. Evaluation
The author makes a convincing case for these techniques and provides a detailed analysis of their individual and combined effects on memory throughput and usage in the described environment. They use a balanced set of benchmarks relevant at the time as well as abstract use cases such as identical guests.
Posted by: Akhil Guliani | February 7, 2017 05:48 AM
1. summary
This paper introduced several techniques for virtual machine memory resource management in ESX Server, including cooperative page reclaiming, an idle memory tax for utilization with isolation guarantees, content-based transparent page sharing, I/O page remapping, and dynamic resource reallocation.
2. Problem
Efficiently support virtual machine workloads that overcommit memory. This resembles the traditional operating system problem of supporting more virtual memory than physical memory, which in most cases works well because much application memory sits idle. The problem is more interesting in a VMM because, again, the knowledge is not held entirely by one layer.
3. Contributions
Several novel and elaborate techniques were proposed that could provide insight for resource management systems in more general settings.
a). Cooperative page reclaiming: the ballooning driver method is neat, exploiting the driver interface to allocate pages inside the guest.
b). Memory sampling and idle memory: use of statistical sampling to estimate active memory.
c). Content-based transparent page sharing
d). I/O page remapping: tracking hot pages, detecting the problem, and adjusting mappings at runtime
e). Dynamic resource reallocation
4. Evaluation
The paper evaluated the proposed methods using different workloads to demonstrate the effectiveness of each specific technique. They also included an experiment showing the effect of the combination with five Windows VMs. I wonder what the results would be on real-world workloads; VMware presumably has this kind of data.
5. Confusion
a). What about the CPU overheads of these techniques?
b). How do the balloon driver and content-based page sharing interact, e.g., what happens if the balloon frees a shared page?
Posted by: Jing Liu | February 7, 2017 04:44 AM
VMware ESX server is a Type 1 hypervisor. This paper discusses memory resource management techniques employed in VMware ESX server to efficiently support virtual machines that overcommit memory and reduce overall memory pressure on the system. Several mechanisms such as memory reclamation, memory sharing and policies for dynamic reallocation of memory are discussed.
Problem:
* Support dynamic memory management among Virtual Machines running on hypervisor without modifying the operating system.
* Memory management from the hypervisor level using meta-level page replacement can be inefficient and may lead to issues such as double paging.
Contributions:
* The ballooning technique makes use of a pseudo-device driver or kernel module in the guest OS that communicates and cooperates with the ESX server via a private channel. The balloon module can be controlled by the ESX server to "inflate" or "deflate", increasing or decreasing memory pressure in the guest, and thus helps reclaim memory as and when needed, alongside demand paging.
* Demand Paging
Memory can be reclaimed without guest OS support by paging out to an ESX server swap area on disk.
* Sharing Memory between VMs running similar workloads to consume less memory
> Content Based Page Sharing:
Pages with identical contents can be shared across VMs. Such pages are identified using a hash value summarizing each page's contents. The implementation especially benefits homogeneous VMs. The standard copy-on-write technique is used to share the pages.
* Share-Based Allocation: enables VMs to achieve efficient memory utilization while maintaining memory performance isolation guarantees. Shares indicate a fraction of the total shares in the system and thus the relative resource rights of a VM; each VM is entitled to consume resources in proportion to its share allocation.
* Reclaiming Idle memory using Idle memory tax:
> "Statistical sampling approach" is used to determine idle memory.
> The paper introduces the notion of an idle memory tax, which specifies the maximum fraction of idle pages that can be reclaimed from a client.
* Allocation Policies and Admission Control
Different policies for Admission Control and I/O Page Remapping are discussed.
Evaluation:
* The paper presents a reasonably good evaluation of the strategies it employs, demonstrating them on different OSes, including Microsoft Windows and Linux based VMs, under both uniform and diverse workloads.
Confusion
1. When techniques such as ballooning are employed in virtual machines in the cloud (e.g., Amazon EC2), what kind of SLAs are guaranteed to the users of the VM?
2. The paper mentions that the sharing level among identically configured VMs reaches 67%; isn't it the case that OS security features, and address space layout randomization (ASLR) in particular, can significantly reduce sharing to such extents even with identical OSes running in VMs?
3. The discussion on I/O page remapping is not very clear.
Posted by: Lokananda Dhage Munisamappa | February 7, 2017 04:32 AM
Summary: This paper presents the VMware ESX VMM, a thin abstraction layer between the hardware and unmodified commodity guest OSes running in VMs, which aims to manage memory efficiently across VMs. It does so by allowing memory to be overcommitted and by using a variety of memory management techniques: ballooning to reclaim pages, content-based transparent page sharing to save space, and an idle memory tax to incentivize efficient memory use.
Problem: It is not uncommon for a large server running multiple guest OSes across VMs to have many resources underutilized. Memory is an expensive resource, and earlier VMMs such as Disco did not utilize it efficiently and additionally required changes to guest OSes. These problems motivated the author to design a system that avoids unnecessary swapping of physical pages to disk and does not require OS changes for features such as transparent page sharing, in order to use memory more resourcefully.
Contributions: 1. Uses the concept of over-commitment: the total memory configured for all VMs can exceed the actual memory of the system.
2. Ballooning – cajoles a guest OS into releasing memory using its own native page replacement policies.
3. Content-based Page Sharing – pages with the same content can be shared. Hashing computes a summary of each page, which is used as a lookup key to find candidate matches; once a candidate is found, a full content comparison confirms the match. Copy-on-write is used so that writes to shared pages remain correct.
4. Idle Memory Taxation: idle pages are charged more than active ones. When memory is required by other VMs, it is preferentially taken from VMs holding large numbers of idle pages.
5. Dynamic Memory Reallocation: recalculates allocations when the system configuration changes, e.g., a VM is added or removed, or the amount of free memory crosses a threshold.
6. Statistical Analysis of Pages: Keeps track of I/O intensive pages (hot pages) and remaps them.
Evaluation: Each new memory management technique was tested for performance. Ballooning was tested by running dbench with 40 clients in a Linux VM; the author claims the added overhead (1.4% - 4.4%) is not significant. Content-based page sharing was evaluated by running VMs with similar OSes; page reclamation in both the best-case scenario (up to 60%) and the regular-case scenario (7-33%) is significant. The idle memory tax experiment shows a roughly 30% throughput improvement. Overall, the author has evaluated the new techniques reasonably well.
Confusion: 1. Can a VM game the system by keeping resources busy unnecessarily to avoid idle memory tax?
2. What happens if balloon driver is disabled?
Posted by: Rahul Singh | February 7, 2017 04:11 AM
1. summary
This paper is about the memory management architecture in VMware's ESX bare-metal hypervisor and its important features such as memory overcommitment, the idle memory tax, and content-based memory sharing.
2. Problem
Running multiple virtual machines on a single host may not utilize resources (especially memory) efficiently. This paper proposes designs for memory overcommitment, memory ballooning, and memory sharing, all without modifying the guest OS. The paper also discusses the policies governing dynamic allocation of memory to virtual machines.
3. Contributions
The paper starts by describing the low-level memory virtualization design. The ESX server uses one more layer of address translation to provide a complete address space (starting from zero) to each VM. The server (running directly on the hardware) keeps the mapping from physical addresses (visible to the VM) to machine addresses in a 'pmap' table. This virtualized layer of addresses makes it easy to plug in or swap out pages allotted to any VM, which is very important for supporting memory overcommitment. To efficiently reclaim memory, the ESX server relies on a balloon driver loaded as a module in the guest OS; the balloon influences the page replacement policy of the guest OS. Many VMs running on a single host might be running similar processes, so VMs can share such pages with copy-on-write. ESX Server employs content-based page sharing, so every page in a guest VM is a potential shared page. The ESX server maintains a hash table for all shared pages, which is consulted before a fresh page would be allocated; it contains shared frames and hint frames, storing the hash of a shared page or the partial hash of a potential shared page. The paper also discusses the higher-level policies governing the dynamic allocation of memory to each VM. The authors devise mechanisms like the 'idle memory tax', per-VM shares, and min-max guarantees as per SLAs. The idle tax helps the ESX server calculate the amount of memory to reclaim from a VM. The overall free memory should not fall below the lowest threshold; as free memory drops below the higher thresholds, the server reclaims memory from VMs increasingly aggressively.
4. Evaluation
The authors measure the performance downside of memory overcommitment with a virtualized physical address space using real-world applications. They also describe the memory overhead, which seems small. Different parameters influence performance, such as the tax rate in the experiments, but the paper does not describe the impact of other parameters, nor does it present worst-case examples.
5. Confusion
The use of hint frames in ESX Server's content-based sharing implementation was not clear. What exactly is I/O page remapping? What is the significance of PAE mode?
Posted by: Rohit Damkondwar | February 7, 2017 03:40 AM
Summary
This paper focuses entirely on memory management policies and mechanisms in VMware ESX Server. It covers memory virtualization, memory reclamation techniques, memory sharing, and allocation policies.
Problem
The rise of cheaper shared-memory multiprocessors has led to the advent of virtualization techniques for efficient utilization of hardware. This comes with several challenges: isolation between VMs, load balancing, resource management, and the ability to run commodity OSes without any changes, all while keeping overhead low. This paper provides solutions to the memory management challenges through various mechanisms, techniques, and policies.
Contribution
The main contribution of the paper is the novel algorithms and techniques for memory management to efficiently handle multiple VMs running on the same machine.
1. Memory Virtualization: to translate guest virtual addresses to host physical/machine addresses, the ESX server maintains per-VM shadow page tables. This lets the TLB cache VA-to-MA translations directly.
2. Memory Reclamation techniques: ballooning is a way of creating artificial memory pressure in a guest OS to reclaim pages allocated to it. This clever technique obviates a generic eviction algorithm across all VMs and instead uses the guest OS's own memory management algorithms. The idea is to pin the required number of pages inside the guest OS so that the underlying machine pages can be reclaimed by the ESX server.
3. Though ballooning works well, it has limitations due to guest OS allocation limits and the possibility of the balloon driver being unloaded. The solution is to fall back on good old demand paging, swapping pages without any guest involvement; a randomized page replacement algorithm is used.
4. Transparent Page Sharing: similar to Disco, ESX also shares read-only data and code across VMs, marking shared pages copy-on-write so that writes to shared pages remain private.
5. Content-based page sharing: in this technique, pages are shared based purely on their contents, regardless of which VM they belong to. If the contents are identical, two or more VMs can share a single copy. Hash values of pages are used to find candidate matches; once the hashes of two pages match, the pages are checked byte-for-byte for identity. Unshared pages are recorded as hints to be checked later; if a hinted page has not been modified since the last scan, it remains a potential candidate for sharing.
6. Resource Entitlement: a share-based approach is used, representing the relative resource rights of clients. This ratio helps in maintaining SLAs with customers.
7. Idle memory reclamation: though the share ratios are fair, a lower-share client may suffer under memory pressure while a higher-share client hoards idle pages. This technique therefore samples page usage in VMs and reclaims idle pages via ballooning and/or swapping (the idle memory tax).
Evaluation
The author does a good job of providing a detailed analysis of the ESX Server, with workloads analyzing the performance of each of the techniques mentioned above. Starting with balloon performance, the author shows an overhead of 1.4% to 4.4%. With page sharing, ESX reclaims about 60% of memory for identically configured machines. Most interesting was Figure 7, which demonstrates the advantages of the idle memory tax: a VM with higher memory utilization benefits from pages reclaimed from idle VMs.
Confusions
1. How are virtual addresses from different VMs differentiated in the TLB (two applications may generate the same VA) when using shadow page tables? Does it use something like ASIDs in the TLB?
2. Don't the periodic TLB flushes for sampling idle pages lower the performance of the guest OS?
3. It was a bit obscure to understand the "High" and "low" memory issues with I/O page remapping.
Posted by: Pradeep Kashyap Ramaswamy | February 7, 2017 02:52 AM
1. Summary
ESX Server is a virtual machine monitor made by VMWare that runs commodity operating systems without modifications. Some of the mechanisms and policies used by ESX Server include ballooning, content-based page sharing, share-based memory allocation, an idle memory tax, and dynamic reallocation.
2. Problem
Disco required changes to the IRIX operating system to run virtual machines. For example, the IRIX bcopy() routine was modified for Disco to allow virtual machines to share their file buffer cache, a form of transparent page sharing. Furthermore, Disco and other virtual machine monitors did not achieve full utilization of memory, CPU, and other resources, especially when virtual machines requested large amounts of these resources but later remained idle.
3. Contributions
The authors implemented ESX Server as a commercial product. To control the amount of memory occupied by a virtual machine, ESX Server uses a "ballooning" mechanism. A balloon module in the operating system can "inflate" to reclaim guest physical addresses for the VMM, or "deflate" to return memory to the guest. Content-based page sharing is a mechanism in which guest physical pages in different operating systems that contain the same contents can be marked as copy-on-write under the covers, allowing the VMM to reclaim one page.
Share-based memory allocation is a policy in which each virtual machine has a certain number of "shares", and whenever one virtual machine demands more pages, the VMM revokes pages from the client with the fewest shares per allocated page. The idle memory tax is related to sharing, and it allows the system administrator to configure the importance of the number of shares versus the number of idle pages on the revocation of pages from a virtual machine. Dynamic reallocation allows ESX Server to adjust to changes in allocation parameters and system load; the system tries to keep the amount of free machine memory above certain thresholds.
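For reference, the paper's dynamic reallocation uses four free-memory thresholds (defaulting to 6%, 4%, 2%, and 1% of system memory) named high, soft, hard, and low. A simplified C sketch of how the state might be selected, ignoring the hysteresis the paper applies between states (the exact threshold-to-state mapping here is my simplification):

```c
/* Simplified sketch of the reclamation states described in the paper;
 * threshold values follow the paper's defaults, but the mapping and
 * hysteresis between states are glossed over here. */
typedef enum { MEM_HIGH, MEM_SOFT, MEM_HARD, MEM_LOW } mem_state_t;

static mem_state_t mem_state(double free_frac)   /* fraction of memory free */
{
    if (free_frac > 0.06) return MEM_HIGH;   /* enough free: no reclamation */
    if (free_frac > 0.04) return MEM_SOFT;   /* reclaim via ballooning      */
    if (free_frac > 0.02) return MEM_HARD;   /* reclaim via forced paging   */
    return MEM_LOW;                          /* also block VMs over target  */
}
```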
4. Evaluation
The evaluation in this paper is very extensive. In Figure 2, the authors show that ballooning allows the virtual machine to provide nearly the same throughput as a virtual machine configured to the desired memory size. Figures 4 and 5 show that page sharing allows virtual machines to share large amounts of pages and ESX Server to reclaim large amounts of memory, assuming all virtual machines are running Linux. Figure 7 shows that share-based allocation with the idle tax behaves as expected with two virtual machines. Lastly, Figure 8 shows the effect of dynamic reallocation on memory allocation for multiple virtual machines. The evaluation would have been slightly better if there was more explanation of some of the fluctuations in Figures 7 and 8.
5. Confusion
I did not understand the problem with using 36-bit addresses inside a virtual machine, and how IO page remapping solved this problem.
Posted by: Varun Naik | February 7, 2017 02:44 AM
1. Summary
VMware ESX Server is a hypervisor designed with the goal of efficiently managing the hardware resources among various virtual machines running unmodified commodity operating systems. This paper talks about various mechanisms and policies employed by ESX server for memory resource management.
2. Problem
Memory resource requirements of virtual machines keep on changing dynamically. With static allocation, a lot of resources remain unused which results in systems not achieving their peak performance. Some solutions were provided before but they involved modification of operating systems. This paper tries to tackle the problem of poor resource utilization without altering the operating systems.
3. Contribution
ESX server uses various techniques for efficient memory resource utilization.
Ballooning:
A balloon driver is loaded into the guest operating system as a pseudo-device driver. The balloon can be inflated by allocating and pinning physical pages within the VM, and deflated by releasing those pages. Whenever the server needs to reclaim memory, it instructs the balloon to inflate; the balloon then informs the ESX server of the physical page number of each page it has allocated.
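To make the mechanism concrete, here is a rough, hypothetical sketch of what the guest-side inflate path of such a driver could look like, in Linux kernel style. The paper does not describe the driver's code or its private channel to the hypervisor; esx_report_ppn() below is an invented stand-in for that channel.

```c
/* Hypothetical sketch of a balloon driver's inflate loop (not VMware's
 * actual code). esx_report_ppn() stands in for the private guest-to-
 * hypervisor channel, whose real interface the paper does not describe. */
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>

static LIST_HEAD(balloon_pages);      /* pages currently held by the balloon */

extern void esx_report_ppn(unsigned long ppn);   /* assumed channel wrapper */

static void balloon_inflate(unsigned long target_pages)
{
    unsigned long held = 0;

    while (held < target_pages) {
        /* Ask the guest kernel for a page; the guest's own replacement
         * policy decides what to evict or swap to satisfy this request. */
        struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NOWARN);
        if (!page)
            break;                     /* guest is under pressure; back off */

        list_add(&page->lru, &balloon_pages);

        /* Tell ESX which guest physical page is now pinned, so the
         * hypervisor can reclaim the machine page backing it. */
        esx_report_ppn(page_to_pfn(page));
        held++;
    }
}
```

Deflating would walk balloon_pages in reverse, notifying the hypervisor and returning each page to the guest with __free_page().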
Content Based Page Sharing:
The ESX server finds duplicated pages by checking their content. Only one copy of such a page is stored in memory, and all VMs use the same copy as long as they do not modify it; on a modification, a private copy is created for that VM.
Share-Based Allocation and Idle Memory Reclamation:
Different VMs can be allocated different shares of memory based on their priority. This is a fair allocation policy, and all VMs' allocations degrade gracefully in overload situations. But it can lead to situations where a client with many shares sits idle and wastes memory. The "idle memory tax" tackles this problem by charging VMs more for idle memory than for active memory; pages are preferentially reclaimed from such VMs under overload.
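The paper formalizes this as an adjusted shares-per-page ratio: with tax rate tau, an idle page costs k = 1/(1 - tau) times as much as an active page, and memory is revoked from the VM with the lowest ratio. Below is a small illustrative C sketch of that metric; the surrounding bookkeeping (the vm_t struct, pick_victim) is invented for the example.

```c
/* Sketch of the min-funding revocation metric: reclaim from the VM with
 * the lowest shares-per-page ratio, where idle pages are charged extra
 * via the tax rate tau (k = 1/(1 - tau)). Illustrative only. */
#include <stddef.h>

typedef struct {
    double shares;        /* S: memory shares assigned to the VM     */
    double pages;         /* P: pages currently allocated to the VM  */
    double active_frac;   /* f: estimated fraction of pages in use   */
} vm_t;

static double adjusted_ratio(const vm_t *vm, double tau)
{
    double k = 1.0 / (1.0 - tau);              /* cost multiplier for idle pages */
    return vm->shares /
           (vm->pages * (vm->active_frac + k * (1.0 - vm->active_frac)));
}

/* Pick the VM from which to reclaim memory next. */
static size_t pick_victim(const vm_t *vms, size_t n, double tau)
{
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (adjusted_ratio(&vms[i], tau) < adjusted_ratio(&vms[victim], tau))
            victim = i;
    return victim;
}
```

With tau = 0 this reduces to plain proportional-share allocation; as tau approaches 1, idle pages count for almost nothing and are reclaimed first.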
4. Evaluation
This paper is very well written and presents all the ideas clearly and concisely. It provides evaluations for all the techniques. Ballooning was evaluated using dbench and showed very low overhead (1.4% to 4.4%), insignificant compared with the advantages it provides. Page sharing was evaluated on SPEC95 benchmarks and showed that nearly 67% of memory can be shared and nearly 60% of all VM memory reclaimed. The paper also provides experimental results for active memory sampling, the idle memory tax, and dynamic reallocation, all of which present a positive picture of the system.
5. Confusion
a. How is the shadow page table modified when the OS in a VM changes the guest-virtual-to-guest-physical mapping of a page?
b. How is the TLB virtualized for VMs? Does each VM get some portion of the TLB, or does the TLB hold the VM number and process ID along with the guest-virtual-to-machine mapping?
c. I could not understand I/O page remapping.
Posted by: Gaurav Mishra | February 7, 2017 02:20 AM
Summary :
VMware ESX Server is a hypervisor which efficiently manages hardware resources such as CPU and memory among concurrently running virtual machines. This paper focuses on the memory management techniques adopted by the ESX Server: ballooning to reclaim the least valuable memory, the idle memory tax to ensure efficient memory utilization without compromising performance isolation, and content-based page sharing to eliminate redundancy. All of these are implemented without any modification to the VMs' operating systems.
Problem :
The rapid increase in inexpensive shared-memory multiprocessors made virtual machines an attractive option for reducing server maintenance costs and improving resource management. Previous virtual machine designs required some modifications to the commodity operating system. Allowing the hypervisor to select the pages to reclaim can result in the double paging problem.
Contributions
> ESX Server doesn’t require any modifications to the VM’s operating system.
> It uses a new memory reclamation technique called ballooning, in which the hypervisor presents its request for pages to a balloon driver loaded into the guest OS, allowing the guest OS to decide which pages are least valuable and should consequently be reclaimed.
> Content-based sharing allows sharing among all pages that have the same content. This is achieved by computing a hash value and finding the corresponding match. Sharing can greatly reduce memory consumption and allow higher overcommitment.
> Memory allocation is done proportionally based on the shares owned by VMs. Memory sampling is used to estimate the fraction of active pages. The idle memory tax keeps the fraction of idle pages in check by reclaiming pages when the threshold is exceeded.
Evaluation :
This paper does a good job of evaluating the different memory management techniques it proposes. The amount of memory shared increases nearly linearly with the number of VMs, and they verify that throughput is not hurt by sharing. Experiments on various workloads testing the effectiveness of memory sampling, the idle memory tax, and dynamic reallocation also show promising results.
Confusion :
> How does I/O page remapping work?
> In the memory sampling technique, how are the three moving averages calculated? (What do the fast moving average and slow moving average mean here?)
Posted by: Pallavi Maheshwara Kakunje | February 7, 2017 01:38 AM
Summary
ESX Server is a VMM designed to multiplex hardware resources efficiently among virtual machines running unmodified commodity operating systems, using innovative memory allocation, sharing, and reclamation mechanisms.
Problem
The paper focuses on system memory management problems, including:
1. Optimizing memory reclamation by causing the guest OS to invoke its own memory management routines.
2. Efficiently (in terms of both overhead and size) share pages among different VMs
3. Identifying and reclaiming idle memory
Contributions
1. Adding an extra level of indirection (machine, physical, and virtual addresses) and performing PPN-to-MPN translation transparently to the VMs.
2. Using ballooning to cause the guest OS to invoke its own memory management routines instead of always relying on forced paging. Ballooning is also configured to minimize the overhead it introduces.
3. Using hashing to efficiently match pages with the same content for sharing. Content-based memory sharing can potentially share every shareable page. Shared pages are marked COW and a private copy is made on write.
4. Using statistical sampling to estimate the level of idle memory, and imposing an idle memory tax to achieve both performance isolation and efficient memory utilization.
Evaluation
- I like that the authors evaluate on two levels: 1. it works; 2. performance is good. This evaluation design is followed for both memory sharing and idle page management, where the author first uses a synthetic workload and then a real-world workload.
- The authors developed interesting and comprehensive evaluation mechanisms to test the results. I found the memory toucher interesting and useful for demonstrating that the memory sampling mechanism behaves as intended.
- One advantage of the paper is that ESX is an available commercial platform, so much convincing real-world data is provided.
- In Figure 5, the paper demonstrates a high percentage of sharing among identical VMs running the same OS, where much of the OS code can be shared. Taylor in our reading group raised a good point that the sharing percentage could be lower if the system is a mix of VMs running different OSes.
Confusion
1. Are there protections around ballooning? What if malicious software takes control of the balloon and puts memory pressure on all the VMs?
2. Why is it beneficial to make address translation transparent to VMs?
Posted by: Yunhe Liu | February 7, 2017 01:23 AM
Summary
The paper explains VMware ESX Server, a thin software layer sitting on top of the hardware, which performs virtualization and resource management without requiring modifications to or extensive support from guest operating systems. It provides strategies for page reclamation (ballooning), efficient memory utilization (idle memory tax), and page sharing (content-based page sharing) while maintaining performance isolation guarantees.
Problem
The problem of underutilization of individual servers is solved by consolidating servers as virtual machines on a single physical server with little or no performance penalty. Earlier attempts at solving this problem required modifications to the guest operating systems to achieve virtualization, which was tedious and introduced potential security risks.
Contributions
This paper introduced several novel mechanisms and policies that ESX server uses to manage memory.
Following are some of the main features :
Ballooning (A technique to coax a guest Operating System into cooperating with the server)
This technique reclaims pages from the guest operating system by loading a pseudo-device driver or kernel service, which inflates (making the OS free or swap out pages) and deflates (deallocating previously allocated pages).
Transparent page sharing
This is achieved through content-based page sharing. ESX scans the content of guest physical memory for sharing opportunities. Instead of comparing each byte of a candidate guest physical page to other pages, an action that is prohibitively expensive, ESX uses hashing to identify potentially identical pages.
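A simplified C sketch of how such a scan might work follows. The data structures are invented for illustration; the paper's real implementation also keeps per-frame back-references and uses a higher-quality 64-bit hash, both omitted here.

```c
/* Illustrative content-based sharing lookup: hash a candidate page,
 * chase the hash chain, do a full byte compare on a hash hit, and
 * otherwise record the page as a "hint" for future scans. */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

typedef struct frame {
    uint64_t hash;
    const uint8_t *data;      /* page contents (canonical copy if shared) */
    int refcount;             /* >0: shared frame, 0: hint frame          */
    struct frame *next;       /* chaining handles hash collisions         */
} frame_t;

/* Any decent 64-bit page hash would do; FNV-1a is used here for brevity. */
static uint64_t page_hash(const uint8_t *page)
{
    uint64_t h = 1469598103934665603ull;
    for (int i = 0; i < PAGE_SIZE; i++)
        h = (h ^ page[i]) * 1099511628211ull;
    return h;
}

/* Returns the shared frame if the candidate page can be shared (the
 * caller would then map it copy-on-write), or NULL if it becomes a hint. */
static frame_t *try_share(frame_t **bucket, frame_t *candidate,
                          const uint8_t *page)
{
    uint64_t h = page_hash(page);

    for (frame_t *f = *bucket; f; f = f->next) {
        if (f->hash != h)
            continue;
        if (f->refcount == 0 && page_hash(f->data) != h)
            continue;                  /* stale hint: page changed since scan */
        if (memcmp(f->data, page, PAGE_SIZE) == 0) {
            /* 0 -> 2 promotes a hint frame to a shared frame; otherwise
             * just add another reference to the existing shared copy. */
            f->refcount = (f->refcount == 0) ? 2 : f->refcount + 1;
            return f;
        }
    }

    /* No match: record the candidate as a hint so a future scan of an
     * identical page elsewhere can find it. */
    candidate->hash = h;
    candidate->data = page;
    candidate->refcount = 0;
    candidate->next = *bucket;
    *bucket = candidate;
    return NULL;
}
```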
Idle memory tax
This is used to penalize VMs that have a large fraction of inactive allocated pages. The basic idea is to charge a client more for an idle page than for one it is actively using. The fraction of idle memory is computed statistically by sampling the VM's working set through invalidated mappings (active memory sampling).
Evaluations
The authors have extensively evaluated the efficiency of all the features introduced in the paper. Ballooning performance is evaluated using dbench, and page sharing results show that up to 60% of memory can be reclaimed.
The idle memory tax evaluation shows a 30% throughput increase in the experimental setting.
Confusions
It is kind of difficult to draw a line where the OS layer ends and the hypervisor layer starts. It would be great if this were explained with an example, tracing from the application layer through the guest OS to the hypervisor layer (maybe a system call to allocate memory on a process's heap).
This would also give a clear picture of how OSes and the hypervisor interact at a low level.
How does the hypervisor differentiate the mapping of guest virtual addresses to host machine addresses across VMs? (Is there a special bit maintained for each VM?)
Posted by: Om Jadhav | February 7, 2017 01:17 AM
1. summary
This paper describes the VMware ESX server, which was designed to provide an efficient VM environment without requiring any modification of the guest OS.
2. Problem
The problem VMware faced was that they did not have any control over which operating systems would be run on their system, yet they wanted to support unmodified operating systems. Memory management was also a major concern, as there was often a large amount of idle memory that could ideally be used by processes that were actually running. They wished to be able to overcommit memory to the guest systems without performance losses.
3. Contributions
Without modification of the guest OSes, correctly selecting an idle page of memory to page out can be difficult. They introduced a "balloon" module within guest systems that can be controlled by the ESX server. The balloon module can be signaled to allocate pages, forcing the guest OS itself to choose which old pages should be paged out instead of the server selecting somewhat arbitrarily. They also introduced a method of hashing page contents to facilitate shared pages between VMs: when pages have matching hashes, and their true contents match, the server can merge them into a single copy-on-write page, allowing better overprovisioning of memory. An idle memory tax was also included to allow better allocation of memory between VMs, favoring those that are actively using their allocated memory, which further allows more pages to be reclaimed and better overprovisioning.
4. Evaluation
Each of their memory allocation tools were evaluated over a range of use cases and generally shown to improve memory usage. I was surprised to see how much memory was able to be shared and especially that it actually improved performance slightly even with the overhead of checking for equivalent pages.
5. Confusion
When ballooning, can the ESX server be sure that the pages the VM is attempting to page out don’t belong to the balloon module or is it just a generally safe assumption?
Posted by: Taylor Johnston | February 7, 2017 01:08 AM
1. Summary
The concept of server virtualization has been rejuvenated by trends such as server consolidation and inexpensive shared-memory multiprocessors. VMware ESX Server achieves this virtualization by directly managing system hardware resources and running unmodified commodity operating systems, thereby achieving higher I/O performance and complete control over resource management.
2. Problem
To get the maximum benefit from statistical multiplexing of resources via server virtualization, resource allocation must be flexible enough to allow overcommitment while still providing resource guarantees to VMs, all while preserving high I/O performance. A major constraint in implementing this virtualization is that existing operating systems could not be modified.
3. Contributions
The paper introduces some beautiful concepts like ballooning, content-based page sharing, shares, and taxes. Ballooning is a reclamation mechanism which allows the ESX Server to provide predictable performance and to avoid traditional page replacement issues like the double paging problem. The ESX server maintains pmap data structures for address mapping, along with shadow page tables that feed the hardware TLBs.
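As a rough illustration of that extra level of indirection (the struct layout and field names below are invented, not ESX's actual data structures), a per-VM pmap entry and the composition of a shadow page-table entry might look like this:

```c
/* Illustrative sketch: guest "physical" page number (PPN) -> machine
 * page number (MPN) via a per-VM pmap, and a shadow PTE that combines
 * the guest's VPN->PPN mapping with the VMM's PPN->MPN mapping. */
#include <stdint.h>

#define NPPN (1u << 20)           /* a 4 GB guest with 4 KB pages */

typedef struct {
    uint32_t mpn;                 /* backing machine page          */
    uint8_t  swapped;             /* paged out to the ESX swap area? */
    uint8_t  shared;              /* content-shared (COW) page?    */
} pmap_entry_t;

static pmap_entry_t pmap[NPPN];   /* one pmap per VM */

/* Build a shadow page-table entry the hardware TLB can use directly:
 * VPN -> MPN, derived from the guest PTE (VPN -> PPN) plus the pmap. */
static uint64_t make_shadow_pte(uint32_t ppn, uint64_t guest_flags)
{
    pmap_entry_t *pe = &pmap[ppn];
    uint64_t flags = guest_flags;

    if (pe->shared)
        flags &= ~0x2ull;         /* clear writable bit: a write faults
                                     and triggers copy-on-write        */

    return ((uint64_t)pe->mpn << 12) | flags;
}
```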
Since transparent page sharing as done in Disco is nearly impossible to implement without changing the guest OS, ESX Server uses a completely different approach, content-based page sharing, where duplicate pages are identified by their contents. The paper also explains optimizations like hashing and storing metadata in hint frames.
Share-based allocation is the policy ESX Server employs to provide memory performance isolation without compromising memory utilization. The paper also introduces the concept of taxing idle memory in order to reclaim it, along with a statistical method to identify idle memory pages. The server further provides robust policies to dynamically reallocate memory, taking the new parameters into account, and keeps track of "hot" pages involved in repeated I/O by maintaining statistics about them.
4. Evaluation
The paper clearly explains most of the concepts in an easy to understand way. Almost all the concepts are backed by empirical analysis. The one area in which the paper is lacking is in answering the “how”s of some implementations. For example, it is not explained “how” the server intercepts VM instructions that manipulate guest OS page tables or TLB.
To demonstrate the performance of Ballooning, the paper presents the results of running dbench workloads. Similarly, the analysis of page sharing performance is also provided for different kinds of workloads. The paper also provides detailed experimental results for memory sampling, idle memory tax and dynamic reallocation.
5. Confusion
1. The default time for sampling i.e. 30s seems long to me. And the sampling happens multiple times. Wouldn’t this result in extremely low to no idle pages? Can you please explain the details of sampling methods to measure the idle memory pages?
2. There is a statement in the paragraph right below the Ballooning diagram – “ When guest ppn is ballooned, the system annotates its pmap entry and deallocates the associated MPN”. I wasn’t able to understand what was being conveyed in this statement and the one following this.
Posted by: Sharath Hiremath | February 7, 2017 01:03 AM
1.Summary
VMware ESX Server is a software layer which virtualizes and manages the hardware resources and allocates them to the various VMs running on top of it, without needing to alter the operating systems that run inside the VMs. ESX introduced novel techniques: reclaiming memory from a virtual machine via ballooning, ensuring fair utilization of allocated memory via the idle memory tax, and sharing memory between VMs via a content-based sharing approach.
2. Problems
-> In previous efforts to virtualize hardware resources, like Disco, the operating system running inside the VMs had to be altered for resource management.
-> When reclaiming resources from virtual machines, the hypervisor had to pick which pages to reclaim based on some policy, even though, unlike the OS, it has no inside knowledge of which pages are idle or rarely used and could be paged out.
-> The problem of double paging arises when both the OS and the hypervisor page out the same page.
3. Contributions
The ESX server maintains a pmap for each VM which maps the VM's physical page numbers to the actual machine page numbers. It also maintains shadow page tables which map the VM's virtual addresses directly to machine page numbers, for use by hardware such as the TLB.
Main contributions of this paper are as follows:
Ballooning - when ESX needs to reclaim memory from the OS, it inflates the balloon device driver, which triggers the memory management algorithms in the guest OS, which in turn gives pages to the balloon. These pages are then reclaimed by ESX.
Content Based Sharing - pages which have exactly the same contents are backed by the same machine page by ESX. This is implemented using hashing: a hash of a page's contents is used as a key to look up a hash table; on a match, the actual pages are compared to see if they match exactly, and if so a single copy of the page is kept and marked copy-on-write. Whenever a shared page is altered in any VM, a private copy of the page is created for that particular VM.
Shares and Working Sets - rights to resources are encapsulated by shares. The system administrator can set the shares for each VM depending on the priority of the VM/client, and thus the amount of resources allocated to a VM depends on the shares it holds.
Idle Memory Tax - this technique takes idle pages away from a VM; idle pages are found using a sampling technique.
Evaluation
The ESX server is evaluated on various kinds of production workloads, and the results show that the novel techniques introduced in the paper make applications run more efficiently, with very little overhead, in a virtual environment.
Confusion
1. Explain the I/O page remapping part of the paper.
2. Can ESX read the page tables of an individual VM? (How would ESX know about read-only pages of the OS, or pages from which code has been executed, in order to pick these up while scanning for sharing candidates?)
Posted by: Sowrabha Horatti Gopal | February 7, 2017 12:54 AM
1. Summary
The paper describes mechanisms and policies in VMware ESX Server for efficiently supporting virtual machine workloads that overcommit memory. These are designed to work without modifying the guest operating system and are useful for server consolidation.
2. Problem
Virtualizing memory for operating systems running on a VMM is difficult because the VMM has to manage the underlying machine memory transparently while providing performance isolation. Adding an additional page replacement policy over that of the guest operating system may cause conflicts in their choices, resulting in performance anomalies. Also, existing work on VMMs like Disco had to do minor modifications to guest operating system kernel in order to effectively virtualize them.
3. Contributions
In order to overcome the undesired effect of conflicts between the page replacement policies of the virtual machine monitor and the operating system, a technique called ballooning is introduced. A balloon can be introduced as a kernel service in a guest OS, and then be inflated to increase memory pressure in the virtual machine, or deflated to decrease it. This can be used to make the guest OS cooperate with the policies of the VMM. Because ballooning takes time to take effect, demand paging is used as a fallback mechanism when ballooning is unavailable or too slow. Content-based page sharing is used to reclaim pages which contain identical contents across virtual machines. Pages that can be shared are identified by hashing the contents of the page and looking up a hash table of scanned pages; these pages are marked copy-on-write when shared. The idea of share-based allocation is extended with an idle memory tax, which can be used to reclaim pages from a virtual machine that has a large amount of memory but is not actively using most of it.
4. Evaluation
The evaluation of the ideas presented in the paper is comprehensive. The ideas are tested on real hardware, with real-life workloads and for long periods of time. The techniques seem to work well and provide significant benefit. My only gripe is that they do not discuss how reliant they are on hardware support provided by the architecture and how easy it is to port these to architectures other than x86.
5. Confusion
How does Physical Address Extension (PAE) work, and why is it a problem for I/O transfers?
Posted by: Suhas Pai | February 7, 2017 12:44 AM
1. Summary
The paper talks about memory management in VMware ESX, a type 1 hypervisor. It introduces several novel mechanisms and policies - ballooning, content-based page sharing, hot I/O page remapping - to manage memory.
2. Problem
In traditional computing environments, memory is often underutilized, which allows servers to be consolidated as VMs on one machine with little or no penalty. In such a virtualized setup, one should be able to flexibly overcommit memory. From VMware's perspective, the setup should also allow VMs to run unmodified OSes, which is difficult because of the inability to influence the design of the guest OS running within the VMs.
3. Contributions
Important contributions are :
i) Ballooning mechanism, which coaxes the guest OS to cooperate with the ESX hypervisor in reclaiming pages considered least valuable.
ii) Idle memory 'tax': this charges the VM client more for an idle page than for one it is actively using, which allows efficient memory utilization.
iii) Content-based page sharing, which identifies pages by contents. This eliminates the need to understand guest OS code and more importantly such pages can now be easily shared based on the contents.
iv) Hot I/O page remapping, which maintains statistics to keep track of ‘hot’ pages and thereby remap them to low memory.
4. Evaluation
The effectiveness of the ballooning mechanism is demonstrated using the dbench benchmark with no significant reduction in performance. ESX's page sharing implementation is studied by considering the effectiveness of memory reclamation and the overhead on system performance. The idle memory tax is studied using two VMs: with an increased tax rate, idle memory is reclaimed from one and allocated to the other, boosting its performance by about 30%. Overall, every memory management technique proposed in the paper is evaluated and shown to be effective.
5. Confusion
I am not sure if I totally understand the I/O Page remapping and the notion of high/low memory.
Posted by: Dastagiri Reddy Malikireddy | February 7, 2017 12:41 AM
Summary
The paper describes various design principles used for memory management in ESX Server, a Virtual Machine monitor used to multiplex hardware resources among multiple commodity operating systems running on a single physical machine.
Problem
The main challenges the paper tries to tackle are: efficiently managing machine memory with minimal or no changes to the guest operating systems' kernel code (which earlier popular hypervisors like Disco could not achieve), giving system administrators the flexibility to overcommit various resources, and allocating those resources fairly.
Contributions
ESX Server introduced various mechanisms and policies to drastically improve memory usage and server consolidation.
a. To reduce aggregate memory consumption across all the VMs, ESX uses the shared pages idea, where multiple physical pages map to one machine page. The implementation differs significantly from Disco: ESX identifies identical pages by their content instead of hooking into guest kernel code.
b. To ensure fairness in distributing resources, ESX employs a share-based model to redistribute resources dynamically. Memory is reclaimed from VMs with the lowest number of shares per allocated page, though the number of actively used pages is also considered. A mechanism called ballooning is used to reclaim the memory.
c. ESX added support for remapping pages from high memory addresses to low ones. This can be very useful in reducing the data-copy overhead during common I/O transfers where the I/O devices can address only low memory.
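A hedged sketch of the idea in C follows; the threshold and helper functions are invented, since the paper only says that per-page I/O statistics are kept and that "hot" pages get remapped to low memory.

```c
/* Illustrative sketch of hot I/O page remapping: count bounce-buffer
 * copies made on behalf of a high page, and permanently remap pages
 * that cross a (made-up) threshold into low memory. */
#include <stdint.h>

#define HOT_THRESHOLD 16           /* copies before remapping (invented)  */
#define LOW_MEM_LIMIT (1u << 20)   /* MPNs below 4 GB with 4 KB pages     */
#define MAX_MPN       (1u << 24)   /* enough counters for 64 GB of RAM    */

extern uint32_t copy_to_low_bounce_buffer(uint32_t high_mpn);  /* assumed */
extern uint32_t allocate_low_mpn(void);                        /* assumed */
extern void     remap_ppn(uint32_t ppn, uint32_t new_mpn);     /* assumed */

static uint16_t io_count[MAX_MPN];    /* per-machine-page I/O copy counts */

/* Called when a device that can only address low memory needs to DMA
 * to/from the page backing this PPN; returns the MPN to hand the device. */
uint32_t io_target_mpn(uint32_t ppn, uint32_t mpn)
{
    if (mpn < LOW_MEM_LIMIT)
        return mpn;                               /* already DMA-addressable */

    if (io_count[mpn] < HOT_THRESHOLD) {
        io_count[mpn]++;                          /* occasional copy is cheap */
        return copy_to_low_bounce_buffer(mpn);
    }

    /* Page is "hot": move it to low memory once, so future I/O to it
     * needs no copying at all. */
    uint32_t low = allocate_low_mpn();
    remap_ppn(ppn, low);
    return low;
}
```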
Evaluation
The paper evaluates the memory consumption of the system under a variety of loads while gradually adding the policies described above. While the effects of some policies like memory sharing are very workload dependent, others like share-based allocation and I/O remapping clearly stand out in improving memory usage under different kinds of loads.
Confusion
I’ve a few questions:
a. The hash table used for matching page content which is maintained at the server level has fixed size. However since the size of VMs can exceed this, is there some kind of a replacement policy for the hash table ?
b. To tackle the double paging problem why can’t we somehow invalidate the page table entries for the physical addresses maintained inside the VMs for pages which are paged out by the Server ?
c. Why should the balloon driver poll the server periodically? Is it not easy for the server to notify the driver?
Posted by: Mayur Cherukuri | February 7, 2017 12:21 AM
1. Summary
Virtual machines are one of the most prominent ways to achieve higher hardware utilization while providing full isolation. VMware continues the pursuit of higher efficiency by extending work on Disco and other type 1 hypervisors. Most notably, they opt not to modify existing kernels, in an effort to generalize the solution and minimize the effort required.
2. Problem
Memory is considered the primary bottleneck in these systems. To improve utilization, hypervisors, like OSes, overcommit memory since many processes sit idle. The goal of this work is to identify ways to manage memory more efficiently and to gracefully roll with the punches as system memory pressure increases and decreases.
3. Contribution
The first novel contribution is the use of hashing for content-based page sharing. This allows the hypervisor to be significantly more efficient as a direct comparison against every other page is not necessary. The use of the hash function is safe; on detection of equivalent hashes, the pages are compared to verify they indeed are equivalent.
Second, the idea of ballooning effectively tackles the knowledge problem for page reclamation. Traditionally, the hypervisor has little insight into what pages should be reclaimed. By using a driver to pin pages and give them back to the hypervisor, the decision to which page to evict is passed back to the OS. It also removes the chance of double paging where the page that is paged out of memory by the hypervisor is then paged out by the OS.
Third, statistical sampling of pages identifies memory idleness, and combined with the share system it allows the hypervisor to allocate memory more elastically. The combination of moving averages helps the system gracefully absorb memory pressure and smoothly release memory.
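A rough sketch of how such an estimate could be maintained is below. The constants and the two-average simplification are mine; the paper actually combines three estimates (slow, fast, and one for the sampling period in progress) and samples roughly 100 pages per 30-second period by invalidating their mappings.

```c
/* Sketch of an active-memory estimate driven by page sampling, assuming
 * the VMM invalidates mappings for SAMPLE_SIZE pages each period and
 * counts how many fault back in. Weights are illustrative. */
#define SAMPLE_SIZE 100.0

typedef struct {
    double slow_ewma;     /* long-horizon average, decays slowly       */
    double fast_ewma;     /* short-horizon average, reacts quickly     */
} activity_est_t;

static double ewma(double prev, double sample, double weight)
{
    return weight * sample + (1.0 - weight) * prev;
}

/* Called at the end of each sampling period with the number of sampled
 * pages the guest touched; returns the fraction of memory deemed active. */
static double update_activity(activity_est_t *est, int touched)
{
    double frac = touched / SAMPLE_SIZE;

    est->slow_ewma = ewma(est->slow_ewma, frac, 0.1);
    est->fast_ewma = ewma(est->fast_ewma, frac, 0.5);

    /* Taking the max makes allocations respond quickly to rising
     * activity but decay slowly when the VM goes idle. */
    return est->fast_ewma > est->slow_ewma ? est->fast_ewma
                                           : est->slow_ewma;
}
```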
Lastly, hot I/O page remapping is an important observation and a clever hack around poor device support. By preferentially allocating VM memory above the 4 GB boundary initially, low memory for actually communicating with devices can be treated as another resource; on frequent use, guest pages can be remapped into that low region, removing the additional copy to an intermediary page.
4. Evaluation
This paper is one of the clearest papers I've read, and it is easy to follow the thought process throughout. While ballooning and content-based sharing offer the most insight, it would have been nice to go into more detail on the statistical sampling and dynamic reallocation. Would it be possible to tune the parameters of the moving averages to achieve better estimates for specific workloads? And for dynamic reallocation, what happens if we choose different thresholds for the states? What is the performance implication of fewer states?
5. Confusion
Is there a situation where I/O remapping can cause worse performance among several VMs attempting to communicate with the same set of devices?
Posted by: Dennis Zhou | February 6, 2017 11:48 PM
1. Summary
This paper describes the VMware ESX Server's memory management techniques and evaluates those techniques on real-world systems.
2. Problem
The main problem that this paper addresses is how to utilize memory efficiently across multiple virtual machines. The paper gives specific problems that each of its memory management techniques tries to address. For page replacement, this means avoiding another level of paging and letting the guest OS choose which page to give up. Another problem is finding pages that can be shared without changing the OS, as Disco does, and without causing too much overhead.
3. Contributions
The first of the memory management techniques the paper describes is ballooning, a method of recovering pages from and returning pages to a guest machine. This is accomplished by loading a module into the guest OS which can then inflate by allocating pinned physical pages. The major benefit of this method over meta-level page replacement is that ballooning allows the guest OS to decide which pages to give up. The next technique described is how the ESX Server implements page sharing. The ESX Server uses hashing to find identical pages efficiently, and after further checking that the full pages are identical, the machine page can be shared. The paper also describes the algorithm for determining page idleness and uses an “idle memory tax” to determine which pages should be reclaimed.
4. Evaluation
The paper does a good job of evaluating each technique after presenting it. For ballooning, this meant measuring the overhead of the technique on throughput, which turned out to have a seemingly reasonable maximum of only 4.4%. However, the first shared-memory evaluation chose the best case of homogeneous VMs, showing only that with a specific VM setup you could achieve a maximum of 67% sharing. The second evaluation, using real-world setups that achieved over 40% sharing with no significant overhead, was the impressive portion.
5. Confusion
I was confused by the idle memory tax portion of the paper, and by why the authors used a 30-second sampling period, which seems like a very long time.
Posted by: Brian Guttag | February 6, 2017 11:41 PM
1. Summary
A very joyful read. This paper is an overview of VMware's ESX Server, a hypervisor layer designed by VMware to multiplex virtual machines on a single physical machine. It presents some very interesting techniques for more efficient memory management and memory sharing for virtual machines. These include ballooning for efficient page reclamation, an idle memory tax for better memory utilization across VMs, and content-based intra- and inter-VM page sharing.
2. Problem
Memory management for guest operating systems running atop shared physical hardware is challenging. The memory demand of the VMs can change dynamically: memory pressure may increase on some VMs while memory of other VMs sits underutilized. Moreover, since the VMs might be running the same OS or even the same applications, there will be multiple redundant copies of the same code and data in machine memory, wasting resources. The fact that the hypervisor that has to do this management is blind to OS-level memory policies and operations makes it even more challenging.
3. Contributions
ESX Server uses ballooning to dynamically reclaim pages from the target VM. A balloon is a module that is installed as a driver in the guest kernel and has a private channel to communicate with the ESX server. The reclamation is performed according to the guest OS's own reclamation policy. To reclaim pages, ESX asks the balloon module to inflate. The guest OS assigns physical pages to the balloon to fulfill that request, either by handing over free pages or by reclaiming pages from its own memory. The balloon reports those physical page numbers to ESX, which can then look into its pmap to locate and reclaim the corresponding machine pages.
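A very simplified sketch of that interaction (the toy pmap, page numbers, and the guest-side allocator are stand-ins; the real driver uses the guest kernel's native allocation and pinning interfaces):

# hypervisor side: one pmap per VM, mapping guest "physical" pages (PPNs) to machine pages (MPNs)
pmap = {ppn: 1000 + ppn for ppn in range(8)}   # toy PPN -> MPN mapping for one VM
free_machine_pages = []

def guest_balloon_inflate(n_pages):
    """Guest driver: allocate and pin pages, then report their PPNs over the private channel."""
    return [0, 3, 5][:n_pages]                 # pretend the guest OS handed us these PPNs

def server_reclaim(n_pages):
    for ppn in guest_balloon_inflate(n_pages):
        mpn = pmap.pop(ppn)                    # unbind the machine page backing the pinned PPN
        free_machine_pages.append(mpn)         # now reusable for other VMs

server_reclaim(2)
print(free_machine_pages)                      # machine pages reclaimed without guessing by the VMM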
Another interesting technique ESX uses is content-based sharing of machine pages. A background service computes a hash of each page's contents; pages with the same content are marked COW and shared by modifying the entries in the pmap. This sharing can be inter- or intra-VM and saves a lot of memory.
Another very interesting technique used in ESX is the reclamation of idle memory. This is achieved by using a parameter called the idle memory tax and penalizing each VM's idle pages in proportion to that tax. To measure the idle fraction of a VM's pages, ESX randomly selects a small number of physical pages and invalidates the cached mappings (such as TLB entries) for them. The next access to such a page results in a hypervisor trap, so a touch count for the sampled pages can be maintained. The count for idle pages stays at or near zero, which yields the fraction of idle pages for the VM.
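A sketch of the sampling estimate (the sample size, page counts, and bookkeeping are illustrative; in ESX it is the trap on the next access that re-establishes the mapping and bumps the counter):

import random

def estimate_active_fraction(all_ppns, pages_touched_this_period, n_samples=100):
    """Invalidate mappings for a random sample of pages; count how many get touched again."""
    sample = random.sample(all_ppns, min(n_samples, len(all_ppns)))
    touched = sum(1 for ppn in sample if ppn in pages_touched_this_period)
    return touched / len(sample)               # idle fraction is 1 minus this value

ppns = list(range(10_000))
touched = set(random.sample(ppns, 2_500))      # pretend 25% of memory was actually used
print(round(estimate_active_fraction(ppns, touched), 2))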
4. Evaluation
The evaluation in this paper was quite comprehensive. For each of the major techniques, experimental results on appropriate benchmarks were shown to prove its effectiveness. For example, in Section 3, the authors show that the overhead of ballooning decreases VM throughput very little. Similarly, for page sharing, the authors show that the amount of saved memory increases nearly linearly with the number of identical Linux VMs; this is due to the high amount of identical code since the OS is the same. To strengthen their point, results for memory saved with page sharing under a real-world load with heterogeneous guests are also shown.
5. Confusion
Will every guest need to install the balloon driver? What if a VM runs without it?
Why does the Xen paper have so many more citations than this one?
Papers from industry always seem easier to understand (in my very limited experience), e.g., MapReduce, GFS, ESX, and a few more. Why?
Posted by: Hasnain Ali Pirzada | February 6, 2017 10:51 PM
Summary:
VMware ESX server is a thin software layer designed to multiplex hardware resources efficiently among multiple virtual machines running unmodified commodity operating systems. The paper presents novel memory management mechanisms and policies employed in ESX server design.
Problem:
Many VMMs modify guest OS code or rely on traditional paging schemes to virtualize memory among virtual machines, which has a lot of overhead. VMMs should be able to utilize memory efficiently while still providing performance isolation guarantees, without any modifications to the guest operating systems.
Contributions:
VMware ESX Server directly controls the system hardware, unlike other VMware products which run on top of a pre-existing operating system. The following memory management mechanisms are used in the ESX Server design:
Ballooning: A small balloon module is loaded into the guest OS as a pseudo-device driver. When the server wants to reclaim memory, it instructs the driver to “inflate” by allocating pinned physical pages within the VM using native interfaces; deflating the balloon releases those pages back to the guest. The system falls back to a paging mechanism when ballooning is not possible.
Content based sharing: Pages are shared based on their contents among VMs. A hash value that summarizes a page’s content is used as a look up key into a hash table containing entries for other pages that are already shared (or marked as COW).
Idle memory tax: Proportional-share algorithms to allocate memory to VMs are ineffective due to their lack of knowledge of active memory usage. Idle memory tax solves this problem by allowing pages to be reclaimed from clients not actively using their full allocation.
I/O page remapping: ESX server maintains statistics to track “hot” pages in high memory that are involved in repeated I/O operations. After a certain threshold, “hot” page is remapped to low memory.
Evaluation:
The paper provides a detailed evaluation of the different techniques presented. Ballooning introduces an overhead of not more than 4.4% over VMs with no ballooning. Content-based sharing among 10 VMs running homogeneous workloads allows reclamation of nearly 60% of all VM memory. The dbench workload benefits significantly from the additional memory made available by the idle memory tax scheme. Dynamic memory reallocation policies are evaluated by running five VMs on a system with more than 60% overcommitted memory.
Confusion:
I did not quite understand the statistical sampling approach to estimate aggregate VM working set.
Posted by: Neha Mittal | February 6, 2017 10:41 PM
1) Summary
The authors share insights and design decisions from the VMware ESX VMM. In particular, they focus on memory management mechanisms and policies which they use. They describe ballooning for coaxing guest OSes into releasing memory, content-based page sharing for reducing redundant pages in memory, and the idle-memory tax for efficient memory allocation.
2) Problems
Previous work such as Disco never fully accomplished the goal of transparency to an unmodified guest OS. Its authors often resorted to tricks, hacks, and minor changes to the guest kernel. However, in ESX's commercial environment, this is not feasible; guests must run completely unaltered.
Moreover, while Disco's goal was to run commodity OSes on new hardware, such as NUMA systems, ESX's goal is to efficiently consolidate servers to improve server utilization. To this end, ESX must be able to efficiently overcommit and allocate/reclaim resources for multiple running server workloads in a commercial environment.
In particular, the authors of this VMware paper describe a few problems regarding memory management. VMMs have to reclaim and allocate memory without any knowledge of the importance of the processes being allocated memory. For example, the VMM has no way of knowing whether a guest is running an idle process or an important computation, yet it still has to decide which VM to allocate memory to. Moreover, when the VMM is under memory pressure, it must reclaim memory without knowing which pages would cause the least performance reduction or which pages are not in use.
The authors also mention the double-paging problem, in which the VMM swaps out a page and the guest pulls the page back into memory only to swap it back out again. The underlying problem here is that often, the goals of the guest OS conflict with the goals of the VMM.
Finally, sharing pages between VMs for efficiency is difficult because the VMM has no semantic information about the pages. Disco resorts to guest OS modifications, but this is undesirable.
3) Contributions
The authors describe three memory management techniques and two policies.
First, they describe their ballooning technique as a way to coax the guest OSes to release memory. This decreases memory pressure on ESX. More importantly, though, ballooning allows ESX to make reclamation decisions without acting blindly. In fact, it is a clever way to make guest OSes choose which pages to release themselves; the ESX server makes guests feel its own memory pressure.
Second, the authors describe content-based page-sharing as an optimization to reduce memory pressure. They find that a significant amount of memory is actually the same across different VMs. By sharing these pages, memory is freed for other uses.
Third, the authors describe a sampling technique for measuring the number of idle pages in a guest's memory space without guest modifications. This information is used by policies to make decisions on how to allocate and reclaim memory.
Fourth, the authors describe their idle-memory tax policy. Traditional share-based allocation can allocate the same amount of memory to an idle VM as to one under "physical" memory pressure. The idle-memory tax is a technique for taking memory from VMs that don't need it and giving it to VMs that do need it but without violating share percentages.
Fifth, the authors describe an intuitive threshold system for triggering ballooning and demand paging, which they show to be sufficient for managing memory pressure.
4) Evaluation
This paper is really well-written, in my opinion. It is concise and its modular sections follow a clear line of thought from lower-level mechanisms to higher-level policies. Also, throughout the paper, the authors show how they maintain their "no modification" goal while demonstrating how they solved various problems.
Their evaluation sections are pretty straightforward. They demonstrate all of their techniques and mechanisms in action. Also, their test workloads seem reasonably representative of common workloads (e.g. databases, web servers) and operating systems (e.g. Linux and Windows).
Perhaps one flaw is the unmentioned assumption that hardware provides the functionality of an x86 processor. It seems doubtful that this assumption is very important, but it is never addressed, so we don't know.
5) Confusion
a) How is this paper so well-written?
b) There was some minor confusion about how hash collisions are handled in the shared-memory section.
Posted by: Mark Mansi | February 6, 2017 10:37 PM
1. summary
This paper introduces VMware ESX Server, a type 1 VMM, and focuses on its memory management techniques.
2. Problem
1. Servers in many cases are underutilized, allowing them to be consolidated as virtual machines on a single server.
2. The traditional method of reclaiming memory from VMs is to swap some VM physical pages to disk, but this requires the VMM to make page replacement decisions without knowledge of the guest OS's page replacement policy, which is likely to cause interference.
3. Disco's transparent page sharing requires modifying the guest OS to identify redundant pages, which is not practical when running commodity commercial OSes.
3. Contributions
1. In order to avoid a meta-level page replacement policy interfering with the guest OS's replacement policy, ESX uses the ‘ballooning’ technique. The VMM loads a balloon module into each guest OS as a pseudo-device driver or kernel service with no interface within the guest. When the server wants to reclaim memory, it instructs the balloon module to inflate by allocating guest pages and pinning them in memory, so the machine pages backing the pinned pages can be reclaimed by the ESX VMM. The balloon module polls the VMM once per second for a target balloon size. When ballooning is too slow or not possible, the VMM falls back to paging.
2. To avoid modifying the guest operating system, ESX uses content-based page sharing. The VMM scans guest pages randomly and computes a hash value that summarizes the contents of each page. The hash value is used as a lookup key into a global hash table of potential matches, and a successful match is confirmed by a full comparison. Shared pages are marked COW, and unshared pages are tagged as hint entries, which are dropped if the page is later modified.
3. ESX uses a share-based allocation policy combined with an idle memory tax to achieve efficient memory utilization while ensuring performance isolation. A client consumes resources in proportion to its share allocation while maintaining a minimum resource allocation. When memory reclamation is required, the client with the fewest shares per page is chosen as the victim, and idle pages are charged more than active pages to improve memory usage. ESX measures the idle memory percentage through statistical sampling: every 30 seconds it invalidates the mappings (including TLB entries) of 100 randomly chosen pages and monitors their usage as a representative sample of the whole memory.
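The adjusted shares-per-page ratio from the paper, sketched in code with made-up share and page numbers (reclamation picks the client with the lowest ratio; with the paper's default tax rate of 0.75, an idle page costs four times an active one):

def adjusted_shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    # idle pages are charged k times as much as active ones, where k = 1 / (1 - tax_rate)
    k = 1.0 / (1.0 - tax_rate)
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))

vms = {
    "busy_vm": adjusted_shares_per_page(shares=1000, pages=4096, active_fraction=0.9),
    "idle_vm": adjusted_shares_per_page(shares=1000, pages=4096, active_fraction=0.1),
}
victim = min(vms, key=vms.get)   # min-funding revocation: reclaim from the lowest ratio
print(victim)                    # the mostly idle VM gives up memory first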
4. Evaluation
Evaluation is very convincing in this paper. Experiments in this paper are well designed to test specific features separately, using representative benchmarks like spec and real-world workloads.
To test the effectiveness of ballooning, memory-intensive workloads are run on a single VM with its memory ballooned down to various sizes, and ballooned performance tracks non-ballooned throughput closely. To test content-based page sharing, the authors use multiple identical OSes as VMs running SPEC95 benchmarks, and around 67% of aggregate guest memory is shared. Tests on real-world workloads also show sharing ranging from 7.2% to 32.9%. In addition, the paper evaluates the effectiveness of idle memory sampling and the idle tax with a toucher application and with two identical systems (one idle) under different tax parameters.
5. Confusion
I don’t understand why I/O remapping could be a problem in VMM (Section 7)
Posted by: Yanqi Zhang | February 6, 2017 10:29 PM
1. Summary
This paper talks about the memory management mechanisms and policies used by VMware ESX Server, a bare-metal hypervisor that supports server consolidation and efficiently manages memory between VMs. The techniques described include ballooning to reclaim pages from VMs, an idle memory tax to adjust for idle page usage, content-based page sharing, and I/O remapping.
2. Problem
Server consolidation via overcommitment of memory and other resources is done to improve utilization for SMPs. However, most of the solutions prior to this paper required modifications to the guest OSes, which was not a feasible business option for VMware. Also, the memory management techniques used in previous solutions were inefficient. In the case of overcommitted memory, the host was unable to make informed decisions about which guest OS pages to replace. Previous solutions also did not account for idle memory pages, which resulted in suboptimal allocations.
3. Contributions
1. The ballooning mechanism, which forces the guest OS to give up pages using its own memory management techniques. This allows more informed decisions about page replacement candidates than the host OS could make on its own.
2. Content-based page sharing, which made it possible to share memory pages without modifying the guest OS code as Disco had.
3. More accurate memory allocation through the use of an idle memory tax, driven by an estimate of the working set of pages obtained via sampling.
4. Dynamic reallocation policies invoked during system changes or the crossing of pre-defined thresholds in free memory.
5. I/O page remapping to reduce overheads while dealing with 32-bit PCI interfaces which could only deal with the lowest 4 GB of memory.
4. Evaluation
The biggest strength of this paper is its detailed evaluation, where experiments are performed for each technique individually to isolate its impact on memory usage. The ballooning technique is shown to be helpful in reclaiming memory, and its overhead is not more than 4.4%. Content-based page sharing across 10 VMs is shown to result in about 67% of VM memory being shared. Experiments show that their sampling-based working set computation is quite close to the actual active memory being used, and that the idle tax helps boost performance. Dynamic allocation policies are evaluated for varying parameters and for crossings of the free-memory thresholds, and detailed results are provided.
5. Confusion
1. When are the page scans for the content-based page sharing mechanisms performed? The paper hints that they can be done when the system is idle. When do modern hypervisors do this?
Posted by: Karan Bavishi | February 6, 2017 09:26 PM
1. Summary
This paper introduces memory management methods of virtual machine monitor (VMM) used in VMware. The key features include ballooning, content-based page sharing, share-based allocation and hot I/O page remapping.
2. Problem
The main problem is how to implement a VMM in a production environment. From this perspective, we are not allowed to use methods such as Disco's that manually change guest OS code. Under this restriction, the authors present methods to tackle memory management problems including page replacement, sharing, and allocation.
3. Contributions
For page replacement, the main problem is how to leverage the guest OS's existing page replacement mechanism efficiently. Implementing a sophisticated page replacement policy at the VMM level may degrade performance, chiefly because the guest OS's existing policy can interact badly with the VMM's policy, for example causing double paging. To leverage the guest OS's memory management module, the VMM installs a pseudo-device driver or kernel service (the balloon) into each guest OS. When the VMM wants to reclaim memory, it asks the balloon to request more memory from the guest OS (inflation). Inflation may lead the guest OS to invoke its own page replacement module and choose "physical" pages to swap out to its virtual disk, which the VMM in effect writes out to the real disk. The efficiency of ballooning relies on the fact that the VMM does not need to keep real machine pages backing the "physical" pages the balloon acquires from the guest OS. This is an elegant way for the VMM to steer the guest OS's behavior with little performance overhead (Figure 2).
For page sharing, the main problem is how to share memory among guest OSes. The authors realize that the essential property for sharing is having the same content, so a routine method (hashing) is used to identify pages with identical content. Using the traditional copy-on-write approach, the page table entries for these shared pages are marked with a copy-on-write bit; a later write to such a page generates a fault that lets the VMM allocate and copy a new private page for it.
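A toy sketch of that copy-on-write break (the pmap, page numbers, and fault hook are stand-ins, not real ESX structures): a write to a shared machine page faults, and the VMM gives the writing VM a private copy before retrying the write.

machine_pages = {101: bytearray(b"same" * 1024)}    # MPN -> 4 KB of contents
refcount = {101: 2}                                 # two VMs currently share MPN 101
pmap = {("vm_a", 7): 101, ("vm_b", 9): 101}         # (VM, PPN) -> MPN

def write_fault(vm, ppn, offset, value):
    mpn = pmap[(vm, ppn)]
    if refcount[mpn] > 1:                           # shared page: break the sharing first
        new_mpn = max(machine_pages) + 1
        machine_pages[new_mpn] = bytearray(machine_pages[mpn])   # private copy
        refcount[mpn] -= 1
        refcount[new_mpn] = 1
        pmap[(vm, ppn)] = new_mpn                   # remap only the writing VM
        mpn = new_mpn
    machine_pages[mpn][offset] = value              # retry the write on the private page

write_fault("vm_a", 7, 0, ord("X"))
print(machine_pages[pmap[("vm_a", 7)]][:4], machine_pages[pmap[("vm_b", 9)]][:4])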
For page allocation, the main problem is how to enforce QoS across different guest OSes (clients) from a production perspective. The authors model the problem in an economic way (min-funding revocation): the client with the minimal shares-per-page ratio is asked to relinquish memory. The authors also distinguish idle memory (not used for a long time) from active memory. Idle memory is charged at a higher rate (the idle memory tax), with the goal of a better balance between performance isolation and efficient memory utilization across the whole system. To measure the idle memory fraction per client, the paper uses a sampling method that randomly selects n pages, invalidates their mappings (so accesses trap into the VMM), and counts accesses at the VMM level. Several methods in this paper use randomization (the page sharing scanner, the idle memory estimate, ...), which suggests that randomization is sometimes simple and efficient enough for production use.
4. Evaluation
The authors conducted performance experiments on ballooning (Figure 2), page sharing (Figures 4 and 5), memory sampling (Figure 6), the idle memory tax (Figure 7), and dynamic reallocation (Figure 8). Most experiments are conducted with different guest OSes, including different versions of Windows and Linux. The results show that the techniques presented in the paper improve efficiency while introducing little overhead. I like the production deployment data they share (Figure 5), which convinces the audience more easily than a synthesized experiment.
5. Confusion
What are current research or industry problems for virtual machine monitor?
Posted by: Cheng Su | February 6, 2017 08:13 PM
1. summary
This paper presents the novel mechanisms and policies for memory management in VMware ESX Server.
2. Problem
Memory usage is inefficient, and there is room for improvement:
- Underutilized servers can be consolidated as virtual machines on a single physical server.
- Many small servers can be consolidated onto fewer larger machines.
How can existing OSes be run without modifications?
3. Contributions
The contribution of this paper is the design of memory management for VMware ESX Server, which contains several novel mechanisms and policies that boost the efficiency of memory usage. High-level resource management policies compute a target memory allocation for each VM based on specified parameters and the system load, while lower-level mechanisms reclaim memory from the VMs.
The main aspects of memory management are as follows, from bottom to top:
1) memory virtualization
Provide guest OSes with the illusion of a zero-based physical address space. Add an extra level of address translation and maintain a pmap data structure for each VM, translating "physical" page numbers (PPNs) to machine page numbers (MPNs).
2) mechanisms reclaiming memory
Support overcommitment of memory. Each VM is configured with a fixed memory size (its maximum, which it receives when memory is not overcommitted). When memory is overcommitted, the ballooning technique is used to reclaim memory.
3) conserving memory (sharing identical pages)
Using content-based page sharing, pages with identical contents can be shared. Only one machine copy is kept, and a global hash table indexed by content hash tracks the shared pages.
4) proportional-share allocation algorithm
It trades off efficiency against quality-of-service guarantees. Resources are encapsulated by shares owned by clients, and each client consumes resources in proportion to its share allocation. Memory is revoked from the client with the smallest adjusted shares-per-page ratio.
5) a high-level allocation policy maintains a minimum amount of free memory. There are four states (a rough sketch follows this list):
high state (6%): no reclamation
soft state (4%): ballooning
hard state (2%): paging, forcibly reclaiming memory
low state (1%): paging, and blocking the execution of VMs that are above their target allocations.
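A rough sketch of how those thresholds might select a reclamation mechanism (the mapping from free-memory fraction to action is a simplification of the paper's state machine, and the hysteresis between states is omitted):

def reclamation_state(free_fraction):
    # thresholds on free machine memory, checked from most to least urgent
    if free_fraction < 0.01:
        return "low: keep paging and block VMs above their target allocations"
    if free_fraction < 0.02:
        return "hard: forcibly reclaim memory via paging"
    if free_fraction < 0.04:
        return "soft: reclaim via ballooning, falling back to paging if needed"
    return "high: no reclamation (the server tries to keep at least 6% free)"

for f in (0.10, 0.05, 0.03, 0.015, 0.005):
    print(f, "->", reclamation_state(f))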
4. Evaluation
This paper evaluates all of the mechanisms and policies in Sections 3-6 (memory reclamation via ballooning, memory conservation via sharing, the proportional-share allocation algorithm, and the high-level allocation policies), and the results are as expected. However, the evaluation is not fully convincing: there are too few experiments and they lack comparisons. I doubt whether these results are reproducible and would hold in other test cases.
5. Confusion
Section 2, memory virtualization: "VM instructions that manipulate guest OS page tables or TLB contents are intercepted", and the VMM maintains shadow page tables for the processor. Does this really not introduce overhead? Where does the OS's original page table go? Do the intercepted instructions become more complex (since they include the PPN-to-MPN translation)?
Section 7, I/O page remapping: what exactly is the problem? How is it solved in detail? How is remapping a page into low memory implemented?
Posted by: Huayu Zhang | February 6, 2017 03:18 PM
Summary:
I like to think of the ESX Server as an OS for OSes, where different guest OSes behave like processes and the server behaves like an OS with one main responsibility: managing I/O, which here mostly means managing memory. The writers of this paper have designed a system that runs different operating systems, with each of them behaving as if it were the only system running. It's different from VMware Workstation: a read() issued under VMware Workstation invokes a Linux read call under the hood, whereas ESX Server manages the I/O for that read directly on the hardware.
Problem:
So the problem is memory allocation/freeing.
1) How does the server know which pages to reclaim when there is a shortage of machine memory, especially since the server supports overcommitting? The guest OS knows best which pages are important and which are being used; if the server reclaims important ones there is a problem. And the server cannot just keep track of every page each OS thinks is important when so many OSes are running.
2) How does the server share memory across different operating systems?
3) How does it decide the policy for which OS gets how much memory?
Contribution
1) Ballooning: The server loads a small balloon module into the guest OS as a pseudo-device driver or kernel service. When the server detects a shortage of machine memory, it inflates the balloons in the respective guest OSes. The guest OS reacts by giving up pages, swapping the less important ones to disk, so the server doesn't have to do any extra tracking work. In case the ballooning service is unavailable, the server can still reclaim memory by paging guest memory out to an ESX server swap area on disk.
2) Content-based sharing: If two OSes have pages with identical contents, they are simply pointed at the same machine page. The server uses a hash-based approach, storing a hash value for each scanned page; if another page hashes to the same value (and a full comparison confirms the match), it is shared. A single global hash table contains frames for all scanned pages, and chaining is used to handle collisions. To track how many OSes are using a page, each shared frame simply carries a reference count.
3) It uses a share-based approach with an idle tax. Each guest OS has a share allocation which roughly represents its priority, but just because an OS has high priority doesn't mean it is using all of its pages. So there is also an idle tax: a client is charged more for an idle page than for one it is actively using. When memory is scarce, pages are reclaimed preferentially from clients that are not actively using their full allocations, which ensures we don't simply pick on the lower-priority OSes. The server also has a random sampling algorithm for tracking idle pages: "Each sampled page is tracked by invalidating any cached mappings associated with its PPN, such as hardware TLB entries and virtualized MMU state. The next guest access to a sampled page will be intercepted to re-establish these mappings, at which time a touched page count is incremented."
Evaluation and Confusion:
I have never read about something like this before. The results read well, and overall I liked the paper a lot!
I haven't thought of a why yet, but they claimed content-based memory sharing has negligible memory and CPU overhead. It also surprises me that there is that much similarity between workloads across OSes. I would imagine that the content of pages is constantly changing: if some page was previously shared and is now modified, a copy has to be made. I find it amazing that this does not produce overhead.
Posted by: Ari | February 6, 2017 02:57 PM
1. Summary
VMware ESX server is a hypervisor that lives directly above the hardware layer. This paper discusses the mechanisms that allow it to effectively manage memory for VMs running unmodified operating systems.
2. Problem
Previous VMMs such as Disco or VMWare Workstation make modifications to OSs to run them in VMs. VMware ESX server has the goal of leaving the hosted OS completely untouched. This makes resource management tricky. Specifically, ESX server wants to allocate memory resources to different VMs based on utilization and QoS goals. It also strives to perform transparent page sharing as in Disco.
3. Contributions
This paper contributes the idea of having a “balloon” process running in hosted VMs to occupy memory space and reclaim pages for the VMM. This is an elegant and flexible way to steal memory back from a hosted OS, although it does have some flaws.
They also generalize the Disco idea of sharing pages. Instead of examining addresses as in Disco, they identify shared pages based on the data they contain. This feels like a crazy idea, but they implemented it in their production system and have real data to back up its effectiveness (figure 5).
Finally, they provide an implementation of a practical algorithm for balancing QoS and idle memory reclamation.
4. Evaluation
ESX server has the advantage of being a product in production use, so all of the data is extracted from running workloads on real machines. There is also a good variety presented: tests are run on different hardware and using a variety of hosted OSs.
They use synthetic workloads to illustrate the effectiveness of a couple of their mechanisms, such as the balloon process and the idle memory checker. These are good proof-of-concept tests that illustrate the ideal behavior of these mechanisms.
In addition to these synthetic tests, they include data from the SPEC benchmarks to illustrate their page-sharing mechanism. It seems like they prepared this test to be optimal for their mechanism; it likely involved many VMs running the exact same OS and benchmark in lockstep. They subsequently report similar data in real machines running real workloads, which makes the SPEC results feel misleading. Why not just report the real data?
To that end, they also present a lot of data from machines running a variety of real workloads, which is pretty cool.
5. Confusion
I was lost by the discussion on I/O page remapping (section 7).
Posted by: Mitchell Manar | February 5, 2017 05:00 PM