CS 736 Reviews - Spring 2016: Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures

Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures

R. Rashid, A. Tevanian, M. Young, D. Golub, R. Baron, D. Black, W. Bolosky, and J. Chew. Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures. SIGARCH Comput. Archit. News, 15(5):31-39, 1987.

Posted by Michael Swift on February 9, 2016 05:07 PM

Comments

Summary

Mach provides sophisticated virtual memory features without being tied to a specific hardware base. This rich set of features, listed below, comes without any performance impact.

1) Sparse virtual address space
2) Copy-on-write virtual copy operations
3) Copy-on-write and read/write sharing
4) Memory mapped files
5) User provided backing store

Problem
An intimate relationship between memory architecture and software made sense when each hardware box ran its own manufacturer's operating system. As the community moves towards UNIX-style portable systems software and more complex virtual memory management, this makes less sense. Operating systems suffer from a proliferation of distinct memory structures, which leaves virtual memory management limited to little more than simple paging support. Hardware dependencies such as a fixed page size and hardware page-table walks not only hurt portability but also force compromises in virtual memory management features.

Contribution
It proposes a design of a memory management system which is easily portable across distinct uniprocessor and multiprocessor computing engines. The main contribution, in my opinion, is the concept of "pagers", which let Mach handle page faults and page-out requests outside the kernel. However, their benefits are not clearly evaluated in the paper, which matters because page faults are among the costliest operations in terms of both performance and hardware dependency.
Mach restructures the memory management data structures into "machine-dependent" and "machine-independent" parts so that the machine-dependent part sits behind a single abstraction.
Machine independent:
1) Resident Page Table
2) Virtual Memory Object
3) Address Map
The only machine-dependent part is the pmap, the hardware-defined physical address map.
This also allows a page size that is configurable at boot time, since the machine-independent data structures do not depend on the hardware page size. The pmap then performs the conversion to the actual hardware mapping.
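
To make the split concrete, here is a minimal Python sketch of a page fault being resolved purely from machine-independent state, with the pmap acting only as a cache that can be thrown away. This is an illustration under my own assumptions, not code from the paper: the names AddressMap, Pmap, handle_fault and the 4096-byte PAGE_SIZE are hypothetical, and a plain dict stands in for the resident page table.

```python
PAGE_SIZE = 4096  # assumed machine-independent page size


class AddressMap:
    """Machine-independent: virtual ranges -> (memory object, byte offset)."""

    def __init__(self):
        self.entries = []  # (start, end, object_name, object_offset), end exclusive

    def add(self, start, end, object_name, object_offset=0):
        self.entries.append((start, end, object_name, object_offset))
        self.entries.sort()  # keep entries ordered by start address

    def lookup(self, va):
        for start, end, name, base in self.entries:
            if start <= va < end:
                return name, base + (va - start)
        return None


class Pmap:
    """Stand-in for the machine-dependent layer: only a va -> frame cache."""

    def __init__(self):
        self.translations = {}

    def enter(self, va_page, frame):
        self.translations[va_page] = frame

    def extract(self, va_page):
        return self.translations.get(va_page)

    def remove_all(self):
        # Safe at any time: later faults rebuild whatever is needed.
        self.translations.clear()


def handle_fault(va, address_map, resident_pages, pmap):
    """Resolve a fault from machine-independent state, then refill the pmap."""
    hit = address_map.lookup(va)
    if hit is None:
        raise MemoryError(f"invalid address {va:#x}")
    name, offset = hit
    key = (name, offset - offset % PAGE_SIZE)
    if key not in resident_pages:
        # Pretend a pager supplied the data and a new frame was allocated.
        resident_pages[key] = len(resident_pages)
    frame = resident_pages[key]
    pmap.enter(va - va % PAGE_SIZE, frame)
    return frame
```

In this toy model, calling remove_all() on the pmap loses nothing permanent: the next handle_fault() for the same address finds the same frame through the address map and the resident page table.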

Evaluation
The claim is that this machine-independent approach to virtual memory management has been achieved without impacting performance. To evaluate this, Mach's performance was compared with traditional UNIX systems on various architectures such as the RT PC, VAX and SUN 3/160. However, the workloads are unclear. From what I understood, 1) a series of basic operations was timed, such as zero-filling 1K and reading files of different sizes, and 2) the entire Mach kernel and 13 C programs were compiled on a VAX 8650 using a fixed amount of memory under both Mach and UNIX. The performance was comparable for these workloads.
i) One of the primary changes in Mach with respect to a traditional OS is the design and management of machine-dependent and machine-independent data structures. How much additional memory do these structures consume? This is not evaluated.
ii) A breakdown of the time taken on a page fault and on a TLB miss in both systems would have made the evaluation more concrete.
iii) Performance for applications of varying size, including both long and short workloads with varying memory requirements, could be included for a more detailed evaluation.

Confusion:
All communication inside the kernel happens using message passing and ports. Is this good or bad?

Posted by: Vishakha Dhelia | February 11, 2016 08:55 AM

1. Summary
The paper presents the design of virtual memory management within the portable multiprocessor Mach operating system. The goal is to make memory management as machine independent as possible, restricting the machine-dependent layer to a ‘single code module and related header file’ without sacrificing performance.

2. Problem
Most operating systems either had to be significantly fine tuned to suit the underlying hardware making portability a major issue, or in the case of generic operating systems, were unable to make use of the complete functionality or optimizations presented by the hardware.

3. Contributions
The primary contribution of the paper is the minimization of machine-dependent code in the operating system. This is achieved through machine-independent memory objects and boot-time configurable ‘physical’ page sizes, where one Mach page can map to multiple machine physical pages under some restrictions. As with other operating systems (at least from later eras), Mach supports copy-on-write and read/write sharing of memory and, thanks to its machine-independent page sizing, can send entire address spaces over its message network without actually copying any data, making memory sharing very efficient.

The address map, a sorted linked list mapping memory objects into the address space, allows efficient page fault handling, (de)allocation of address ranges and copying of address spaces without penalizing larger address spaces. Another advantage is that all backing store is implemented by Mach memory objects, which simplifies memory management. Also, page faults and page-out can be handled outside the kernel in Mach. Mach also improves sharing of memory through shadow objects, which hold modified pages, and a sharing map, which points solely to shared memory objects.
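
As a rough illustration of why a sorted list of entries does not penalize large address spaces, here is a minimal Python sketch of an address map lookup with a last-hit hint. It is an assumption-laden toy, not Mach's implementation: the class and method names are hypothetical, and a sorted Python list stands in for the linked list. The point is that the cost depends on the number of entries, not on the size of the address space.

```python
class AddressMap:
    """Sorted, non-overlapping (start, end, object_name, offset) entries."""

    def __init__(self):
        self.entries = []  # sorted by start address; end is exclusive
        self.hint = 0      # index of the entry that satisfied the last lookup

    def insert(self, start, end, object_name, offset=0):
        i = 0
        while i < len(self.entries) and self.entries[i][0] < start:
            i += 1
        # Reject ranges that would overlap a neighbouring entry.
        if i > 0 and self.entries[i - 1][1] > start:
            raise ValueError("overlaps previous entry")
        if i < len(self.entries) and self.entries[i][0] < end:
            raise ValueError("overlaps next entry")
        self.entries.insert(i, (start, end, object_name, offset))
        self.hint = 0  # indices shifted, so drop the hint

    def lookup(self, va):
        """Return (object_name, byte offset in object) or None if unmapped."""
        n = len(self.entries)
        # Faults tend to cluster, so try starting from the last hit.
        if self.hint < n and self.entries[self.hint][0] <= va:
            i = self.hint
        else:
            i = 0
        while i < n:
            start, end, name, offset = self.entries[i]
            if va < start:
                break  # list is sorted: no later entry can contain va
            if va < end:
                self.hint = i
                return name, offset + (va - start)
            i += 1
        return None
```

A sparse space with three regions spread over gigabytes costs three entries here, whereas a linear page table would need slots for everything in between.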

4. Evaluation
In general, the paper dives deep into implementation details of the machine-independent memory management system without clearly highlighting how much machine-dependent management actually exists. Other than the page size restrictions and the fixed sizing of hardware structures to physical address lengths, it is tough to identify other significant machine-dependent memory management operations.

The paper’s evaluation itself seems hardly sufficient to support its claims. The only results shown are latencies of stand-alone operations and compilation times. To substantiate the claims, we would expect at least some full-fledged runs of memory-bound benchmarks that make many system calls.

While the paper talks about significant evaluation on multiple architectures and comparison to UNIX, there aren't enough actual results presented.

5. Confusion
Which machine-dependent operations/limitations of virtual memory management existed (other than page size limitations)? In other words, more details on the motivation.

How is copy-on-write virtual copy different from normal copy-on-write?

How exactly is the virtual address space being increased? If the initial limitation had to do with page table size, couldn’t inverted page tables have solved the problem (if they existed in ‘87)?

Need some details on the baseline operating system in that era (in comparison to current OS), so as to get an idea of how useful this work is in that context.

Posted by: Gokul Subramanian Ravi | February 11, 2016 08:50 AM

1. Summary
The paper describes the design and implementation of virtual memory management in the Mach OS. The OS was designed so that its virtual memory management works on a wide variety of underlying architectures. The machine-dependent component of Mach virtual memory is usually just a single code module. The authors claim that this separation of hardware and software techniques does not sacrifice system performance, provides portability and helps compare various hardware memory management schemes.

2. Problem
OS portability suffered due to large diversity of memory management techniques for each architecture. Porting to new hardware required redesign of virtual memory management techniques. UNIX had addressed the OS portability issue by restricting the facilities and basing implementations for memory management architectures on previous systems.

3. Contributions
The main contribution of Mach is a memory management system design that is readily portable to both multiprocessor and traditional uniprocessor machines. Mach provides UNIX 4.3BSD compatibility and also extends the UNIX notions of virtual memory and IPC. The abstractions in the design of Mach are the task, thread, port, message and memory object. A task is an execution environment in which threads run, and includes a paged virtual address space. A thread is the basic unit of CPU utilization. A port is a queue for messages, essentially a communication channel. A message is a collection of data objects used for inter-thread communication, while a memory object is a collection of data mapped into the address space of a task. The key design feature of Mach is that virtual memory management is integrated with this message-oriented communication facility. Mach also permits copy-on-write and read/write sharing of memory. For memory management, Mach uses the pmap, a hardware-defined physical address map, and the resident page table for machine-independent pages. It additionally maintains address maps, which map address ranges to memory objects; a memory object itself is a unit of backing storage.

4. Evaluation
The authors test the portability of Mach on a variety of architectures. Porting did not take much time compared with redesigning virtual memory for each new piece of hardware. Performance was analyzed by running some benchmarks that perform memory operations and comparing against UNIX. The authors also compare compilation performance with UNIX. However, the paper does not specify exactly how the virtual memory mechanisms were measured and compared, and the experimental setup is not explained well enough.

5. Confusion
The exact functions of resident page table and address map in Mach. How exactly are page faults handled and what functionality do pager tasks provide?

Posted by: Anshul Purohit | February 11, 2016 08:34 AM

1. Summary
This paper describes a memory management system that uses higher-level abstractions and uses indirection so that it requires very little machine-dependent code. The paper claims that this increases operating system portability.
2. Problem
When this paper was written, the memory management in most operating systems was very low-level, presenting abstractions that were very close to the hardware. The memory management subsystem was also typically very tied to the design of the hardware, making it hard to port across different physical platforms.
3. Contributions
This paper contributes a memory management system, conceptually based on message passing, that allows the operating system to (mostly) ignore how the physical memory is actually organized. It uses inheritance values to indicate how memory should be shared with a child process. Most memory management code uses machine-independent data structures, which can be easily ported. These data structures contain so much information that the machine-dependent structures do not always have to reflect the complete mapping state; it can usually be reconstructed from the machine-independent structures.
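
A minimal Python sketch of how inheritance values could drive the creation of a child address space follows. It is only an illustration under stated assumptions: Region and fork_address_space are hypothetical names, a bytearray stands in for a memory object, and the "copy" case copies eagerly where Mach would defer the copy with copy-on-write.

```python
SHARE, COPY, NONE = "share", "copy", "none"


class Region:
    """A range of a task's address space plus its inheritance value."""

    def __init__(self, start, end, data, inheritance=COPY):
        self.start, self.end = start, end
        self.data = data  # stand-in for the backing memory object
        self.inheritance = inheritance


def fork_address_space(parent_regions):
    """Build the child's regions from the parent's inheritance values."""
    child = []
    for r in parent_regions:
        if r.inheritance == NONE:
            continue  # the child's range is left unallocated
        if r.inheritance == SHARE:
            # Same underlying object: writes are visible to both tasks.
            child.append(Region(r.start, r.end, r.data, SHARE))
        else:
            # Eager copy for simplicity; Mach would use copy-on-write here.
            child.append(Region(r.start, r.end, bytearray(r.data), COPY))
    return child


parent = [
    Region(0x0000, 0x1000, bytearray(b"code"), COPY),
    Region(0x1000, 0x2000, bytearray(b"shm!"), SHARE),
    Region(0x2000, 0x3000, bytearray(b"priv"), NONE),
]
child = fork_address_space(parent)
child[1].data[0:1] = b"S"  # shared region: the parent sees this write
child[0].data[0:1] = b"C"  # copied region: the parent does not
```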
4. Evaluation
The evaluation in this paper is rather weak. The only evaluation given is of very small benchmark programs, none of which is something that any normal user would run. (They also give very little explanation for these measurements). For portability, they only port their systems to physical machines that already can run Unix, showing no improvement over that operating system. One could claim that the time to build ports is smaller, but they give only anecdotal evidence for this, and no code sizes for the machine-dependent memory management code.
5. Confusion
My primary confusion is how the program handles page faults. Also, I wonder how Mach has gotten a reputation for slowness, given that the measurements in this paper do show that it was faster than BSD Unix.

Posted by: Stephen N. Lee | February 11, 2016 08:20 AM

1. Summary
The paper presents Mach, which provides an efficient machine-independent virtual memory management system that is easily portable on uniprocessor and multiprocessor machines by separation of machine-dependent and machine-independent components and assuming minimal support from hardware.

2. Problem
Portability of operating system suffered with the increase in the number of memory architectures as memory management code in the OS is tightly coupled with the underlying hardware memory architecture. The underlying hardware may support an inverted page table, a software filled TLB, multi-level page table, etc and the memory management code needs to be rewritten for each of these different architectures as the internal data structures in the memory management code were based on the underlying hardware support.

3. Contributions
The first major contribution is building a portable OS that provides a machine-independent virtual memory management system by making minimal assumptions about the underlying hardware support and by splitting the memory management data structures into machine-independent structures and a machine-dependent part that is small and easy to change. The machine-independent data structures include i) a system-wide resident page table that keeps track of information about machine-independent pages, ii) memory objects, each of which is a repository of data indexed by byte, and iii) a per-task address map containing a sorted linked list of address map entries, each of which maps a range of virtual addresses to a region of a memory object. All the machine-dependent code and data structures are in a file called pmap.c. Mach treats primary memory as a cache for the contents of virtual memory objects, and by using byte offsets into memory objects it removes the dependency on the hardware page size. Hence Mach can support a 4K page on hardware that supports only 1K pages.
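
The 4K-on-1K case can be sketched in a few lines of Python. This is a hypothetical helper, not anything from pmap.c: the function name and its parameters are my own, and it only shows the arithmetic a machine-dependent layer would need in order to enter one Mach page as several hardware mappings, assuming the Mach page size is a power-of-two multiple of the hardware page size.

```python
def hardware_mappings(va, frame_pa, mach_page=4096, hw_page=1024):
    """Hardware (va, pa) pairs needed to map the Mach page containing va.

    frame_pa is the physical address of the Mach-sized frame.
    """
    ratio, rem = divmod(mach_page, hw_page)
    if rem or ratio < 1 or ratio & (ratio - 1):
        raise ValueError(
            "Mach page must be a power-of-two multiple of the hardware page"
        )
    base_va = va - va % mach_page
    return [
        (base_va + off, frame_pa + off)
        for off in range(0, mach_page, hw_page)
    ]
```

For example, a fault at 0x5234 backed by a frame at 0x20000 yields four 1K hardware mappings starting at 0x5000, while the machine-independent layers continue to see a single 4K page.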

Another major contribution is building an efficient system while providing a general solution that works for all types of hardware. Mach, being a microkernel, relies on message passing, but it avoids the performance disadvantages of previous message-passing systems by tightly integrating virtual memory management with message passing. By employing COW techniques for message passing, a large amount of data can be sent in a single message with just a simple memory remapping.

4. Evaluation
The paper provides an absolute time for porting Mach to different architectures but it is difficult to understand to what extent it eased the development effort when compared to a tightly coupled design. The paper evaluates the performance of basic memory management operations for Mach and Unix on different architectures. The time to read a file is measured only on VAX 8200 and I couldn’t understand what exactly was measured here and why this wasn’t evaluated for other architectures. It would have been useful to evaluate the performance of long-running workloads to better demonstrate if there is any performance overhead due to having multiple data structures. A TLB miss or a page fault in this scenario would access more memory structures and it is quite difficult to understand the effect on performance with these short workloads. Also, it would have been nice if there was a comparison of memory overhead against Unix as Mach maintains a number of different memory data structures.

5. Confusion
What is the benefit of having a user-level pager? Wouldn’t having the pager outside of kernel hurt performance?

Posted by: Aishwarya Ganesan | February 11, 2016 07:59 AM

1. Summary
This paper describes the design and implementation of virtual memory management within CMU’s Mach OS. The main goal of the design was to ensure ease of portability. This is achieved via the separation of the hardware-dependent portions of memory management from the independent portions. The authors proceed to describe the various abstractions, data structures as well as pros and cons related to different hardware memory management approaches.

2. Problem
Prior to this, porting an OS to different underlying hardware architectures was not a trivial task due to the presence of a large number of memory management data structures. UNIX achieved portability via inefficient means such as basing the new implementation on previous versions or simulating the previously supported architecture. The authors propose a new design that would solve this issue.

3. Contribution
The key aspect of the proposed design was to separate the machine-dependent portion of memory management from the machine-independent portion. The authors encapsulate the machine-dependent portion into a single module. This separation is achieved via the introduction of new abstractions such as tasks, threads, ports, messages and memory objects. The various data structures were also designed keeping in mind the end goal of separating machine-dependent code into one module to aid portability. The resident page table is used to track information related to machine-independent pages. The address map is used to map a range of contiguous virtual addresses to a region of a memory object. The pmap is used for the machine-dependent mapping. Another aspect that I found interesting was the fact that Mach supports sharing via COW as well as read/write sharing. The way COW is handled via shadow objects ensures that memory is used judiciously. Through this paper, a number of new ideas have been advocated indirectly. Firstly, the proposed design could support distributed systems via the use of messages as a means of communication. Secondly, the proposed design advocates the idea of a microkernel by pushing the paging functions to user space.

4. Evaluation
The authors make a lot of claims but fail to prove them via a sound evaluation. They argue that their design is easily portable by stating that a port took a novice only 3 weeks. I feel that a more systematic and comprehensive approach should have been used to prove this claim, such as comparing the portability of their system with the portability of UNIX. To show that the separation of the hardware-dependent portions from the hardware-independent portions is achieved without sacrificing system performance, they present results comparing the performance of simple functions such as fork and read on the proposed system and UNIX. However, they fail to use a wide range of benchmarks that would reflect the applications intended to run on the system. The authors also fail to compare the linked-list implementation of the address map against other possible implementations. It would also have made sense to analyze the shadow object mechanism.

5. Confusion
It would be great if the entire workflow could be discussed in the class. Also, what exactly is a Mach Physical Page?

Posted by: Arjun Singhvi | February 11, 2016 07:34 AM

Summary

This paper describes the design and implementation of the virtual memory management system in the CMU Mach operating system. It argues that separating the machine-dependent and machine-independent modules facilitates portability and makes it possible to examine the pros and cons of hardware memory management schemes, all without compromising system performance.

Problem

Operating system portability continues to suffer from the rapid proliferation of memory structures. Memory abstractions are tightly tied to the underlying hardware architectures. UNIX attempts to address the problem by restricting the functionality provided by the VMM, keeping it simple and basing implementations for newer memory architectures on previous versions so that porting becomes easier. Hence there is a need for a memory management system that can be easily ported and is independent of the hardware support.

Contribution

The main goal of Mach is to explore the relationship between hardware and software memory architectures, and its primary contribution is separating the VMM into two parts: a machine-independent module and a machine-dependent module. All the important information needed to manage Mach virtual memory is maintained in machine-independent data structures, while the machine-dependent data structures contain only the mappings needed for the current architecture.
Another contribution of the paper is the integration of the VMM with message-driven communication. A single large file or a whole address space can be sent in a single message with the efficiency of a simple memory remapping. It gives user applications the flexibility to control page-in, page-out and pinning for the memory objects they create. Incorporating message passing into the virtual memory system lets the system scale to multiprocessors and networked systems. Sharing in the VMM is done efficiently using CoW and shadow objects, and shared memory is implemented through inheritance values.
Overall, Mach limits its hardware dependency, relying only on the ability to handle and recover from page faults, subject to the hardware's addressing limitations.

Evaluation

The authors argue that Mach can be easily ported and give estimates of the porting time for different architectures: 3 weeks for an inexperienced programmer on the IBM RT PC, and 5 weeks for an expert on the Sequent system. They also point out the issues with various uniprocessor and multiprocessor systems: the VAX has large page tables because of its small page size, the IBM RT PC allows only one valid mapping of a physical page, which creates issues for sharing, and the SUN 3 results in sparse page tables. The authors also evaluate performance by comparing the time for operations such as reading a file and fork; however, the reasons behind the results are not adequately explained. There are not enough benchmarking experiments to effectively justify the proposed design and implementation.

Confusion
The concept of a pager and the handling of page faults are not so clear to me. It would also help if the explanation of the shadow page scheme could be elaborated.

Posted by: Ankur Srivastava | February 11, 2016 05:53 AM

Summary: By dividing the memory management model of the Mach system into several distinct concerns, the authors reduce the hardware specific memory management code to a single file, while simultaneously presenting a hardware-independent memory model that is much more flexible than that of classical Unix variants.

Problem: In the late 1980s, the hardware industry had not yet settled into the de-facto standards of x86 and ARM, and there were a multitude of viable targets for OS development. In this environment, there was a multitude of memory management architectures, from VAX-style paging, to segmentation, to inverted page tables. This poses difficulties for OS developers - porting to new hardware required significant redesign, which potentially cut across the entire memory abstraction.

Contributions: In more typical Unix-like operating systems, the act of writing the contents of physical memory to persistent storage is treated as an implementation detail to be disguised by the abstraction of a virtual address space, with the exception of special cases like memory mapped I/O. However, Mach inverts this thinking, and considers the fundamental physical unit of an address space to be a chunk of persistent storage, called a "memory object", which can be cached in memory. In keeping with Mach's microkernel architecture, the logic for moving memory objects to and from persistent storage is handled by pager tasks, which can be user-mode tasks providing different types of functionality, though a default kernel-mode pager is also provided. This enables a powerful decoupling of the standard memory-management concepts into several orthogonal concerns. Mach's virtual memory abstraction simply maps virtual addresses to offsets within memory objects; the algorithm for paging a specific memory object in and out is encapsulated separately in a pager. Independently of this, the kernel's hardware-specific logic handles both the management of the paging/segmentation hardware and the RAM allocation and eviction policies.

This 'memory object' model also has the convenient effect of cleanly representing memory sharing policies via permissions attached to the objects. For example, a parent can flag a memory object to be inherited as "copy on write" by a child process, via permission bits on the underlying memory object. Then, if the child modifies the object, a "shadow" memory object is simply created to represent the changes to the original version.
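
The shadow mechanism can be sketched with a small Python model. This is a simplification under my own assumptions, not the paper's code: MemoryObject and cow_copy are hypothetical names, pages are keyed by index rather than byte offset, and both tasks receive a shadow eagerly at copy time, whereas Mach creates shadows lazily on the first write.

```python
class MemoryObject:
    """A memory object that may shadow another one."""

    def __init__(self, pages=None, shadow_of=None):
        self.pages = dict(pages or {})  # page index -> contents held here
        self.shadow_of = shadow_of      # consulted when a page is missing here

    def read(self, index):
        """Walk the shadow chain until some object holds the page."""
        obj = self
        while obj is not None:
            if index in obj.pages:
                return obj.pages[index]
            obj = obj.shadow_of
        return None  # not present anywhere in the chain

    def write(self, index, data):
        """Modified pages live in the topmost object only."""
        self.pages[index] = data


def cow_copy(original):
    """Give parent and child each a shadow over the now read-only original."""
    return MemoryObject(shadow_of=original), MemoryObject(shadow_of=original)


base = MemoryObject({0: b"AAAA", 1: b"BBBB"})
parent_view, child_view = cow_copy(base)
child_view.write(1, b"bbbb")  # only the child's shadow holds the new page
```

After the write, child_view holds exactly one page of its own, and every unmodified page is still served by the shared original, which is why the scheme uses memory sparingly.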

Evaluation: The authors investigate whether the new memory abstractions ease development by porting Mach's memory management to a variety of architectures with divergent memory models, with compelling results. In each case, the port took roughly one to two months, even for an inexperienced developer.

The performance analysis is comparatively disappointing, with the authors testing Mach on some very basic memory and file read/write benchmarks, along with some generic compilation workloads. For the compilation workloads, there's no clear indication of how much the paging mechanisms were stressed, and the authors don't bother to analyze the memory access characteristics of the compilation workloads in any way at all, so it's not clear what a reader is supposed to infer from the results. Moreover, experimental procedure for each compilation workload is described in such a cursory manner that an exact reproduction of a given experiment is impossible (i.e. what are the other thirteen programs that are compiled?), rendering the individual performance claims unfalsifiable.

Confusion: With their linear access times, it's not clear to me why they chose so many sorted linked lists to store address space data. If I were managing sparse address spaces, I'd definitely consider an indirect table of some sort before I'd ever try a linked list.

Posted by: Michael Vaughn | February 11, 2016 04:28 AM

1. Summary
The paper separates virtual memory management from the hardware without sacrificing performance, which helps in porting the OS to different hardware.

2. Problem
Memory management relies on hardware support, so the authors' goal is to write an OS that runs on different types of hardware with limited assumptions about the underlying hardware.

3. Contributions

The main contributions are:

1] Separation of hardware-independent and hardware-dependent modules for virtual memory management, which makes porting easy.
2] Communication via messaging, which allows an address space to be passed by simply remapping addresses.
3] Efficient sharing of memory objects through an extra level of indirection: shadow objects for COW sharing and a sharing map to keep track of shared memory objects.
4] A separate hardware-dependent pmap module, which makes incorporating new hardware easy.
5] Only small changes are required to incorporate hardware support for VMM such as a TLB.
6] Support for sparse address spaces using linked lists in address maps.
7] Software-managed TLB. The user knows best which mechanism suits a task, and can manage its paging efficiently with relatively low effort.
8] All virtual memory information can be reconstructed from the machine-independent data structures, which allows virtual-to-physical mappings to be invalidated lazily.

4. Evaluation
The Mach approach is compared to UNIX on basic operations such as zero-filling a page, fork and reading a page. A proper evaluation would check against a realistic workload that stresses memory handling. Even with so many memory management data structures, the performance was still up to the mark and there was no additional overhead.

Reference counts are used for garbage collection. How are reference counts handled correctly on multiprocessor systems? Complex locking rules, needed because multiple CPUs access the virtual memory structures in parallel, make garbage collection tough. None of this is evaluated.

Lazy invalidation of mappings can cause other problems: when many processes are forking and there is heavy paging, lazy evaluation may waste memory. The problem may be further compounded by the need to manage (garbage collect) the object tree.

5. Confusion
Is there one memory object per process, or can many memory objects be associated with a process? How are ports handled, given that there can be many ports associated with many memory objects? How are they managed efficiently? More clarification on what the paging daemon does would help.

Posted by: Mushahid Alam | February 11, 2016 04:11 AM

1. Summary
-This paper describes the design and implementation of VM management within the Mach OS. The aim was to separate software memory management from hardware support without compromising on performance. This is claimed to be possible with machine independent data structures managing the entire virtual memory. The authors claim to have ported the system across various hardware architectures and discuss the pros and cons of each variation.

2. Problem
-One problem with the advent of various hardware and instruction set architectures was that operating systems could not adapt to this variation as easily as application software; they continued to suffer from a proliferation of memory structures. Mach’s approach to the problem was based on a modular separation of the machine-dependent part (a single code module and a header file) from the machine-independent part of virtual memory management, together with extended inter-process communication, which enhanced portability to multiprocessor systems while keeping compatibility with traditional uniprocessors.

3. Contributions
-Mach defines 5 abstractions: task, thread, port, message and memory object.
It supports large, sparse virtual address spaces, copy-on-write and read/write sharing via shadow objects, memory-mapped files, and user-provided backing store objects and pagers. I think the key to efficiency in Mach is its integration of virtual memory management with message-oriented communication (which allows whole files and large address spaces to be sent via memory remapping). COW and read/write sharing are governed by inheritance values (shared, copy or none) and protection values (current and maximum), specified on a per-page basis. A child inherits the values of its parent.
-The address map maps ranges of virtual addresses to memory objects; the resident page table tracks the physical pages caching those objects, and the pmap (the machine-dependent data structure, containing only the data crucial to system execution) holds the actual virtual-to-physical mappings. I also appreciate the emphasis Mach places on garbage collecting unimportant mapping information, unreferenced shadow entries in shadow chains, and unreferenced pages (chosen using user-specified domain knowledge of a page’s usage and relevance in the cache) to save space and time.
-I like the fact that the implementers of Mach have put much thought into the implementation of the machine-independent data structures: the doubly linked list of address map entries, all of an object's pages linked from the memory object, fast lookup using hints, and a hash on object and offset. These speed up memory management operations, namely page fault lookups, allocation/deallocation of address ranges, and page-out and page-in by the pager (mostly in user task space). The authors also present the signatures of the APIs implemented in Mach to support the various abstractions, which I feel helps readers gain a clearer understanding of these abstractions.
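
As a small illustration of the object/offset hash and the per-object page lists, here is a Python sketch. It is a toy under my own assumptions rather than Mach's resident page table: the class and method names are hypothetical, a dict plays the role of the hash buckets, and a set plays the role of the per-object linked list.

```python
from collections import defaultdict


class ResidentPageTable:
    """Pages indexed by (object, offset) and also grouped per object."""

    def __init__(self):
        self.by_key = {}                   # (object_id, offset) -> frame number
        self.by_object = defaultdict(set)  # object_id -> resident offsets

    def insert(self, object_id, offset, frame):
        self.by_key[(object_id, offset)] = frame
        self.by_object[object_id].add(offset)

    def lookup(self, object_id, offset):
        """Fault-time lookup: one hash probe, independent of object size."""
        return self.by_key.get((object_id, offset))

    def release_object(self, object_id):
        """Free every resident page of an object, e.g. on deallocation."""
        freed = []
        for offset in sorted(self.by_object.pop(object_id, ())):
            freed.append(self.by_key.pop((object_id, offset)))
        return freed
```

The two indexes serve different operations: the hash answers "is this page of this object resident?" during a fault, while the per-object grouping makes object deallocation proportional to the object's resident pages rather than to all of physical memory.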

4. Evaluation
-The authors claim to have evaluated Mach across various hardware and say that the porting effort was primarily concentrated on debugging compilers and device drivers and on validation/invalidation of the hardware address translation buffers specific to the underlying hardware, whereas implementing the pmap module was comparatively simple (3-5 weeks).
-Mach was ported to the following uniprocessor architectures: VAX (page tables pageable within the kernel virtual address space), IBM RT PC (inverted page tables) and SUN 3 (combination of segments and page tables). The authors say that Mach was able to address the issues found in these architectures, respectively: limited process addressability, only one valid mapping for each physical page (making sharing impossible), and “holes” in the physical address space.
-When it comes to Mach’s portability to multiprocessors, the authors begin with an honest statement that none of the multiprocessors running Mach supports TLB consistency, so it is impossible to reference or modify a TLB on a remote CPU. They work around the problem with three strategies: forcibly interrupting the CPUs (when time is critical), postponing use of a changed mapping until all CPUs have taken a timer interrupt, or allowing temporary inconsistency. The proposed strategies for deploying Mach over tightly/loosely coupled systems cover fully shared memory with uniform access times, shared memory with non-uniform access times, and message-based, non-shared memory systems (using “tricks” that allow lazy evaluation of by-value data).
-I appreciate the effort the authors have put into analyzing the exhaustive list of hardware scenarios and system infrastructures and proposing strategies for porting Mach to each of them. However, from an empirical evaluation perspective, there is very little information on extensive testing of Mach's performance under real-world loads (as opposed to simple fork, read and write operations). I also notice that the effects of specific abstractions (ports, pmaps, memory objects, the extra level of indirection) have not been measured, which leaves the evaluation as a whole vague.

5. Confusion
-A diagrammatic representation of the various machine dependent and independent data structures and their interactions would be helpful.

Posted by: Shruthi Racha | February 11, 2016 03:54 AM

1. Summary
This paper describes the design and implementation of the virtual memory management of CMU’s Mach operating system, which aims to achieve performance comparable to or even better than UNIX while offering better portability across different hardware memory architectures by decoupling machine-dependent and machine-independent code modules in the virtual memory management process.
2. Problem
One of the challenges in virtual memory management facing operating systems is the wide variety of memory structures required by hardware. In UNIX, virtual memory management code is closely tied to the underlying hardware memory architecture, so the portability issue was solved by restricting the facilities provided, which may limit opportunities for optimization at both the hardware and software level.
3. Contributions
The Mach operating system is designed to explore the possibility of decoupling the virtual memory management system into machine-dependent and machine-independent code. The dependent code can be a well-defined module that needs little knowledge of the independent code, which improves portability, while the independent code can provide richer mechanisms for inter-process communication and various optimization techniques including copy-on-write. Specifically, in their implementation:
i. The machine-dependent part of their virtual memory management system includes only a single module called pmap, which does the actual mapping to physical pages on the hardware. Because the independent code maintains the authoritative mapping information in its own way, pmap does not have to stay synchronized with the independent data structures at all times, which benefits time and space efficiency depending on the specific hardware, and it can delay such operations until an actual fault occurs.
ii. Mach’s page differs from the physical page defined by the hardware and belongs to memory objects, which are collections of data backed up on disk that can be mapped to virtual addresses of tasks. Information about those mappings is kept in the resident page table and the address map, so on a page fault the operating system can construct the correct mapping.
iii. Mach uses copy-on-write on forked tasks, having them mapped to the same memory object. When a page is modified by a task, a shadow object is created to hold the modified pages of that task while linking to the original object for references to other, unmodified pages. Long chains of shadow objects can be avoided by detecting redundant intermediate shadow objects whose pages have been completely overridden.
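
The shadow object mechanism in item iii can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the paper's implementation: MemoryObject, fork_cow and collapse are hypothetical names, pages are plain Python values, and reference counting is ignored.

```python
class MemoryObject:
    """Hypothetical, simplified memory object: a sparse set of pages."""

    def __init__(self, pages=None, shadow_of=None):
        self.pages = dict(pages or {})  # page index -> contents held here
        self.shadow_of = shadow_of      # object consulted for pages not held here

    def read(self, index):
        # Walk down the shadow chain until some object holds the page.
        obj = self
        while obj is not None:
            if index in obj.pages:
                return obj.pages[index]
            obj = obj.shadow_of
        return None  # stands in for a zero-filled page

    def write(self, index, data):
        # Copy-on-write: the modified page lands in this object only.
        self.pages[index] = data


def fork_cow(obj):
    """Give parent and child their own empty shadow over the shared object."""
    return MemoryObject(shadow_of=obj), MemoryObject(shadow_of=obj)


def collapse(obj):
    """Bypass intermediate shadows whose pages `obj` has fully overridden."""
    while obj.shadow_of is not None and set(obj.shadow_of.pages) <= set(obj.pages):
        obj.shadow_of = obj.shadow_of.shadow_of
```

After a fork, both tasks read through to the original object, and a write by one task is invisible to the other. The collapse step shows why a chain need not grow without bound: once a shadow hides every page of the object beneath it, that object can be skipped.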
4. Evaluation
Adapting Mach’s memory management system to different hardware architectures only involves changes to a single code module, pmap, and this process is shown to be quite lightweight according to their experience of porting Mach from VAX to IBM RT PC.

As for performance, Mach is compared with UNIX on various common memory and I/O operations and seems to win in most cases, but the reasons for those results were not specifically analyzed. Some of them can be speculated to result from optimization techniques, including memory mapping tricks in pmap for the zero-fill operation, copy-on-write in fork, and persistent caching of memory object data when reading a file the second time. For the compilation comparisons with two buffer configurations, however, we might expect to see the same trend for both Mach and UNIX, yet Mach’s performance degrades with the limited 400-buffer configuration while UNIX’s performance improves by a large factor. Again, it would be really helpful if an interpretation of those data were included in the paper.

So far the memory operations tested in the paper mostly involve sequential memory allocation and access patterns; in cases where data structures like linked lists and trees are heavily used, the performance comparison could be different. Mach’s lazy evaluation of page mappings relies on page faults, and virtual memory mapping needs to be handled in both machine-dependent and machine-independent code, so it would be useful to see whether those designs incur overhead under random memory allocation and access patterns.

5. Confusion
The paper mentions that “throwing away” some virtual to physical mappings may improve speed efficiency. How does it actually happen?

Posted by: Fujie Zhan | February 11, 2016 03:04 AM

1. Summary
This paper discusses an effort to separate software memory management from hardware support. To detail their work, the authors describe the virtual memory management in the Mach OS, which was designed to be easily portable to multiple types of hardware. The design supports large, sparse virtual address spaces, copy-on-write virtual copy operations, copy-on-write and read-write memory sharing between tasks, memory mapped files, and user-provided backing store and pagers.

2. Problem
Porting an OS to multiple hardware platforms is a difficult and time-consuming process. Furthermore, each new hardware device provides a slightly different interface for memory interaction (page size, segmentation, page tables etc.) and different opportunities for optimization. UNIX tries to avoid this issue by using a simple memory management system and restricting the functionality provided by the VMM so that porting becomes easier.

3. Contribution

The main contribution of this paper was the design of a machine-independent virtual memory management system. They defined abstractions intended towards this goal and describe their functioning. Data structures in Mach OS virtual memory management are machine independent. Machine-dependent structures like the linear page table in VAX, the inverted page table in IBM RT PC and segments in SUN 3 are handled by machine-dependent code. In order to ease portability, the interface between the machine-independent part and the machine-dependent part was standardized. Mach also supported sparse virtual address spaces, which allowed “processes” to access a larger virtual address space than allowed in a traditional UNIX-based system. Mach integrates memory management with message-based communication. However, they do not delve deeper into the implications of this.

4. Evaluation
1. The main goal of this paper was to prove the ease of portability of the Mach OS. However, I feel that their evaluation of portability has not been carried out in a convincing manner. They do not discuss the quality of the final product after porting, nor compare that with any attempt by the same programmer to port a UNIX system.
2. The performance evaluation is quite simplistic. The paper presents timing results from running fork, reading a file and compiling Mach along with a few other C programs. A more memory-intensive application and the corresponding impact on performance would have helped to appreciate the design more.
3. An analysis of the results was also missing.
4. The authors claim that integration of virtual memory management with a message-oriented communication facility leads to an efficient message-based system. A more systematic evaluation would have helped.

5. Questions
1. Could we go over the motivation of this paper in more detail?
2. Even the details of the abstractions are not clear and it would be great if we could go over the need for them/ compare them to traditional UNIX system in the class.

Posted by: Urmish Thakker | February 11, 2016 02:50 AM

1. Summary: This paper is about the memory management scheme adopted by Mach microkernel. The authors claim that this memory management scheme is more portable, since they have modularized the memory management, and brought down the machine-dependent code to just one file. They claim that this design performs equally well as Unix systems.
2. Problem: The huge effort in porting OSs impedes new hardware development. Unless the hardware is widely ‘accepted’, an effort to port the OS for this hardware is not made, and unless there is an OS to manage the hardware, the hardware is not widely accepted. This is a vicious cycle. The authors in this paper specifically target the memory management section of OSs, and make it machine-independent by handling most of the memory management in software. By doing this, they not only facilitate easy porting, but also provide an opportunity to evaluate a range of hardware memory management schemes. The aim is to do all of this without performance loss.
3. Contribution: Mach was one of the first microkernels, and is still the core of popular OSs like OS X, iOS and GNU Hurd. As such, they show that key concepts like page-in and page-out can be moved out of the kernel, making it smaller. They do this by realizing efficient message-based communication. One of the contributions of the paper was that they viewed physical memory as a cache for virtual memory, a view quite prevalent today. The authors of the paper also come up with a solution that allows the OS to keep only the currently used parts of the page table in memory. All of this led the authors to clearly divide the contemporary memory management mechanism into software-managed and hardware-dependent parts. They abstracted the hardware-dependent part to provide a uniform view for all other operations, in effect making memory management machine-independent. Thus, in their implementation, the OS’s handling of pages need not depend on the page size, a hardware feature. It also need not depend on hardware features like a page-table base register or segmented-paging support. All of this is handled in just one machine-specific structure, called pmap. The process’s virtual address space now consists of memory objects, which may contain multiple pages. There are address maps, which are sorted linked lists of entries containing mappings from virtual addresses to memory objects. There is a system-wide physically indexed page table. By virtue of their efficient message-based communication, they are also able to move page fault handlers to user space.
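
One way to picture how these structures cooperate on a page fault is the sketch below. It is a rough illustration only: handle_fault and its parameters are hypothetical, PAGE_SIZE is an assumed value, plain dicts and tuples stand in for the kernel's structures, and the step where a real kernel would ask a pager for data is reduced to grabbing a free frame.

```python
PAGE_SIZE = 4096  # assumed value for illustration


def handle_fault(vaddr, address_map, resident_pages, pmap, free_frames):
    """Resolve a fault on `vaddr` and return the physical frame number.

    address_map:    list of (start, end, object_name, object_offset) tuples,
                    with page-aligned start addresses
    resident_pages: dict keyed by (object_name, offset) -> frame number
    pmap:           dict of virtual page number -> frame (hardware-facing cache)
    free_frames:    list of unused frame numbers
    """
    page_base = vaddr - vaddr % PAGE_SIZE

    # 1. Address map: which memory object backs this address?
    for start, end, obj, obj_offset in address_map:
        if start <= page_base < end:
            break
    else:
        raise MemoryError("invalid address: nothing is mapped here")
    offset = obj_offset + (page_base - start)

    # 2. Resident page table: is that (object, offset) page already in memory?
    frame = resident_pages.get((obj, offset))
    if frame is None:
        # 3. Not resident: a real kernel would request the data from the
        #    object's pager; here we only allocate a frame.
        frame = free_frames.pop()
        resident_pages[(obj, offset)] = frame

    # 4. pmap: install the translation so the hardware can use it.
    pmap[page_base // PAGE_SIZE] = frame
    return frame
```

Only step 4 touches machine-dependent state; steps 1 to 3 work purely on machine-independent structures, which is the split the review describes.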
4. Evaluation: Although the authors claim that they have isolated the machine-dependent code to a single module, they have not explained this module in the paper. They also do not justify the easy portability of their system. They just quote a rough estimate of the time taken by a novice, and do not quantify or compare it with that of porting other systems. The performance numbers quoted are also without any explanation for why and how such workloads were run. In one of the tables, they compare the compilation times of kernels, although it is not clear why these numbers would matter. Instead, measurements like bootup time, or time to run some standard programs, would have been much better. They also do not evaluate the performance losses due to shadow chaining. There must have been an increase in memory pressure due to using expensive structures like memory objects, and a decrease due to address maps. Amidst this, an evaluation of memory usage would also have been useful, and is missing. They also do not explain the apparent improvement in the performance they measured. In short, they were very weak in evaluating their design.
5. Confusion: What exactly is the machine-dependent virtual memory code that prompted the authors to make this radical design change? How do they manage to efficiently use communication, while other systems are slow when using it? How do the four memory structures integrate? What exactly is the pmap structure? Why is there a pager per memory object? A flow of events, showing conversion of a virtual to a physical address, would be extremely useful!

Posted by: Mohit | February 11, 2016 02:49 AM

Summary:
The paper describes the virtual memory management of the Mach kernel. The Mach virtual memory (VM) system is highly portable and efficient in that it utilizes the underlying memory management hardware while separating memory management into machine-dependent and machine-independent portions.

Problem:
The paper tries to address the problem of inefficiency in the memory management of the then existing operating systems in dealing with different memory management architectures.

Contributions:
The Mach VM system implements virtual memory management by segregating it into machine-dependent and machine-independent parts, where the machine-dependent code is implemented in a single module.
This form of virtual memory management is integrated with a message passing implementation of the kernel that abstracts finer programming details such as placement of data in the network.
The Mach VM system permits some system services and features such as paging to be managed by user tasks. This is the basic idea of an Exokernel.
The Mach VM system implements CoW and read-write sharing of machine-independent pages amongst tasks.
The Mach VM system allows large, sparse virtual address spaces. It manages the page table size in memory by constructing only the parts required for mapping.
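
The point about constructing only the required parts of the page table can be illustrated by treating the machine-dependent mappings as a small cache that is rebuilt on faults. The sketch below is hypothetical: Pmap and Task are made-up names, and the mappings dict stands in for the authoritative machine-independent state.

```python
class Pmap:
    """Hypothetical stand-in for the machine-dependent mapping cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # virtual page -> frame, in insertion order

    def enter(self, vpage, frame):
        if vpage not in self.entries and len(self.entries) >= self.capacity:
            # Throw away the oldest mapping. This is safe because the
            # machine-independent state can always rebuild it.
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[vpage] = frame

    def translate(self, vpage):
        return self.entries.get(vpage)


class Task:
    def __init__(self, mappings, pmap):
        self.mappings = mappings  # authoritative: virtual page -> frame
        self.pmap = pmap
        self.faults = 0

    def access(self, vpage):
        frame = self.pmap.translate(vpage)
        if frame is None:
            # Fault: reconstruct the mapping from machine-independent state.
            self.faults += 1
            frame = self.mappings[vpage]
            self.pmap.enter(vpage, frame)
        return frame
```

A discarded mapping costs one extra fault the next time the page is touched, but it never affects correctness, so the cache can stay much smaller than the address space it serves.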

Evaluation:
The evaluation of the paper although indicative is not comprehensive. Mach Virtual Memory (VM) has been compared with that of the traditional Unix system. This seems to be an unfair comparison for performance. If the traditional UNIX system had incorporated some of the machine independent improvements on Mach, it would have probably performed similarly.
However, demonstrating that operations take almost the same time on different platforms demonstrates the portability of the Mach VM. The comparison of the compilation times of different programs on different platforms seems to be vague and does not directly imply anything.

Confusions:
Concept of a memory object is unclear.
How resident page table is used and where it is used.
Address maps and shadow maps

Posted by: Prashanth Balasubramanian | February 11, 2016 02:48 AM

1. Summary
This paper focusses on the memory management subsystem of the Mach Operating System. The most important characteristic of this is the hardware independence of the VM logic. There is only one module to handle the machine-dependent details. This allows for reduced porting effort for new architectures as well as an objective evaluation of the various memory management features present in each architecture.
2. Problem
The lack of an O.S. with a hardware-agnostic virtual memory interface led vendors to write their own operating systems, which resulted in a lack of portability and standardization. Other operating systems that supported various hardware architectures would either restrict the facilities to the lowest common denominator or internally simulate the memory mapping architecture for which they were originally designed.
3. Contribution
The primary contribution of this paper was the separation between the features provided by the memory management subsystem and the underlying hardware. The paper clearly defines the API that needs to be implemented by the pmap (machine-dependent) module. On top of this the paper builds virtual memory by using address maps that refer to memory objects. The objects themselves follow object-oriented semantics and can be shared in both copy-on-write (COW) and read/write modes. The COW implementation includes shadow objects, which allow only the modified sections of an object to be copied when one of the sharing tasks writes to the shared region. These objects contain only the modified pages and refer to the original object for the rest. We can see similar constructs in modern Linux/UNIX, where a forked child process shares most of its memory with its parent and only maintains a copy of the data that it modifies. The paper succeeds in integrating memory management and communication by using ports, which can be used to send actionable messages between pagers. Pagers are user-level memory management modules that can independently handle page faults and page-out requests without involving the kernel. At each of these steps the paper lists the interface between two layers and the functions that should be implemented by both. This ensures a clear separation of concerns and allows applications to implement/call as much functionality as they require. However, many points are not discussed in sufficient depth, such as how ports are addressed system-wide. Mach may also silently fail if the underlying hardware does not support one of the richer features such as memory protection, where the paper states that the enforcement of access permissions depends on hardware support.
4. Evaluation
This paper evaluates Mach versus UNIX on basic memory-intensive tasks such as zero filling a page and forking a process. They do this with the intention of proving that the loose coupling does not have an adverse performance impact. The paper only compares the compile times of various programs as a real-world use case. The paper does not sufficiently convince the reader of its performance, as it does not include complicated use cases that would stress the new abstractions introduced and check for lock contention etc. in the new code. Alternatively, a piecemeal analysis of the performance of each layer, for example the time taken to send a message over a port vs UNIX IPC mechanisms, or the time taken to set up shadow objects vs the time taken to set up a new page table, would have instilled more confidence in the authors' claim that Mach does not perform worse than UNIX in most scenarios.
5. Confusion
Where are all the virtual to physical mappings stored as the paper mentioned that the pmap modules only hold the mappings that are currently in use?
A flow chart explaining all the abstractions layers as some such as resident page tables still confuse me.

Posted by: Abhinav Mehra | February 11, 2016 02:37 AM

Summary
The paper presents the design and implementation of the virtual memory management of Mach operating system that separates the software memory management from hardware support without sacrificing performance.
The problem
OS portability suffers from a proliferation of memory structures resulting in limited virtual memory management other than paging support. Addressing this issue, Mach introduces a memory management unit that can be readily ported to a multiprocessor computing engine as well as a traditional uniprocessor.
Contributions
1. Mach divides memory management into two modules: one machine-dependent and the other machine-independent.
2. Mach's memory management makes very few assumptions about the available memory management hardware. It is in fact the first machine-independent portable virtual memory design. All hardware dependencies are defined in a single abstraction: Mach captures the machine-dependent portion of the virtual memory in a single code module and its related header file.
3. Mach supports user-level paging handlers: page fault and page-out requests are handled outside the kernel. This is a novel concept introduced in this paper.
4. Mach integrates virtual memory management with a message-oriented communication facility. This integration allows large amounts of data, including files and even whole address spaces, to be sent in a single message with the efficiency of simple memory remapping. Operations on the various objects are performed by sending messages to ports. This provides an indirection which allows objects to be arbitrarily placed in the network. Mach also supports distributed paging and sharing across networks.
5. By isolating the architecture's role to support of a Mach-defined interface, the system can provide a relatively unbiased assessment of hardware design alternatives.

Evaluation
The paper says that porting Mach is a relatively easy task which can be accomplished by someone without prior exposure to C or operating systems. However, I feel the performance evaluation is somewhat insufficient. Only zero fill, fork, and file reads have been tested. No standard benchmark has been used for the analysis, nor is any explanation given as to why Mach performs better than 4.3BSD. Also, in my view the shadow chains could result in significant overheads, hence a performance analysis of COW and sharing would have been a good addition.
Confusion
What are the actual hardware dependences of a virtual memory management unit in general?
Why are operations of shared memory region logically address map operations?

Posted by: Amrita Roy Chowdhury | February 11, 2016 02:21 AM

Summary:
This paper describes the virtual memory management system in the Mach operating system. It explains the design of the system including the abstractions, the machine dependent and independent data structures and also their implementation details (APIs for paging and virtual memory operations). The main idea is that this virtual memory management system is more portable to traditional uniprocessors and multiprocessors with no loss in performance, due to the presence of a minimal machine dependent code module and hence requires minimal hardware support.

Problem:
Due to the emergence of a variety of processors with varied architectures, designing operating systems specific to each with variation in the virtual memory management system design would lead to over complexity, lots of code changes and hence higher cost.

Contributions:
The paper primarily describes the idea of separating the hardware-dependent and hardware-independent virtual memory management modules, while trying to keep the hardware-dependent module minimal to enhance portability across various hardware architectures. In a way it also helps fully utilize the features of the hardware while maintaining a highly optimized and portable hardware-independent VM management system.

Another concept that’s different from the existing OSs is the extensive use of message passing for communication, which is certainly performant when addresses are passed instead of large amounts of data. This has been explored before as well, and in this paper message passing has been well integrated into the VM management.

For implementing the memory management system, abstractions such as task, message, threads have been defined and data structures such as the address map, resident page table, memory object and pmap have been described.

Facts: The backing store is the memory object, which has a pager and port for access. The port and pager abstraction sort of acts as a point of entry and appears similar to the file system APIs. The paper also describes how page faults are handled by the kernel pager, copy on write creates shadow objects for modified pages and at the same time continues to reuse mappings for shared pages and also the removal of unnecessary shadow objects during garbage collection.

So the paper in effect tries to enhance sharing of memory objects, but at the same time tries to reduce the memory usage of the mapping data structures (shared maps).

Evaluation:
The authors mention, as evidence of portability, that porting Mach to VAX took an inexperienced programmer just 3 weeks. To test the portability of the system, the changes in the hardware-dependent module in four of the existing hardware architectures such as IBM RT PC, SUN 3, VAX and MMU are examined, which I think was the best way to prove the relevance of such an idea, instead of assuming custom hardware configurations. Performance is also evaluated against Unix and 4.3 BSD for various system calls, and the performance is comparable. But the paper does not evaluate the performance of message passing, of copy on write under excessive forking, or of RT PC/VAX/SUN 3 after making the changes to the hardware-dependent module.

Confusion/Issues:
What’s a shared map and how is it used and what if there are too many writes to shared pages? What’s the advantage of having the external pager as a user task? Since there are ports, are there capabilities also used in this system?

Posted by: Siddharth Suresh | February 11, 2016 02:21 AM

1. Summary
The authors of Mach aimed to implement a virtual memory management system that has minimal information on the underlying hardware and separates out memory management from hardware support. The system uses machine-independent data structures with minimal machine-dependent code. This design does not tax the performance and also manages to leverage the basic hardware functionality.

2. Problem
The Mach portable, multiprocessor operating system aims to solve the traditional problem of memory management in an OS being closely tied to the hardware architecture. So for instance, UNIX would restrict the facilities when ported to a new instruction set architecture and fall back on simple paging support.

3. Contributions
The main objective of developing Mach was to provide a base for other operating systems to be built. It supported large sparse address spaces and also allowed transparent access to network resources using ports. Mach has a powerful, elaborate and highly flexible memory management system based on paging. The code of Mach’s memory management is split into three parts. The first part is the pmap module, which runs in the kernel and is concerned with managing the MMU. The second part, the machine-independent kernel code, is concerned with processing page faults, managing address maps, and replacing pages. The third part of the memory management code runs as a user process called a memory manager. It handles the logical part of the memory management system, primarily management of the backing store (disk). The conceptual model of memory that Mach user processes see is a large, linear virtual address space. The address space is supported by paging.
A key concept relating to the use of virtual address space is the memory object. Each memory object that is mapped in a task's address space must have an external memory manager that controls it. The goal is to have a single, linear, virtual address space that is shared among processes running on computers that do not have any physical shared memory. When a thread references a page that it does not have, it causes a page fault. Eventually, the page is located and shipped to the faulting machine, where it is installed so that the thread can continue executing. Byte-offsets were a good optimization for handling page faults faster. I really like the domain-specific knowledge used from the application layer hints to improve performance of paging.
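
The division of labour between the kernel and a user-level memory manager can be sketched as a small message exchange. Everything here is illustrative: Port, FilePager, kernel_page_in and kernel_page_out are hypothetical names, the message tags are only loosely modelled on the kind of request and reply calls the paper lists, and the pager is run inline rather than as a separate task.

```python
from collections import deque


class Port:
    """Hypothetical message queue standing in for a Mach port."""

    def __init__(self):
        self.queue = deque()

    def send(self, msg):
        self.queue.append(msg)

    def receive(self):
        return self.queue.popleft()


class FilePager:
    """User-level memory manager that owns the backing store of one object."""

    def __init__(self, backing, page_size):
        self.backing = bytearray(backing)
        self.page_size = page_size
        self.request_port = Port()

    def serve_one(self):
        op, offset, payload, reply_port = self.request_port.receive()
        if op == "data_request":
            # Page-in: read the page from backing store and send it back.
            data = bytes(self.backing[offset:offset + self.page_size])
            reply_port.send(("data_provided", offset, data))
        elif op == "data_write":
            # Page-out: store the dirty page the kernel handed over.
            self.backing[offset:offset + len(payload)] = payload


def kernel_page_in(pager, offset):
    reply = Port()
    pager.request_port.send(("data_request", offset, None, reply))
    pager.serve_one()  # in a real system the pager runs as its own task
    _tag, _offset, data = reply.receive()
    return data


def kernel_page_out(pager, offset, data):
    pager.request_port.send(("data_write", offset, data, None))
    pager.serve_one()
```

Because the kernel only ever sends and receives messages, the pager could in principle sit on another machine, which is what makes the distributed shared address space described above plausible.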

4. Evaluations
Mach provides a platform to evaluate various hardware MMUs for multiprocessors. The shadow object concept has been introduced here, but it is not backed by figures on how much memory was saved by adopting this design. For a concept as complex as this one, basic benchmarking programs such as read, write, fork and zero fill do not justify the claims of no sacrifice in performance. It would have been good if the kernel's scheduler had been stressed with a heavy workload resulting in user-driven page faults and page-outs. That would give an idea of how this is a better design than traditional kernel-driven page-outs. Also, a disadvantage of copy-on-write is that it requires more kernel traps, one for each page that is written, and it doesn't work over a network. Another missing entity is the evaluation of COW vs no COW to justify the performance gain.

5. Questions
A diagrammatic representation of memory object, message, tasks and address maps would've definitely helped. I want to understand why each data structure was non-overlapping esp sharing map.

Posted by: Sejal Chauhan | February 11, 2016 02:10 AM

1.Summary:
This paper focuses on design and implementation of virtual memory management in the Mach Operating System which makes it portable on different hardware architectures where the machine dependent code is a single code module. Message passing communication is used for memory management, providing efficient memory remapping and sharing.

2.Problem:
Lack of operating system that was agnostic about the underlying hardware's memory structures, which led to hardware vendors writing their own operating systems, thus affecting portability.

3.Contributions:
1) Separation of the management of Mach virtual memory, as machine-independent code, from virtual-to-physical address mapping, as machine-dependent code, without impacting performance.
2) The authors have introduced various data structures such as address map, memory object, port and messages, and use message passing for memory sharing/remapping, which can be done by sending even an entire address space as a message. This can also be extended to a distributed system setup.
3) Handling of page faults within user space, and of memory mapped files, with the help of 'pager' structures.
4) Memory sharing techniques: shadow memory objects for effective copy-on-write, and a level of indirection through the sharing map for read/write sharing.
5) pmap module for valid physical address mappings.

4.Evaluations:
The authors have evaluated how good Mach's machine independent memory management performs on different hardware structures for VM operations such as forking, reading files, compiling in comparison to traditional UNIX. They have not evaluated the complexity of message based communication for managing memory and performance impact of having machine independent and dependent data structures in order to support their claims.

5.Confusion:
Difficulty in visualizing how the various memory structures are used for address translations?
Is sharing map structure a necessity for read/write sharing?
Working of external pagers.
What are Mach physical pages?

Posted by: Sharanya Devaraj | February 11, 2016 02:09 AM

1. Summary

The paper summarises the design of a memory management system that is readily portable to distinct uniprocessor and multiprocessor computing engines.

2. Problem

Operating systems consisted of a large number of distinct memory structures that were dependent on the underlying hardware. Due to this, virtual memory management in the operating system was highly limited. To address this problem, the Mach OS was designed with a memory management system that would be portable across different architectures.

3. Contribution

The main contribution in this paper is the virtual memory sharing between tasks and defining all the hardware dependencies in a single abstraction. All the information important to Mach virtual memory management is maintained in machine-independent data structures (resident page table, address map, memory object), and the machine-dependent data structure (pmap) contains only those mappings necessary to run the programs. The resident page table, used by the kernel to maintain the state of physically resident pages, provides a fast lookup of a physical page on a page fault. The protection values for virtual memory pages are used to control access to memory shared between tasks. Its copy-on-write mechanism creates a shadow object containing only the modified pages, and the kernel garbage-collects intermediate shadow objects to prevent long chains of them. The pmap is the machine-dependent memory mapping data structure, and it implements only those operations necessary to manage hardware-required mapping data structures. The user-level pagers are very beneficial and customizable, and they allow for distributed VM management. These external pagers allow the VM to be tuned to the application's needs.

4. Evaluation

The authors evaluate the performance of the system by measuring fine-grained operations such as the fork and read system calls, along with some high-volume testing. The system achieved the same or better performance compared to UNIX. Memory management features like shadow chains and user-level page handlers are highly complex; however, the evaluation does not include load-bearing tests that evaluate Mach's paging thresholds. Mach has only been ported to multiprocessors with uniform shared memory.

5. Confusion

1. Can the cost involved in porting a machine-dependent OS to a different architecture be explained?

Posted by: Nivetha Singara Vadivelu | February 11, 2016 02:04 AM

Summary
The paper discusses Mach, an operating system that makes few assumptions about the underlying hardware and then explains how this design approach allows it to get portability benefits without sacrificing performance. The foundation of the Mach memory management model lies on using a message passing mechanism for communication between tasks, which is made efficient by using techniques such as memory remapping, copy-on-write sharing and read-write sharing whenever possible.

Problem
Existing Operating System designs suffered from a lack of portability across different architectures as they were tightly coupled with their intended target architecture. With the idea of OS portability gaining ground, it was necessary to develop an Operating System memory representation without patterning it to any specific architecture. Earlier attempts at effectively managing virtual memory (which influenced Mach), such as Accent, Multics, Apollo’s Aegis, IBM’s System/38, etc. were tied to specific hardware, thus affecting their portability.

Contribution
The authors developed the Mach Operating System, which formed the basis on which certain popular Operating Systems such as OS X were built. Mach overcomes the portability issue mentioned above by separating the machine dependent and independent parts of the OS code. The hardware dependent part (the pmap module) supports and interacts with the hardware independent Mach portion through a known API. Mach relies on message passing through ports to achieve inter-task communication. It also allows sharing of memory objects (a unit of backing storage maintained by the kernel or a user task) through copy-on-write and read-write sharing techniques implemented through data structures such as the address map, shadow objects and sharing maps. Supported Mach virtual memory operations include allocation, deallocation, setting protection and inheritance statuses, and memory mapping/remapping. Mach also allows a user level 'pager' task to manage paging operations for a memory object; however, the motivation for allowing such a design was not discussed.

Evaluation
The authors attempt to evaluate Mach by first running it on multiple architectures such as the VAX, IBM RT PC, SUN 3, etc., thus showing that Mach can run on different architectures with OS changes limited to the machine dependent pmap module. They then compare the performance of Mach on these architectures against the traditional UNIX system for certain programs and conclude that Mach produced equal or better results in most cases. Possible techniques/workarounds to make Mach run on multi-processor architectures with varying memory organizations such as UMA, NUMA and distributed memory have also been discussed.

I believe that this evaluation was lacking in many aspects. Specifically, the authors did not compare Mach performance against standard application benchmarks. Also, while running Mach on the different architectures, they started comparing the architectures themselves, which was tangential to the scope of this paper. The authors have also not studied the performance impact / overhead of message passing, which they have claimed to be better performing (than earlier message-passing based attempts); no breakdown of execution times was provided to shed light on the individual time taken by the machine independent module, the machine dependent pmap module and the message passing overhead. The main objective of the paper, which was to make Mach a portable OS, also has not been objectively evaluated, say in terms of evaluation parameters such as developer effort and complexity required to port to a new architecture.

Questions / confusion
Address translation in the Mach independent module could have been better explained, possibly with a diagram. Particularly, how do the address map, memory object / offset hash-bucket, and the pmap fit together in the global view of address translation ?

Posted by: Shantanu Bhate | February 11, 2016 02:01 AM

Summary
The paper presents the virtual memory management techniques incorporated in Mach OS, which decouples software memory management from hardware to ease portability. This also improves the execution time and provides better or comparable performance to Unix.

Problems
Operating systems like Unix adapt poorly to changes in hardware memory management and hence underutilize the hardware. Also, the large kernel code changes required make them difficult to port to newer hardware. Thus subsequent versions of the OS make only incremental changes and are inefficient.

Contributions
The paper talks about virtual memory management design in Mach OS. This design decouples the VM management into machine dependent and independent structures. Machine dependent code represents the hardware memory management techniques (e.g. paging/segment support for physical pages, TLB maintenance) and needs to be changed for different hardware. Machine independent code is responsible for all VM usage and the design ensures minimal interaction between machine dependent and independent data structures. The decoupling enables lots of features:

Page faults can be serviced outside the kernel. The Mach design provides a managing task for each memory object, called a pager, which can resolve a page fault by, for example, reading a page from disk in a file system. It also enables communication between the kernel and tasks running on remote nodes using simple message passing techniques.
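
A rough way to picture the pager exchange is the Python sketch below. It is only an illustration under assumptions: `Port`, `FilePager`, `handle_fault` and the message tags `"data_request"` / `"data_provided"` are made-up names, not the real Mach pager interface, and the pager is called inline here although in Mach it would run as a separate task.

```python
from collections import deque


class Port:
    """Toy stand-in for a port: a FIFO queue of messages."""
    def __init__(self):
        self.queue = deque()

    def send(self, msg):
        self.queue.append(msg)

    def receive(self):
        return self.queue.popleft()


class FilePager:
    """Hypothetical user-level pager backed by an in-memory 'file'."""
    def __init__(self, data, page_size):
        self.data = data
        self.page_size = page_size
        self.request_port = Port()

    def serve_one(self, reply_port):
        # Take one request off the pager's port and answer it.
        kind, offset = self.request_port.receive()
        if kind == "data_request":
            page = self.data[offset:offset + self.page_size]
            reply_port.send(("data_provided", offset, page))


def handle_fault(pager, kernel_port, offset):
    """Kernel side: ask the pager for the missing page and wait for the reply."""
    pager.request_port.send(("data_request", offset))
    pager.serve_one(kernel_port)  # stands in for the pager task being scheduled
    _, _, page = kernel_port.receive()
    return page


pager = FilePager(b"abcdefgh", page_size=4)
kernel_port = Port()
second_page = handle_fault(pager, kernel_port, 4)
```

The kernel still takes the hardware fault; what moves to user level is the decision of where the page contents come from.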

Machine independent data structures like the resident page table help in consolidating VM information, which enables faster page allocation and garbage collection. The object/offset hash bucket enables Mach to configure different page sizes independent of hardware support.
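
As a rough sketch of the object/offset lookup, the Python below keys resident pages by (object, page-aligned byte offset). The names `ResidentPageTable`, `HW_PAGE` and `MACH_PAGE` and the sizes chosen are assumptions for illustration; the only property taken from the paper is that the Mach page size is a boot-time parameter that is a power-of-two multiple of the hardware page size.

```python
HW_PAGE = 512             # assumed hardware page size in bytes
MACH_PAGE = 4 * HW_PAGE   # assumed boot-time choice: a power-of-two multiple


class ResidentPageTable:
    """Toy resident page table hashed on (memory object, byte offset)."""
    def __init__(self, page_size):
        self.page_size = page_size
        self.table = {}

    def _key(self, obj_id, byte_offset):
        # Round the byte offset down to the start of its Mach page.
        return (obj_id, byte_offset - byte_offset % self.page_size)

    def insert(self, obj_id, byte_offset, frame):
        self.table[self._key(obj_id, byte_offset)] = frame

    def lookup(self, obj_id, byte_offset):
        # Returns the frame number, or None if the page is not resident.
        return self.table.get(self._key(obj_id, byte_offset))


rpt = ResidentPageTable(MACH_PAGE)
rpt.insert("file-A", 2048, frame=7)
```

Because the key is a byte offset into the object rather than a hardware page number, changing `MACH_PAGE` does not change how the table is used.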

Even though copy-on-write is not a new technique contributed by this paper, its implementation differs significantly from other kernels. Here the shadow object tracks only those pages which are modified, and it is possible to create a shadow object from another shadow object. Its garbage collection removes any intermediate duplicates. TLB flushing and writeback can be done lazily. Mach provides various inheritance levels to ease the specification of copy-on-write or read/write sharing between two tasks.
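
The shadow object idea can be sketched in a few lines of Python. This is a simplified model, not the kernel's code: `MemObject`, `shadows`, `read` and `write` are illustrative names, and garbage collection of intermediate shadows is left out.

```python
class MemObject:
    """Toy memory object; a shadow holds only the pages modified through it."""
    def __init__(self, pages=None, shadows=None):
        self.pages = dict(pages or {})
        self.shadows = shadows  # the object this one shadows, or None

    def read(self, page_no):
        # Walk down the shadow chain until some object holds the page.
        obj = self
        while obj is not None:
            if page_no in obj.pages:
                return obj.pages[page_no]
            obj = obj.shadows
        return None  # a real system would zero-fill or ask the pager

    def write(self, page_no, data):
        # The private copy lives only in this shadow; the original is untouched.
        self.pages[page_no] = data


original = MemObject({0: "A", 1: "B"})
parent_view = MemObject(shadows=original)
child_view = MemObject(shadows=original)
child_view.write(1, "B'")
```

After the write, `child_view` sees its own page 1 while `parent_view` and `original` still see the unmodified one, and page 0 is never copied at all.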

Machine specific hardware mapping is restricted to just one file (pmap.c). This helps in porting the OS easily from one hardware platform to another. Having a separate machine dependent structure helps in lazy evaluation of virtual-to-physical translations (for example, if a page is swapped out, the corresponding hardware update can be scheduled to happen at a later time).

Evaluation
The authors have evaluated most of their claims. To show that Mach OS is on par with UNIX, the paper presents timing results from running fork, reading a file and compiling Mach along with 13 other C programs. To show that portability is easier with Mach, they present the details of porting it to VAX 11/784, 11/780, MicroVAX and IBM RT PC. The paper also presents various issues encountered while porting it to uniprocessor and multiprocessor systems. Some of the missing evaluations include how Mach OS performs while communicating with remote nodes (it says large memory spaces can be passed in a single message), the runtime memory footprint of Mach OS compared to UNIX, and the efficiency of running multithreaded programs for a long period of time (as they will have lots of sparse usage of memory pages and overhead due to the complex garbage collection technique). The workloads selected for evaluating the design seem insufficient.

Confusions
Are any of these techniques being used in present day OSes and how successful was this design?
What is the use in supporting multiple page sizes? What kind of applications would use it?
How much does hardware memory management features change with each generation?

Posted by: Bhardwaj Krishnamurthy | February 11, 2016 01:55 AM

1. Summary
The paper presents the design of machine-independent virtual memory (VM) management in an OS and discusses its implementation in CMU Mach. The main motivation behind the work was to enhance portability of operating systems across different hardware platforms. The central theme is to have generic machine independent VM data structures that do the bulk of memory management and a minimal amount of machine dependent code, so that porting the OS to new hardware requires changes only in the machine dependent part, which is relatively easy to modify.
2. Problem
The VM memory management in operating systems that existed till date was very strongly tied to the underlying hardware, which made the task of porting OS to different architectures a formidable one. Some of the operating systems tackled this by internally emulating an old already supported architecture, which at times could be less efficient. Further, operating systems at the time were not scalable to multiprocessors or distributed systems. These were the main issues that the authors wanted to solve with this work. Additionally, they claimed that the design helps study the pros and cons of various hardware VM management schemes.
3. Contributions
The main contribution of this paper was to design new abstractions that helped in making the memory management machine independent. They implemented new data structures which kept track of the virtual memory by having an intermediate level of machine independent pages. This moved most of the functionality to a machine independent part of OS, leaving little to be done by the hardware specific part of OS. The standardisation of interface between the two parts made it easy to port the OS to new architectures, as claimed by presenting the timeline for port to IBM RT PC. Other than portability, I believe that the work supported sparse virtual memory very efficiently by using linked list based data structures and by introducing the idea of memory objects, reducing the space requirements which in turn increased a process’ addressability. The design had user-mode pagers to manage the memory objects. Also the idea of having multiple page sizes could be beneficial to certain applications.
4. Evaluation
The paper does a very poor job of substantiating their claims using very limited experimental results. Firstly, using a rough timeline to highlight ease of porting Mach OS, they failed to quantify the effort well, for example by using metrics like bugs found after spending a specific amount of time on porting. No comparison was made against the effort required to port traditional operating systems. Secondly, to highlight performance impact of the implementation, they have used very naive micro-benchmarks and a single workload. They have not used any mainstream uniprocessor or multiprocessor workloads for the evaluation. They also did not make any effort to explain the observed performance improvement in the results presented. Thirdly, the paper claimed that machine independent nature of Mach OS helped study the pros and cons of other memory management schemes. I feel that is not the case and the discussion presented is more about the compatibility of various hardware platforms with implementation of Mach OS rather than their evaluation in general. A different design of machine independent VM management might present different results. Lastly, the authors initially claimed that Mach OS aims to support multiprocessor and distributed systems. However, other than small mechanisms that seemingly support such systems, the claims have not at all been supported by any concrete analysis. Overall, there are a lot of questions left unanswered.
5. Confusion
1. Why are message-based systems at a performance disadvantage? Why does Mach not have the same performance degradation? 2. What exactly is the role of each of the 4 VM data structures and how do they interact with each other? 3. What exactly was the issue with chained shadow objects and how did sharing maps solve it?

Posted by: Lokesh Jindal | February 11, 2016 01:45 AM

1. Summary
This paper talks about the design taken by Mach in its virtual memory management to minimize the efforts needed to support different hardware. They made the majority of mechanism machine-independent and the machine dependent part has no knowledge of it.

2. Problem
Different hardware architectures have different memory management schemes, which makes it hard to port an operating system. UNIX achieves this by basing code for the new architecture on existing ones but it restricts the functionalities that virtual memory management can provide.

3. Contributions
Data structures in virtual memory management are mostly machine independent. Mach makes only the minimal assumption that the machine provides basic support for page fault handling. All machine dependent structures, like the linear page table in VAX, the inverted page table in IBM RT PC and segments in SUN 3, are handled by machine dependent code.
Machine dependent structures are not required to maintain the whole mapping from virtual addresses to physical addresses. The possibility that modifications can be applied lazily may be utilized to improve system performance. This idea is quite different from other systems.
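
A small Python sketch of this lazy behaviour, under stated assumptions: `Pmap.enter` and `Pmap.remove_all` are loosely modeled on the kind of operations a pmap module exposes, while `Vm`, `mappings` and `faults` are invented for illustration. The machine dependent side is treated as a cache that may be thrown away and rebuilt on faults from the machine independent state.

```python
class Pmap:
    """Machine dependent cache of translations; allowed to be incomplete."""
    def __init__(self):
        self.hw = {}

    def enter(self, vpage, frame):
        self.hw[vpage] = frame

    def remove_all(self):
        self.hw.clear()


class Vm:
    def __init__(self, mappings):
        self.mappings = dict(mappings)  # authoritative machine independent state
        self.pmap = Pmap()
        self.faults = 0

    def access(self, vpage):
        if vpage not in self.pmap.hw:
            # Fault: rebuild the missing translation from the independent state.
            self.faults += 1
            self.pmap.enter(vpage, self.mappings[vpage])
        return self.pmap.hw[vpage]


vm = Vm({1: 10, 2: 20})
first = vm.access(1)    # faults once, then the translation is cached
again = vm.access(1)    # no new fault
vm.pmap.remove_all()    # the pmap may discard everything at any time
rebuilt = vm.access(1)  # faults again and is rebuilt, with the same result
```

Correctness never depends on what the pmap currently holds, which is why its updates can be deferred or batched.
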
The abstraction of memory objects unifies the operation of swapping a page to disk and mapping a file on disk to memory pages. Both of them can be done by using a file system as the pager, which eliminates the need for dedicated swapfs.
They compared the virtual memory management schemes taken by different hardware architectures. But that part is actually talking about difficulties they encountered during the porting process.
Mach integrates memory management with message-based communication. This may benefit the porting to a distributed environment. But this approach is not well examined yet.

4. Evaluation
The authors failed to show the portability improvement in a systematic way. They only claimed that the porting from VAX to other hardware was done by programmers without expertise in OS. There is no comparison to UNIX and no testing of the quality of the ported system.
The performance evaluation is also poor. Only simple operations and compiling are used to compare the overall time spent between Mach and UNIX. They need to include various workloads ranging from CPU-intensive to memory-intensive. Individual mechanisms like message passing, garbage collecting, object tree managing, pmap lazy updating and TLB consistency maintaining also need to be evaluated in detail.

5. Confusion
I am confused about why the sharing map is needed when there is read/write sharing.

Posted by: Xiangjin Wu | February 11, 2016 01:39 AM

1. Summary
This paper delineates the authors’ observations in porting the microkernel based OS Mach across different hardware architectures by discussing design and implementation details of its virtual memory management, which has little machine dependency and low performance overhead. The four basic memory management data structures in Mach are the resident page table, which keeps track of machine independent pages; the memory object, which serves as backing storage; the address map, which describes the mapping of memory objects; and the pmap, which stores hardware dependent mappings.
2. Problem
With increasing variation in ISAs, OS portability requires an abundance of data structures. Also, restricting the features provided by modern hardware and basing implementations on previous versions in order to port easily fails to exploit the hardware to its full potential. This could also degrade performance. Handling this properly, given the hardware assumptions that current OSes make, requires significant changes in the kernel.
3. Contributions
Mach attempted to solve this problem by making relatively fewer assumptions about the underlying hardware, thus building an isolated VMM. In addition to simple paging support, Mach provides large address spaces, copy-on-write and read-write memory sharing, memory mapped files and user provided backing storage. This notion of a non-monolithic kernel and the integration of the VMM with a message-oriented communication facility make interaction between objects in loosely coupled systems in a distributed environment more convenient. Unlike Opal, Mach is not based on a purely shared-memory model, which makes it possible to add alternatives to UNIX features like fork. Because memory objects are indexed by byte offsets, the Mach page size can be set as a boot-time parameter, which removes the need for a one-to-one mapping with the hardware page size. With address maps implemented as hint-assisted doubly linked lists, large holes in address spaces are handled effectively. The VMM design employed features of the object oriented paradigm like garbage collection, encapsulation and reference counting, thus making it easier to implement kernel functionality in a high level programming language. Providing an external pager to kernel interface lets user applications manage memory according to their needs. Distinct needs of a processor, like page table format, TLB and MMU maintenance, could be confined to a single module. This paper could possibly provide motivation for running multiple VMs on a single system.
4. Evaluation
The authors didn’t pay much attention to experimentation with their prototype, though at the end they state that Mach’s memory management had little or no effect on performance. Porting Mach across systems seems to be easier if one is familiar with the Mach kernel codebase, though it could have been a challenge to keep kernel address mappings complete and accurate in the pmap. No change is required in other parts of the kernel code like device drivers, since the micro kernel based design obviates the need to explicitly manage the devices.
5. Confusion
What is the concept of inode pager used by Mach kernel for providing paging services?

Posted by: Unmesh Phalak | February 11, 2016 01:33 AM

1. Summary
This paper discusses the design of the virtual memory management system in the Mach OS with the primary goal of being portable across hardware memory architectures. It relies on a small machine dependent unit and optimized machine independent data structures for memory management.

2. Problem
A major issue around the time of this paper was that OS memory management systems were closely tied to the virtual memory architecture of the hardware. While a hardware abstraction was available for CPU instruction sets, there was no sufficient one for virtual memory. Mach’s principal goal was to be easily portable to new machine memory architectures without sacrificing performance.

3. Contributions
The following include some of the key ideas introduced in the paper :-
1. Mach introduces inheritance attributes for a task’s virtual address space. Regions of it can be marked with shared inheritance, thus allowing read/write sharing between a process and a child. (idea adapted as mmap() in linux). By default the address space is marked with copy inheritance, which exploits copy on write optimizations.
2. Message passing (with ports) is the main IPC mechanism in Mach. A large message transferred between tasks is merely mapped as copy-on-write in both address spaces making the message passing efficient.
3. The memory management implementation mainly includes hardware neutral data structures that keep track of physical pages as well as virtual page mappings for tasks (address maps). This interfaces to a small machine specific pmap unit. Thus the pmap component does not need to be aware of all valid mappings, as it can be constructed on the fly using machine independent data structures. (lazy evaluation)
4. Address maps are doubly linked lists that map ranges of virtual addresses of a task to different offsets inside memory objects. This structure allows for simple implementations of the copy on write mechanism. On a copy-on-write fault the address map can point to a shadow object which covers the modified pages while pointing to the original memory object for the rest of the pages.
5. The backing store for virtual memory pages is represented by the abstraction of a memory object. Mach introduces the idea that memory objects can point to a user space paging server, which communicates with the OS to page in/out data. This is one of the features which makes Mach stand out as a micro-kernel.
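
The inheritance attributes from item 1 can be illustrated with a toy fork in Python. This is a sketch under assumptions: `fork_address_space` and the dict layout are hypothetical, the three values "share", "copy" and "none" follow the inheritance choices Mach offers, and an eager deep copy stands in for real copy-on-write.

```python
import copy


def fork_address_space(regions):
    """Build a child's regions from the parent's, honoring each inheritance value."""
    child = {}
    for name, (inherit, data) in regions.items():
        if inherit == "share":
            # Parent and child refer to the very same data (read/write sharing).
            child[name] = (inherit, data)
        elif inherit == "copy":
            # Eager copy used here in place of copy-on-write, for simplicity.
            child[name] = (inherit, copy.deepcopy(data))
        # "none": the region is simply absent from the child.
    return child


parent = {
    "shm": ("share", [0]),
    "heap": ("copy", [1]),
    "scratch": ("none", [2]),
}
child = fork_address_space(parent)
child["shm"][1].append(9)   # visible to the parent
child["heap"][1].append(9)  # private to the child
```

With copy-on-write the "copy" case would share pages until the first write, but the visible result is the same as the eager copy shown here.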

4. Evaluation
The authors do evaluate the implementation effort required for porting Mach to a new machine architecture, in this case the IBM RT PC. They show the interface for the machine specific portion (pmap) is simple and does not require in depth knowledge of the entire memory management system. However they do not fully substantiate their claim that it does not hurt performance or that it aids Mach’s message passing mechanism. Specifically, they run smaller benchmarks for zero fill and fork but not complex cases like shared memory or multiple page faults. Moreover, they discuss compilation speed of 13 programs but not the actual performance while running these common programs.

5. Confusion
I didn't understand how shadowed objects are used for copy on write faults?
How are page faults handled in this system?

Posted by: Brian Coutinho | February 11, 2016 01:22 AM

1. Summary
This paper discusses the Virtual Memory Management (VMM) system implementation in the Mach OS. The VMM is split into multiple machine independent modules and a single dependent data structure, without sizeable performance drops. This increases OS portability, as a single code module needs to be changed and minimal hardware assumptions are made. Further, UNIX compatibility is maintained, while extending and expanding the IPC and VMM facilities provided.

2. Problem
OS Portability. In short, VMM facilities were limited, and portability was low. Hardware memory architecture and memory management software were tightly coupled. Software developers found it hard to deal with the wide variety of memory structures in hardware. UNIX systems traditionally handled this by limiting facilities provided.
On non-VAX hardware, UNIX internally simulated VAX architecture.

3. Contributions
Primary Contribution :
A partitioning scheme of Virtual Memory management into hardware independent data structures and a single hardware dependent data structure without sacrificing performance. Efficiency is achieved by utilizing the observation that VMM can be integrated with a message-based communication system.
VMM Facilities :
• Copy on write (shadow copying). Methods prevent long chains of shadow objects.
• Read / Write Memory sharing between tasks. Indirection to reduce address map changes.
• Memory mapped files
• User-provided backing store objects and pagers (handles page faults and page-out requests)
VMM Data Structures:
• Resident page table - machine independent - tracks page information.
• Memory Object - machine independent - a repository for data, indexed by byte.
• Address Map - machine independent - Doubly Linked List of mappings of range of virtual addresses to region in memory object. Also has protection and inheritance information.
• Pmap - machine dependent - single module.
Mach OS Abstractions :
• Tasks, Threads.
• Memory Object - collection of data that can be mapped into a task's address space. Managed by a server.
• Message - typed collection of data objects used in inter-thread communication.
• Port - kernel protected queue for messages.
Tangential Contribution :
A relatively unbiased evaluation of various hardware memory-management schemes, in uniprocessor and multiprocessor systems.

4. Evaluation
The authors describe the ease of porting the Mach OS to a number of different hardware systems, but do not explicitly compare the time taken and how it would compare to a conventional OS port. In terms of benchmarks, a list of VM operations are compared on UNIX 4.3bsd, as are a bunch of compilation tasks. Results were comparable or Mach was better. However, I am not sure if these provide an accurate estimate of real-life system usage. The logical separation of memory management data structures based on machine-dependency is definitely a step forward in maintainability and portability.

5. Confusion
More details on why the memory object structure is unsuitable for read/write sharing, and is this alleviated by address map indirection ?
A walkthrough/example of the VMM system would be great.
Terminology used in the paper is confusing at times, but perhaps this is just due to the age of the paper.

Posted by: Adithya Bhat | February 11, 2016 01:15 AM

1. Summary
This paper talks about the design of CMU's Mach OS, focusing on its virtual memory management. One of its main features is the separation of hardware-dependent and hardware-independent portions of the system, which helps in improving the portability of the system. The paper then talks about some details of its abstractions, design, pros and cons.

2. Problem
The main problem mentioned in the paper is the problem of porting OSes to different hardware/ memory structures. It usually involves a lot of code changes, man-hours and need of professional experienced programmers (expertise)

3. Contributions
The main contribution (and the feature that I liked the most) is the separation of the hardware-dependent and hardware independent portions of the system. There are 4 memory management data structures (resident page table, address map, memory object and pmap) and the only one that is hardware dependent is a memory mapping data structure called pmap (pmap.c). This is really important as it makes the portability of the system significantly easier as now only this file needs to be modified.
It also uses a managing task called a "pager", rather than the kernel, to handle page faults, i.e. the basic management of memory objects. They also propose (for the purpose of data sharing) using shadow objects to implement the copy-on-write mechanism and a sharing map to point to shared memory objects. I think this should reduce the memory load in the system and can play an important role in improving the scalability of the system.

4. Evaluation
They claim that porting the Mach VM is relatively simple, and they support it by stating that it took an inexperienced system programmer only 3 weeks to port it on VAX 11/784. They also assess various memory management architectures and talk about some of the cons/issues related to uni-processors (memory constraints, i.e. the need for large page tables) and multiprocessors (mainly TLB consistency). Finally they evaluate the performance of Mach and Unix for various system calls/tasks (as well as compiler performance) and show that the overhead of using such a system is not significant; in many cases Mach also outperforms Unix. One of the key advantages claimed for Mach was the integration of virtual memory management with message-oriented communication. There isn't any evaluation in the paper which justifies this statement. Along with this, I would have liked to see more evaluation presenting the overhead/performance (run-time, memory) of some specific features like CoW, sharing maps, etc.

5. Confusion
Who is responsible for the protection of all the activities related to message passing, did they use something similar to capabilities? Is Mach a physical page or a virtual page? What differences are there between mach in an internal kernel and mach in external user-state task?

Posted by: Anubhavnidhi "Archie" Abhashkumar | February 11, 2016 01:10 AM

Summary

This paper talks about the design and implementation of a virtual memory management system for CMU’s Mach OS that is decoupled from underlying hardware memory management schemes. Hence, it provides portability and a rich set of features across various hardware architectures without sacrificing much performance.

Problem

This paper came out at a time when the importance of portable operating systems like UNIX was now being recognized over vendor specific operating systems. However, there were still no well-defined means to port memory management mechanisms and provide a consistent rich set of features across varying hardware architectures. It was against this backdrop that the authors proposed to create a virtual memory management system that would be portable across varying hardware architectures and still offer a rich set of features at a comparable performance.

Contributions

The main idea developed in the paper is clear separation of machine dependent and independent portions of virtual memory subsystems. The independent subsystem only expects the machine dependent portion to handle and recover from page faults. Hence, the entire machine dependent code can be a single code module that only needs to be ported to different architectures, while the rest of the things can remain same.

Four basic data management structures are introduced: resident page tables, address maps, memory objects and pmaps. Resident page tables are used for machine independent mappings of physical pages. Each task’s virtual address space is mapped to memory objects using an address map. Mach uses byte offsets in memory objects to map contiguous ranges of virtual addresses. Implementation of the address map as a sorted doubly linked list is another nice idea that ensures performance even for large sparse address spaces. To facilitate an efficient copy-on-write mechanism, the concept of shadow objects is introduced; these hold the modified parts of the objects they shadow. However, shadow objects can themselves be shadowed, so chains of shadow objects can build up, and the authors admit that this leads to complex garbage collection and locking rules. Finally, pmaps are the only machine dependent portion, and they store virtual to physical page mappings. Most of the effort required to port Mach to a different architecture goes into changes to the pmap module.
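
To make the address map lookup a little more concrete, here is a minimal Python sketch. It assumes a plain sorted Python list in place of the doubly linked list, and the names `Entry`, `AddressMap` and `hint` are illustrative; the hint simply remembers the entry that satisfied the previous lookup so that nearby faults are resolved without a scan.

```python
class Entry:
    """One contiguous range of virtual addresses mapped onto a memory object."""
    def __init__(self, start, end, obj, offset):
        self.start, self.end = start, end  # range is [start, end)
        self.obj, self.offset = obj, offset


class AddressMap:
    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e.start)
        self.hint = 0  # index of the entry that satisfied the last lookup

    def _hit(self, index, vaddr):
        e = self.entries[index]
        return e.start <= vaddr < e.end

    def lookup(self, vaddr):
        if not self.entries:
            return None
        # Try the hinted entry first, then fall back to a scan in address order.
        candidates = [self.hint] + list(range(len(self.entries)))
        for i in candidates:
            if self._hit(i, vaddr):
                self.hint = i
                e = self.entries[i]
                return e.obj, e.offset + (vaddr - e.start)
        return None  # the address falls in a hole: nothing is mapped there


amap = AddressMap([
    Entry(0x1000, 0x3000, "text", 0),
    Entry(0x80000000, 0x80001000, "stack", 4096),
])
```

Only two entries are needed even though the two ranges are about 2 GB apart, which is why a sparse address space costs so little to describe.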

Another contribution of the paper is the comparison between different hardware memory schemes in the contemporary systems like VAX, IBM RT PC, Sun 3, Encore Multimax and Sequent Balance. The authors have brought out the advantages and disadvantages of memory management schemes for each of these systems and also talked about methods to address cache consistency issues in multiprocessor systems.

Evaluation

Throughout the paper, the authors have highlighted a rich set of sophisticated features supported by Mach virtual memory management that were lacking in contemporary UNIX systems. However, the performance evaluation is surprisingly restricted to running a few standard benchmarks. It does not present a clear picture of the cost of providing that rich set of features, even after the authors admit the complexity of their proposed solutions (Section 3.5). Also, I am not very sure what conclusion is intended about the relationship between performance and code compilation time, as presented in Table 7.2. Are the 13 C programs mentioned in the evaluation representative of a usual workload? It is not obvious. Besides, the authors have talked about porting Mach’s virtual memory management to different kinds of uniprocessor and multiprocessor systems. However, a performance comparison between the various uniprocessor and multiprocessor implementations is not explored in the paper.

Confusion

There is some confusion around the use of an external pager to handle page requests / page faults. Also, how exactly do sharing maps solve the problem of data sharing when the shared data is also being shadowed for copy-on-write?

Posted by: Saket Saurabh | February 11, 2016 01:10 AM

1. summary
This paper describes the design and implementation of the virtual memory management within the Mach operating system that is portable to distinct uniprocessor and multiprocessor computing engines. This virtual memory management system makes minimal assumptions about the underlying hardware support and thus separates the software memory management from the hardware support without sacrificing on performance.

2. Problem
OS portability suffers from a proliferation of distinct memory structures, resulting in limited virtual memory management other than simple paging support. When the software relies on the underlying memory architecture, it poses several challenges to UNIX style portable environments and more sophisticated use of virtual memory mechanisms, such as transaction processing and database management.

3. Contributions
The major contributions of this paper are:
* Portability: It seems to be the first to implement machine-independent and portable virtual memory design. It does so without sacrificing on performance.
* All hardware-dependencies are defined in a single abstraction.
* It introduces advanced memory management implemented in the software layer such as virtual memory sharing between tasks using message passing.
* It supports distributed paging and sharing across networks.

4. Evaluation
The comparison of the performance of Mach with traditional UNIX systems on different hardware shows that Mach performs the same as or better than UNIX on different VM operations. Further, the comparison between Mach and UNIX on overall compilation performance shows better performance for Mach. Since there are several added OOD complexities and context switches, these better-than-UNIX performance figures highlight the advantages of the Mach approach to a machine-independent memory management system.
However, since various complexity issues arise with shadow chains and user-level page handlers, it would have been more convincing with more load-bearing tests of Mach paging thresholds.

5. Confusion
The terms “object”, "memory" and “memory object” are overloaded in this paper and have led to some confusion.
With external pagers, wouldn’t there be lots of context switching?
It is also not very clear how support for distributed paging and sharing across networks is implemented in detail.

Posted by: Udip | February 11, 2016 01:00 AM

Summary
The paper presents the design and implementation of virtual memory management in the Mach operating system. The highlight of this design is the separation of machine-dependent memory management from the machine-independent part (virtual memory management for user processes). Since the design relies on some assumptions on the hardware side, the paper also discusses shortcomings of hardware architectures that might create issues in designing such machine-independent operating systems.

Problem
Because traditional operating systems are designed for specific hardware architectures, a considerable amount of effort is required to modify an operating system each time it is ported to a different architecture. The paper aims at reducing this overhead by separating the machine-independent memory management from the machine-dependent part.

Contributions
- The paper provides a newer way of memory management where the machine-independent memory management (like the virtual memory management) is moved away from the kernel.
- Such software architecture helped in reducing the kernel size which in turn helped in reducing efforts for porting an OS to some other hardware architectures; hence increasing the portability for an OS.
- The authors also provide the implementation details for such a system with a description of some of its machine-dependent (like pmap) and machine-independent (like address map, memory objects) data structures.
- The newer design does not give up on the traditional virtual memory management and inter process communication and continues to provide support and is compatible with user processes relying on things like large virtual address spaces, copy-on-write and read-write memory sharing, and memory-mapped files. The paper highlights on how each of these are supported using the newer design.

Evaluation
The paper claims that Mach performs similarly to or, in some cases, better than traditional Unix operating systems. The claims are based on individual tests of memory operations like fork and read on a few hardware architectures. Even if Mach seems to perform better than Unix, this kind of testing seems really inadequate when it comes to testing an operating system. It would have made much more sense had it been tested under real-life scenarios, like many user processes running memory-intensive operations. Also, the claim that porting has become easier is not well proved, since they never mention the effort required to port a traditional Unix OS to different hardware. Finally, the paper does not give details on exactly which machine-dependent structures were separated by this new design to help portability, compared with what had been traditionally used.

Confusion
It would be great if we discussed exactly which machine-dependent memory management has been separated and how this has improved the portability of this newer OS. This would also help explain why it was difficult to port traditional operating systems to different hardware.

Posted by: Akshay Kanfade | February 11, 2016 12:41 AM

summary~
This paper presents the work of extending the virtual memory management in the Mach operating system, aiming for better portability with little or no performance penalty. They did it through the separation of software memory management from hardware support. The authors then evaluate the performance of their implementation against traditional UNIX, and the results show that they achieve the separation with little or no effect on performance.

problem~
The emerging diversity in memory structures presents challenges for operating systems. Due to the lack of separation between software and hardware, an operating system’s portability suffers from this increasing diversity.

contribution~
The idea of decoupling the implementation of the virtual memory system into machine-independent and machine-dependent sections solves the problem of portability. The idea of the shadow object as a part of the sharing mechanism is also interesting. It is different from the other sharing mechanisms that we saw in the previous papers, where the traditional CoW technique copies the entire shared portion and then writes new data onto this newly created copy. In the case of a shadow object, only the newly written data is stored, and it links back to the original data. This further reduces the memory footprint by eliminating unnecessary duplication of data and improves performance by reducing memory traffic. Besides the improvements in portability and performance, another important contribution of this work is that it provides a way of comparing the pros and cons of various hardware memory management schemes.

evaluation~
The authors set up an environment to evaluate the performance of Mach against 4.3 bsd. They measure the performance of Mach VM operations across a range of scenarios including file reading, memory zero fill, fork and compilation. The results show that Mach is no worse than 4.3 bsd in many cases, and for some operations Mach even outperforms 4.3 bsd.
But one problem is that the authors didn’t give any interpretation of their results, even though some of the measurements are interesting. For example, in the case of the fork operation, Mach might perform better because of the shadow object mechanism. In the case of file reading, Mach might perform better because of the caching policy.
Another issue is that the authors didn’t give any justification for why they chose those operations to measure.

confusion~
The paper mentions that not letting all the virtual-to-physical mappings reside in memory will increase space and time efficiency. How does the time efficiency get better by doing so? Won’t this approach slow down memory access by a lot?

Posted by: Yudong Sun | February 11, 2016 12:29 AM

1. summary
The paper deals with decoupling the machine architecture specific modules from the virtual memory management modules in order to simplify porting an operating system to different machine architectures. The paper discusses how the memory object, address map and pmap abstractions can be used to simplify interaction between the dependent and independent modules.

2. Problem
Machine-dependent virtual memory data structures are spread throughout operating system code. This makes it difficult to port the operating system from one architecture to another. Existing UNIX systems deal with this problem by basing newer memory management architectures on previous versions.

3. Contributions
Firstly, Mach manages copy-on-write requests by creating shadow objects which contain only the pages that have been modified. Subsequent copy-on-write requests lead to the creation of a chain of shadow objects. Copy-on-write enables sharing between unrelated tasks with minimal data copying. Child tasks share the parent's memory objects based on the inheritance attribute settings.

Secondly, the address map is a doubly linked list data structure which maps virtual addresses within a task to memory objects. A memory object is a unit of backing storage maintained by the kernel or by user space. Memory objects have reference counts, and when there are no references to an object it can be removed. However, Mach allows a managing task called a pager to be associated with each memory object; the pager uses domain-specific knowledge to determine whether a memory object is still needed or can be garbage collected. By using the pager, page faults and page-out requests can be managed outside of kernel space. The kernel communicates with the pager using message passing through ports.

Thirdly, the main machine-dependent code is in the physical address maps, called pmaps, which are implemented as a page table on the VAX architecture or as a set of allocated segment registers on the IBM RT PC. The only restriction imposed by Mach is that virtual memory operations should be aligned on page boundaries.
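One way to picture the split is to treat the pmap as a cache of translations that the machine-independent layer can rebuild on a page fault. The Python sketch below illustrates that reading and is not the Mach pmap interface: `Pmap`, `enter`, `translate`, `remove_all`, `MachineIndependentVM` and the 4096-byte page size are all hypothetical, and a dict stands in for whatever hardware structure a real port would manage.

```python
PAGE_SIZE = 4096  # assumed page size, chosen only for this sketch


class Pmap:
    """Machine-dependent layer: holds only a cache of page translations."""

    def __init__(self):
        self.table = {}  # virtual page number -> physical page number

    def enter(self, vpage, ppage):
        self.table[vpage] = ppage

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        ppage = self.table.get(vpage)
        return None if ppage is None else ppage * PAGE_SIZE + offset

    def remove_all(self):
        # Safe to drop everything: the upper layer can re-enter mappings.
        self.table.clear()


class MachineIndependentVM:
    """Machine-independent layer: holds the authoritative mappings."""

    def __init__(self, pmap):
        self.pmap = pmap
        self.mappings = {}  # virtual page number -> physical page number

    def map_page(self, vpage, ppage):
        self.mappings[vpage] = ppage  # the pmap is filled lazily on fault

    def handle_fault(self, vaddr):
        vpage = vaddr // PAGE_SIZE
        if vpage not in self.mappings:
            raise MemoryError("invalid address")
        self.pmap.enter(vpage, self.mappings[vpage])

    def access(self, vaddr):
        paddr = self.pmap.translate(vaddr)
        if paddr is None:  # simulated page fault
            self.handle_fault(vaddr)
            paddr = self.pmap.translate(vaddr)
        return paddr
```

Under this design only the `Pmap` class would need rewriting for a new architecture; everything in `MachineIndependentVM` stays unchanged.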

4. Evaluation
The ease of porting Mach to machines with different architectures is supported by statistics on the minimal time taken (3-5 weeks) to port Mach to the IBM RT PC and SUN 3 architectures. The paper further supports the claim by mentioning that the only hardware requirement is an easy-to-manipulate hardware TLB.

The paper discusses the virtual memory implementation of Mach when ported to the VAX (which supports page tables), the IBM RT PC (which supports inverted page tables) and the SUN 3 (which uses a combination of page tables and segments). Additionally, the authors highlight several problems encountered when porting to the Encore Multimax and Sequent Balance with the National 32082 MMU. The paper also covers the problem faced when porting to multiprocessors, which is that the hardware guarantees cache consistency but does not treat the TLB as another type of cache and keep it coherent.

This discussion clearly showcases the variety of machine architectures that Mach supports and highlights the problems that keep it from supporting others. Using empirical results, it is shown that the Mach approach to machine-independent virtual memory doesn't degrade performance; instead it improves performance when benchmarked against BSD UNIX. The paper could have included a more detailed evaluation of the efficiency and performance of copy-on-write sharing and read-write sharing.

5. Confusion
While copy-on-write sharing and shadow objects are well explained, the section on read-write sharing was not clear, specifically about how read-write shared memory can simultaneously be shared as copy-on-write memory.

Posted by: shreya kamath | February 10, 2016 11:45 PM

1. Summary
This paper describes the design of the virtual memory subsystem in the Mach OS. This system achieves portability without affecting performance through separation of the machine-dependent and machine-independent parts of memory management.

2. Problem
OS portability is severely affected as diverse hardware provide support for different memory structures. While certain hardware support paging, others support paging and segmentation and some even use inverted page tables. To cope with this diversity, traditional systems like UNIX restricted the feature set to the extent that could be ported across platforms. Mach attempts to solve this problem of portability without sacrificing system performance.

3. Contributions
The primary contribution of this paper was the clear demarcation of the machine-dependent and machine-independent components of memory management. I feel that this is a significant breakthrough, as OS portability would now predominantly involve changes to the machine-dependent part of the system. The integration of virtual memory management with message passing improves efficiency and allows large amounts of data (an entire address space or file) to be transmitted within a message. The page size is a boot-time parameter and is not limited to the hardware page size. The system also offers page-level protection. Sharing via copy-on-write is easily implemented via messages without actual data copy operations. Read/write sharing is also efficiently implemented by allocating a shared memory region and setting its inheritance attribute.
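The boot-time page size can be illustrated with a little arithmetic. The sketch below assumes, as the paper describes it, that the Mach page size must be a power-of-two multiple of the hardware page size; the helper names are hypothetical, and the 4096-byte Mach page is just an example value. The 512-byte figure is the VAX hardware page size.

```python
def hw_pages_per_mach_page(mach_page_size, hw_page_size):
    """Number of hardware pages that make up one Mach page."""
    ratio, remainder = divmod(mach_page_size, hw_page_size)
    # The ratio must be a whole power of two (1, 2, 4, 8, ...).
    if remainder != 0 or ratio < 1 or ratio & (ratio - 1) != 0:
        raise ValueError("Mach page must be a power-of-two multiple "
                         "of the hardware page size")
    return ratio


def hw_pages_for(mach_page_index, mach_page_size, hw_page_size):
    """Hardware page numbers covered by the given Mach page."""
    n = hw_pages_per_mach_page(mach_page_size, hw_page_size)
    first = mach_page_index * n
    return list(range(first, first + n))


# Example: a 4096-byte Mach page on a machine with 512-byte hardware pages.
print(hw_pages_per_mach_page(4096, 512))
print(hw_pages_for(1, 4096, 512))
```

A larger Mach page means fewer entries in the machine-independent tables, at the cost of coarser granularity for faults and protection.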

4. Evaluation
The authors claim that the MACH memory subsystem is easily portable and give an estimate of the amount of time taken to port this system to different architectures. The IBM RT port took about 3 weeks for an inexperienced programmer and the Sequent port took about 5 weeks for an expert programmer. The pros and cons of various systems are also described; VAX causes a large page table due to its small page size, IBM RT PC allows only one valid mapping of a physical page creating problems for sharing while the SUN3 results in sparse resident page tables.
The authors evaluate performance by comparing simple tasks like reading a file, forking, etc., and the measurements are not comprehensive enough to support their claims.
Overall, the authors provide good design abstractions, but the claims are not supported with adequate data (supporting benchmarks, etc.).

5. Confusion
What exactly are Mach physical pages?
More details on memory objects would be really helpful.
The paper introduces a number of abstractions but none of them are supported with a high level design diagram.
Also, there are no graphs to compare the performance aspects of this system.

Posted by: Vinothkumar Siddharth | February 10, 2016 11:44 PM

Summary

This paper discusses the virtual memory management of Mach Operating System. The VMM is broken into - machine dependent and machine independent modules thereby facilitating a variety of virtual memory functionality without being tied to any particular hardware architecture.

Problem

Operating System developers find it hard to port the OS to multiple hardware platforms due to the explosion in memory structures. Moreover, different hardware handles memory differently, which makes porting difficult. Unix solves this problem by having a very simple memory management system and restricting the functionality provided by the VMM so that porting becomes easier. Thus, there arises a need to design a system that can be easily ported and whose software memory management is independent of the hardware support.

Contribution

The biggest contribution of this paper is to break down the VMM into two parts: a machine-independent module and a machine-dependent module. The machine-independent module can provide numerous functionalities that a Unix-like VMM doesn't provide, and it doesn't rely on any specific hardware support. On the other hand, the machine-dependent module is very simple and hence facilitates easy porting, as only the machine-dependent module needs to be modified.
The second contribution of the paper is that one can use message-driven communication for virtual memory management. A single large file or a whole address space can be sent via a single message with the efficiency of simple memory remapping. Message-driven communication also allows the user application to have control (page-in, page-out, pinning) over the memory objects it creates, thus enabling its participation in the overall functioning of the system. Message passing also helps in scaling the system on multiprocessors and distributed systems, as the data can be accessed/mapped from anywhere. The VMM allows data sharing by efficiently using CoW and shadow objects and allows sharing to be done at the address level, helping in effective usage of memory.
This overall concept can be easily used to support extent based filesystem such that each extent can be viewed as a memory object and intelligent user level applications can control the in memory presence of such objects. I really liked this paper a lot.

Evaluation

This paper claims to have provided an easy mechanism of porting Mach on different hardware without sacrificing the performance. The author extensively talks about how someone without prior knowledge of OS/C could modify the machine dependent code to port Mach on IBM RT PC with ease. They even talk about how Mach is ported on different hardware having different support for memory management so that all the features of the VMM can be utilized.
However, the evaluation of the performance aspect is disappointing. The performance evaluation is done on very simple operations like fork and read, and no real-world or memory-intensive workload is chosen to evaluate how the VMM performs. No effort is made to show how much impact shadowing has on memory and performance.

Confusion
How is heap memory managed by Mach? Who owns the memory objects referred in the heap? Under memory pressure, how does the kernel decide which page to page-out? Is that policy fair to all the tasks?

Posted by: Yuvraj | February 10, 2016 11:41 PM

1. Summary
This paper aims at designing a memory management system that is independent of the varying hardware memory schemes. In doing so, it achieves higher portability to both single and multiprocessor systems. Mach VM supports a large and sparse virtual address space, CoW and r/w sharing between tasks, and user-specified page fault handling using pagers. They evaluate only on different uniprocessors and find high portability with comparable performance.
2. Problem
Memory abstraction is heavily aligned with the hardware schemes and support. As a result frequent OS modifications for support makes portability harder to achieve. It is therefore necessary for the software to make few assumptions about hardware architecture, so that a single memory management system can be applied to all uni- and multi- processor systems.
3. Contributions
Mach VM draws on a microkernel architecture. It implements a distributed network of tasks communicating about memory objects through a channel called a port, which is protected by the kernel. All memory features and mechanisms would be at user level: key idea 1. The tasks get virtual pages through the resident page table, which translates to memory objects through an address map, which maps into physical pages through a pmap: the only machine-dependent module. An elegant way to handle page faults is for the kernel to pass the fault as a message to a pager, a specific task associated with each memory object. This moves page fault handling into user space: key idea 2. Novel sharing constructs are presented: a shadow page scheme to support CoW objects with lazy evaluation of modified pages, and sharing maps to support r/w sharing: key idea 3.
Overall, these techniques very much align with Opal’s line of thought with explicit message passing between domains/objects, with reference counts/objects, and protection inheritance. It paves way for an innovative thought of re-designing system software to be flexible and portable: moving away from being a monolithic to a micro-kernel.
4. Evaluation
The major problem that this paper focuses on is portability. Thus, they evaluate how easy porting is: they successfully adapted to various architectures (VAX, Sun, IBM RT, Encore etc.) with little change to the pmap module, with only a few ports taking weeks (acceptable). The paper then shows that this machine-independent VM design doesn't affect performance (ideally they should have expected greater performance improvements). The performance of the Mach system was compared with Unix on a VAX machine. Measurements of basic VM operations as well as VM-intensive tasks showed that Mach performed better than Unix on most tasks. The workload isn’t very wide: taking individual CPU/memory/IO intensive workloads would have shown which abstraction/feature offers a targeted improvement. They did not pin down the exact reason for the performance gains: it could be the benefits of their features outweighing the communication overheads, port congestion, validations etc. Also, they do not extend the tests to a multiprocessor system.
5. Comments/Confusion
Confused about the Mach memory system for a multiprocessor system and also about its security model for message passing for resources. Also, a brief timeline of innovation in this area (memory management) would be helpful to picture the academic vs industry growth/adoption.
Useful reference: Apple’s memory discussion on OS X, which descends from Mach VM.

Posted by: Tithy Sahu | February 10, 2016 11:28 PM

1. Summary
In this paper, the authors describe the split of the Mach OS into two sections: machine-independent and machine-dependent sections. Furthermore, they use the virtual memory management portion of the OS to show specifically how this separation would work.
2. Problem
The problem is that current OSes are written in a machine dependent way. As a result, it is very hard to port the OS to a new hardware without significant modifications. Current OSes tend to simulate old architectures on new machines, but that is slow, not innovating, and does not fully take advantage of the new hardware.
3. Contributions
In order to solve the above problem, the authors propose dividing up the OS into 2 sections: a section that is machine-independent and a section that is machine-dependent. The independent portion does not need to know anything about the hardware, and it will use generic data structures. On the other hand, the machine-dependent side uses machine-specific data structures and communicates with the underlying hardware the way that hardware expects. As a result, one could take full advantage of the features of hardware with only rewriting the machine-dependent portion rather than the entire operating system. The machine-dependent side is really just a module and a generic header, which the machine-independent side can use to communicate with the module.
The authors used the virtual memory management system of the Mach kernel to illustrate how the split will affect the OS design. Their virtual memory consists of data structures such as page tables, address maps, memory objects, and pmap. They describe the idea of memory objects, which is defined as a unit of backing storage, similar to a file in UNIX. Since the Mach OS is a microkernel, the system could have multiple user processes that manage many of the resources. As a result, each memory object is associated with a managing process (or a pager). Through this abstraction, file systems can be implemented as a pager, and memory objects (acting as files) can have the file system as their manager.
The authors also explain the way they handle COW. Whenever a memory object is modified, a shadow object is created and the modified page is copied; the rest of the pages in the memory object are not copied (the original pages are used). As a result, the shadow object is gradually filled with new data, which avoids the cost of copying the entire memory object at the first write operation.
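The shadow-object behaviour described above can be sketched in a few lines of Python. This is an illustration rather than Mach's implementation: `MemoryObject`, `shadow_of`, `read` and `write` are hypothetical names, pages are plain values in a dict, and a missing page simply reads as `None` where a real system would zero-fill or ask a pager.

```python
class MemoryObject:
    """A memory object that may shadow another one for copy-on-write."""

    def __init__(self, pages=None, shadow_of=None):
        self.pages = dict(pages or {})  # page index -> page contents
        self.shadow_of = shadow_of      # object being shadowed, or None

    def read(self, index):
        # Walk the shadow chain until some object holds the page.
        obj = self
        while obj is not None:
            if index in obj.pages:
                return obj.pages[index]
            obj = obj.shadow_of
        return None

    def write(self, index, data):
        # Only the modified page lands in this object; all other pages
        # are still read from the object further down the chain.
        self.pages[index] = data


original = MemoryObject({0: "a", 1: "b", 2: "c"})
copy = MemoryObject(shadow_of=original)  # copy-on-write copy, no pages yet
copy.write(1, "B")
print(copy.read(0), copy.read(1), original.read(1))
```

The loop in `read` also shows where the cost of long shadow chains comes from: every extra level is one more object to check before a page is found.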
4. Evaluation
In order to evaluate the system, they only perform zero filling pages, forking, and reading files of different sizes, and they compare the results to 4.3bsd UNIX kernel. I expected features to be fully benchmarked; features such as COW, in order to see how much better their shadow objects perform. I definitely expected an explanation of why they see a performance improvement over the 4.3bsd. For example, why the 4.3bsd mechanisms fail to perform as fast.
5. Confusion
Was the memory object abstraction really necessary?

Posted by: Arman Shanjani | February 10, 2016 11:26 PM

1. Summary
The paper discusses the development of Mach OS to handle different underlying hardware by keeping the code base modular for machine dependent and independent portion. It also discusses the communication protocols between executing units using efficient techniques such as copy-on-write (CoW) and memory remapping.

2. Problem
Lack of an OS which can handle multitude of underlying hardware prevalent in that time frame. Existing hardware vendors had to write their own OS which was specific to the underlying hardware and hence offered poor feasibility.

3. Contributions
The authors came up with a design in which they separated the machine-dependent code (contained in the pmap.c file) from the machine-independent code. They discuss various options such as CoW and read/write sharing to efficiently share memory, and at the same time enumerate a comprehensive list of APIs (in Table 2.1) to handle memory allocation. Since memory objects can be shared, they further provide optimizations using address maps and sharing maps, which point to a shared memory object. CoW-based sharing may require making copies on the fly, but a further optimization, lazily allocating only those pages which were modified and keeping track of them using shadow object pointers, ensures that the overhead of CoW is low.
They introduced object oriented concept with memory by providing access to it using ports and pager (which acts as interface with the kernel on these ports and the threads requiring access to the memory objects). However, they did not talk about port discovery mechanisms and the overhead of message passing. To handle the machine dependency, they kept a data structure - pmap, which provides a mapping for various memory objects.
The design proved to be an inspiration for many modern OS such as OS X which proves that their work indeed provided a platform for further research and development, unlike previous solutions which remained more of a proof of concept.

4. Evaluation
The authors compared the time taken by Mach against UNIX and 4.3 BSD across the major hardware platforms available and quantitatively showed that it actually performed better. However, they did not provide any experimental details other than the results. The authors could also have evaluated the overhead of the hardware-dependent and hardware-independent code and discussed the performance on loosely coupled multiprocessors.

5. Confusion
I am still unclear about the overall picture of the design and the interactions between the various data structures. I don’t clearly understand what they mean by a Mach page: whether it refers to a virtual page, a physical page, or another level of intermediate translation.

Posted by: Vikas Goel | February 10, 2016 10:36 PM

1. Summary
Mach has a virtual memory management unit which can be divided into a machine-dependent part and a machine-independent part: the machine-dependent data structures contain only those mappings necessary to run the current program. It also uses shadow objects to hold changed data from shared memory.

2. problem
Memory management in UNIX lacks portability; UNIX copes by restricting the facilities it provides and by conservatively basing new memory management implementations on architectures already evaluated.

3. Contribution
I think the main contribution of this paper is the shadow object, which can reduce the burden of the CoW policy because it does not need additional pages to copy all the data. In the case of Disco, new pages must be allocated to copy the data from the shared area when a thread tries to change the shared area. On the other hand, Mach does not need to allocate the same number of pages as the shared area, so it uses memory efficiently. Mach uses a shadow object to store only the partial data which is changed from the shared area. It may hurt performance because reads have to trace through the chain to find the data. In my opinion, it may make performance even worse than the previous system, because shadow chains need a lot of time to trace if the chain length is too long.
The second one is the separation between the machine-dependent and machine-independent parts. Normally the machine-dependent part has to be modified whenever there are changes in hardware or new hardware architectures; Mach only needs changes in the machine-dependent part, without any change to the machine-independent part.

4. Evaluation
The paper evaluated the performance of Mach with specific system calls. I assume the reason they chose these particular functions is that Mach is an OS specialized for memory management. Even so, they did not evaluate the effect of garbage collection and shadow objects (shadow chains), which are the main changes in this paper. Allocation and deallocation are a part of this paper, but more features are addressed that are not evaluated. Another thing they did not show is how performance changes if a thread modifies the shared data entirely rather than partially.

5. Confusion
What is the difference between address map and paging? And address map entry and page entry?
If the shadow chains are very long, it may be more efficient to allocate pages and store the data in them. Is that possible in this paper?

Posted by: Choungki Song | February 10, 2016 08:34 PM

1. Summary
This paper presents the Mach virtual memory system. This system uses machine independent internal data structures which depend on a small machine dependent module. Such a design makes it easily portable across different hardware architectures without sacrificing system performance.

2. Problem
Hardware varies considerably in the kind of support it offers for memory management. Certain contemporary hardware supported paging, while other hardware only supported global inverted page tables or a combination of segmentation and paging. System architects were unable to deal with such diversity, and it resulted in a proliferation of memory management systems to handle different hardware. Popular systems like UNIX took the approach of restricting the memory management system to an extent that it could be easily ported across the supported architectures. This resulted in trade-offs like limiting the process virtual memory size to 64MB or less. Meanwhile, systems like Accent and Multics, which provided advanced features like mapping files within segments of virtual memory or efficient transfer of large virtual memory regions between protected regions, were heavily tied to a hardware architecture.

3. Contributions
The major contribution of this work is the design and development of a virtual memory system that offers advanced features and can be ported easily across different hardware without compromising on the features supported. The paper highlights this by citing the experience of porting the Mach memory management system to four different hardware architectures in a relatively short time since its inception. Another contribution of this system is integrating memory management and communication via ports, which can be used to communicate actionable messages across tasks, threads and memory objects. One more important feature of this system is the ability to handle page faults and page-out requests outside the kernel by using special tasks called pagers, which manage these requests on behalf of the kernel.

4. Evaluation
The paper claims that the Mach memory management system is easily portable and offers the same set of services across the different hardware it is ported to, without sacrificing performance. To prove their first claim the authors cite their experience in porting the system to IBM RT PCs, which took approximately 3 weeks for an inexperienced system programmer. The effort, though commendable, is not compared with the man-hours spent on porting other operating systems. The authors support their second claim by explaining the nuances of the different hardware on which Mach was ported and the effort required to offer the full functionality of Mach memory management on those systems. But as far as the claim for performance goes, there is not much evaluation. The paper just shows micro-benchmark values for some simple operations like forking and reading, but does not give the complete picture of how the system will behave under memory pressure or in real-life scenarios.

5. Confusion
The paper introduces too many data structures. Maybe we could have a pictorial representation of the dependencies they have on each other so that I can wrap my head around all of them.

Posted by: Mihir Shete | February 10, 2016 07:38 PM

1. Summary
Mach is an exploration of an operating system that tries to separate hardware and software requirements as much as possible. Mach intends to make as few assumptions as possible about memory management hardware and use machine-independent instructions for the handling of all virtual memory. The authors discuss Mach and the details of implementation, and then use their system to compare the pros and cons of various hardware management schemes.

2. Problem
One big problem with operating system portability is the proliferation of memory structures. Mach is intended to be as machine-independent as possible, using only a single code module and a header file specific to each processor. This also has to be done without unduly sacrificing system performance.

3. Contributions
The larger part of the paper seemed to be dedicated to explaining Mach's memory structure. However, the concepts don't seem to be anything really new: for example, for data sharing, Mach primarily uses COW. If one task wants to alter an object, Mach has its own system for handling modifications. This is done via “shadow objects”, which hold just the modified data. A shadow object relies on the main object for unmodified data, and shadow objects can be nested. This can be complex for read/write sharing, though, as there can be a long chain of shadow objects. Instead, a sharing map may be used. The sharing map points to shared memory objects, and map operations that apply to maps sharing the data are applied to it instead. Garbage collection is applied to shadow objects when intermediate objects in the chain are no longer needed. We've already seen something similar to the concept of shadow objects before, though, in Disco (which uses a B-tree to keep track of modifications instead, but the general principle is the same).
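To make the shadow-object idea above concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not Mach's actual implementation: the class `MemoryObject`, its fields, and the `collapse` method are hypothetical names, pages are plain dictionary entries, and the collapse step assumes that no task maps an intermediate object directly.

```python
class MemoryObject:
    """Hypothetical memory object; a shadow holds only the pages modified through it."""

    def __init__(self, backing=None):
        self.pages = {}            # page index -> page contents
        self.backing = backing     # object this one shadows, or None for the original
        self.shadow_count = 0      # number of objects that shadow this one
        if backing is not None:
            backing.shadow_count += 1

    def read(self, index):
        # Walk down the shadow chain until some object holds the page.
        obj = self
        while obj is not None:
            if index in obj.pages:
                return obj.pages[index]
            obj = obj.backing
        return None  # a real system would zero-fill or ask a pager here

    def write(self, index, data):
        # Modifications stay in this object; the objects below are untouched.
        self.pages[index] = data

    def depth(self):
        # Length of the chain, counting this object.
        n, obj = 0, self
        while obj is not None:
            n += 1
            obj = obj.backing
        return n

    def collapse(self):
        # Garbage-collect intermediate objects that only this object shadows.
        # Assumption: such objects are not mapped directly by any task.
        while self.backing is not None and self.backing.shadow_count == 1:
            middle = self.backing
            for index, data in middle.pages.items():
                self.pages.setdefault(index, data)  # newer data in self wins
            self.backing = middle.backing           # self takes over middle's reference
```

A write through a shadow leaves the original object unchanged, while reads of unmodified pages fall through the chain. Each lookup costs time proportional to the chain length, which is why collapsing unneeded intermediate objects matters.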
I feel that the main contribution of Mach really is the split into machine independent and machine dependent code: dependent code relates directly to the hardware and involves only the operations that are critical for machine execution, and machine independent code is generally everything else that handles virtual memory information. Physical memory in Mach is handled via pmaps, which are physical address maps that keep track of virtual to physical mappings. One important part is that not all virtual to physical mappings need to be held in memory, as these can be reconstructed at fault time via Mach's data structures. Only the kernel mappings must be kept in memory at all times. All in all, Mach places relatively few demands on the hardware, requiring only: 1) there must be some ability to handle and recover from page faults, 2) address space is limited by addressing restrictions of underlying hardware.
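The point that pmap mappings can be thrown away and rebuilt at fault time can be sketched roughly as follows. This is a toy model under stated assumptions: the `Pmap` and `Task` classes and their method names are hypothetical and only loosely echo the idea of a machine-dependent module, and the machine-independent state is reduced to a dictionary from virtual page to physical frame.

```python
class Pmap:
    """Hypothetical stand-in for the machine-dependent layer: a discardable mapping cache."""

    def __init__(self):
        self.table = {}                      # virtual page -> physical frame

    def enter(self, vpage, frame):
        self.table[vpage] = frame

    def remove(self, vpage):
        self.table.pop(vpage, None)          # dropping a mapping loses no information

    def extract(self, vpage):
        return self.table.get(vpage)         # None means "not currently mapped"


class Task:
    """Hypothetical task whose machine-independent state is the source of truth."""

    def __init__(self, resident):
        self.resident = dict(resident)       # machine-independent: vpage -> frame
        self.pmap = Pmap()
        self.faults = 0

    def access(self, vpage):
        frame = self.pmap.extract(vpage)
        if frame is None:                    # hardware would raise a page fault here
            frame = self.handle_fault(vpage)
        return frame

    def handle_fault(self, vpage):
        # Rebuild the mapping from machine-independent data structures.
        self.faults += 1
        frame = self.resident[vpage]         # KeyError models an invalid address
        self.pmap.enter(vpage, frame)
        return frame
```

In this sketch, removing an entry from the pmap only costs one extra fault on the next access, which is the property that lets the machine-dependent layer stay small.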

4. Evaluation
In theory, Mach is as machine-independent as possible and doesn't need in-memory hardware-defined data structures. In practice, the pmap still needs to manipulate hardware, which in turn controls an internal MMU TLB. Mach is still portable by OS standards, though, and can show the weaknesses and strengths of various systems:
(1) VAX: VAX requires a large amount of linear page table space. This is handled by storing the page tables in physical memory and only mapping virtual to physical addresses for pages actually in use.
(2) IBM RT PC: instead of per-task page tables, this uses an inverted page table. This reduces memory usage, but each physical page can only be mapped to a single virtual address. This makes sharing hard, and can cause a lot of extra page faults.
(3) SUN 3: this uses segments and page tables to create per-task address maps. However, only 8 contexts can exist at a given point. There are also “holes” in physical memory because display memory is addressable as “high” physical memory.
(4) Encore Multimax/Sequent Balance: chip bugs and memory requirements make this unsuitable for use.
(5) Multiprocessors: while developers usually try to guarantee cache consistency, TLB consistency is generally not supported. Mach must work around this by either (a) forcibly interrupting CPUs and remapping, (b) using regular timer interrupts to change mappings, or (c) allowing temporary inconsistency.
Having said that, I think that the authors could have done more in their evaluations. Their final results table seems to compare only simple tasks: reading a file, filling a page, etc. There should be more variety in workloads for better comparison. One thing that struck me was the IBM RT PC. The authors state that “extra faults [caused by sharing physical pages between multiple tasks] are rare enough” that Mach does not suffer in performance, but that may be because the tasks they tested Mach with are relatively simple. They list one workload of “13 programs”, but there's no real detail of what those thirteen programs are or how they would behave differently on different processors. More complex workloads would confirm whether or not Mach holds up.
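The extra faults on the IBM RT PC mentioned above can be illustrated with a small sketch. It assumes a simplified inverted page table with exactly one (task, virtual page) entry per physical frame and ignores any hardware support for shared segments; the class `InvertedPageTable` and its fields are hypothetical.

```python
class InvertedPageTable:
    """Hypothetical inverted page table: one (task, virtual page) entry per frame."""

    def __init__(self):
        self.owner = {}      # physical frame -> (task, vpage) currently mapped
        self.faults = 0

    def access(self, task, vpage, frame):
        # Only the current owner of the frame can touch it without faulting.
        if self.owner.get(frame) != (task, vpage):
            self.faults += 1                      # fault: steal the mapping
            self.owner[frame] = (task, vpage)
        return frame


def count_faults(pattern, frame=3):
    # pattern is a sequence of (task, vpage) accesses to one shared frame.
    ipt = InvertedPageTable()
    for task, vpage in pattern:
        ipt.access(task, vpage, frame)
    return ipt.faults
```

Under this model, two tasks that strictly alternate on a shared page fault on every access, while a task that touches the page several times in a row faults only once. How often real workloads alternate like this is exactly what a broader evaluation would need to measure.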

5. Confusion
It is stated that a cache may be used to store frequently used memory objects even if there are no references to them. How exactly are the memory objects known if there are no references to them? In addition, the authors propose three different solutions for TLB consistency in multiprocessors. With what frequency is each solution used? Is it an all-or-nothing situation or is there some internal algorithm?

Posted by: En-Ui Annie Lin | February 10, 2016 04:57 PM
