The Duality of Memory and Communication in the Implementation of a Multiprocessor Operating System

Michael Young @ CMU

Proceedings of the Eleventh ACM Symposium on Operating System Principles, November 8-11, 1987

Mach

Object oriented design

Micro kernel

Message passing IPC

Binary compatible with 4.3BSD

Memory object and data manager (= pager)--an application-specific secondary storage manager

Complementary roles of VM and IPC

IPC uses memory mapping to transfer large messages on tightly coupled multiprocessors and uniprocessors

VM uses IPC for communication between the kernel and the pager, which acts as an application-specific page fault handler. Shared virtual memory in a distributed system then comes for free!

Five Major Abstractions in Mach

Task & thread

Task is the basic unit of resource allocation

Task has a virtual address space

Task has protected access to system resources

Task has one or more threads in it

All threads within a task share resources

Unix process = a task with a single thread in it

Port

A channel to communicate with other tasks or itself

Access rights to a port

are granted to a task by sending it a message that carries a 'send right' or 'receive right' for the port

A port may have multiple senders but only one receiver--the owner of the port

Each port represents an object. Owner task of a port vs. the object the port represents:

Port P represents an object O. The owner of P is task T

A task sends a message to port P

The owner task T receives the message

T performs the corresponding action on the object O

Kernel, which is also considered a task, owns the ports representing tasks, threads, etc.

IPC to a task = sending a message to the port which represents the receiving task

System call = sending a message to the task's own task self port, which represents the task itself
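
The port/owner/object relationship above can be sketched in a few lines of Python. This is only an illustrative model, not Mach's API: the class `Port`, its methods, and the task names "T", "A", and "B" are all hypothetical, and rights are tracked as plain names rather than kernel-protected capabilities.

```python
from collections import deque


class Port:
    """A message queue with one receiver (the owner) and any number of senders."""

    def __init__(self, owner):
        self.owner = owner      # task holding the receive right
        self.senders = set()    # tasks holding a send right
        self.queue = deque()

    def grant_send_right(self, task):
        self.senders.add(task)

    def send(self, task, message):
        if task not in self.senders:
            raise PermissionError(f"{task} holds no send right")
        self.queue.append(message)

    def receive(self, task):
        if task != self.owner:
            raise PermissionError(f"{task} holds no receive right")
        return self.queue.popleft()


# Port P represents object O (here a counter); task "T" owns P.
counter = {"value": 0}
P = Port(owner="T")
P.grant_send_right("A")
P.grant_send_right("B")

P.send("A", "increment")
P.send("B", "increment")

# T receives each message and performs the corresponding action on O.
while P.queue:
    if P.receive("T") == "increment":
        counter["value"] += 1

print(counter["value"])  # 2
```

Senders never touch the object directly; only the owner of the port acts on it, which is the point of the "port represents an object" rule.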

Message

A fixed-size header and one or more variable-size bodies

Ordinary message - not interpreted by the kernel

Special message - interpreted by the kernel

Basically RPC style communication

Send a message along with a send right to the port through which the sender expects a response

Sender blocks until the reply is received

The sending buffer is reused to receive the reply

Asynchronous communication is also possible
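
The RPC-style exchange described above can be sketched as follows. This is a rough model under stated assumptions: `queue.Queue` stands in for a port, and the function names `server` and `rpc` are hypothetical, not Mach calls. The request carries a reply port, and the sender blocks on that port until the answer arrives.

```python
import queue
import threading


def server(request_port):
    """Owner of request_port: receive one request and answer on its reply port."""
    body, reply_port = request_port.get()
    reply_port.put(body.upper())


def rpc(request_port, body):
    """Send body together with a reply port, then block until the reply arrives."""
    reply_port = queue.Queue()             # stands in for a port the sender owns
    request_port.put((body, reply_port))   # the message carries access to the reply port
    return reply_port.get(timeout=5)       # the sender blocks here


request_port = queue.Queue()
worker = threading.Thread(target=server, args=(request_port,))
worker.start()
result = rpc(request_port, "hello")
worker.join()
print(result)  # HELLO
```

An asynchronous variant would return right after the `put` and read the reply port later.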

Memory Object

is an abstraction for a collection of data bytes, e.g., pages on a hard disk

is also represented by a port

is managed by a pager, a user or kernel level task, which is the owner of the port representing the memory object

Secondary storage can be accessed by sending messages to pager!

Virtual Memory Management

Task's address space is composed of regions (= regions in Unix)

Task can allocate regions anywhere within the virtual address space

vm_allocate

Task can also allocate regions mapped to memory objects - like mmap in Unix

vm_allocate_with_pager(..., memory object, ...)

Kernel sends init message to the pager to initialize the object
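
Since the notes compare this to mmap in Unix, here is a small sketch of the Unix analog using Python's standard `mmap` module. It is not Mach code: a temporary file plays the role of the memory object, and the operating system's own file pager does the work that a Mach data manager would do.

```python
import mmap
import os
import tempfile

# Create a small backing file to play the role of the memory object.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello mach")
os.close(fd)

with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), 0)  # map the whole file into the address space
    first = region[:5]                 # ordinary read; data is paged in on demand
    region[:5] = b"HELLO"              # ordinary write; the page becomes dirty
    region.flush()                     # ask the OS to write dirty pages back
    region.close()

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)

print(first, contents)  # b'hello' b'HELLO mach'
```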

Memory reference

A task references a virtual address: vm_read or vm_write

Kernel translates the virtual address into a physical address. If a page fault occurs:

Kernel sends a message to (the port of) the corresponding memory object

pager_data_request(memory object, port to which the kernel expects response, ...)

Pager receives the message and reads the corresponding page from the secondary storage, possibly using the kernel

Pager responds to the kernel with the data it read

Pre-fetching can be done at the discretion of the pager
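
The page-fault steps above can be sketched as a toy model. Several things here are assumptions for illustration: the classes `Kernel` and `Pager`, the 4-byte page size, and the use of direct method calls where Mach would send asynchronous messages through ports. The method names only mirror the message names in these notes.

```python
PAGE_SIZE = 4  # tiny page size, chosen only to keep the example readable


class Pager:
    """Owns the memory object; the 'secondary storage' is just a bytes value."""

    def __init__(self, backing):
        self.backing = backing

    def pager_data_request(self, kernel, offset, length):
        data = self.backing[offset:offset + length]   # read from secondary storage
        kernel.pager_data_provided(offset, data)      # reply to the kernel


class Kernel:
    """Caches pages of one memory object and asks the pager on a miss."""

    def __init__(self, pager):
        self.pager = pager
        self.resident = {}   # page offset -> bytes held in physical memory
        self.faults = 0

    def pager_data_provided(self, offset, data):
        self.resident[offset] = data

    def read_byte(self, address):
        page = address - address % PAGE_SIZE
        if page not in self.resident:                 # page fault
            self.faults += 1
            self.pager.pager_data_request(self, page, PAGE_SIZE)
        return self.resident[page][address - page]


kernel = Kernel(Pager(b"abcdefgh"))
values = [kernel.read_byte(a) for a in (0, 1, 5)]
print(values, kernel.faults)  # [97, 98, 102] 2
```

Addresses 0 and 1 share a page, so only two requests reach the pager for three reads.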

Dirty page cleaning

Kernel sends a message to the pager

pager_data_write(memory object, ..., data in physical memory, ...) - kernel does not wait for a response

Hint to the kernel

Pager can request the kernel to flush or clean cached data

Default memory objects and pager

Kernel provides default memory objects and pager

Using Memory Objects: Example pagers and their applications

A Minimal Filesystem

Application interfaces and filesystem's actions

fs_read_file("sschang", &data, size)

allocate a new object

perform file lookup with "sschang"

vm_allocate_with_pager(..., the new object, data, ...)

data[i] = do_something()        // page fault occurs, kernel sends pager_data_request message to FS

allocate a buffer

read the faulting page(s) into the buffer

send a 'pager_data_provided' message to the kernel

deallocate the buffer

fs_write_file("sschang", data, size +- anyvalue)

Consistent Network Shared Memory Excerpt

Shows how to implement a simple, centralized "shared virtual memory" system. See Fig 4-1. In this figure, the shared-memory-server at the center of each diagram is the pager

Can you implement a shared virtual memory system using Mach's messaging scheme?
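
One possible answer to the question above, as a rough sketch rather than the paper's actual protocol: a centralized server that allows many readers or one writer per page and recalls or invalidates copies as needed. The classes `Client` and `SharedMemoryServer` and their method names are hypothetical, a single page is modeled, and direct calls replace the messages a real pager and kernel would exchange.

```python
class Client:
    """One kernel's cache of a single shared page."""

    def __init__(self, name):
        self.name = name
        self.copy = None       # cached contents, or None if not resident
        self.writable = False

    def flush(self):
        """The server asks us to give up the page; return our copy."""
        data, self.copy, self.writable = self.copy, None, False
        return data


class SharedMemoryServer:
    """Pager for one page: many readers or one writer at a time."""

    def __init__(self, initial):
        self.data = initial
        self.readers = set()
        self.writer = None

    def _recall_writer(self):
        if self.writer is not None:
            self.data = self.writer.flush()   # collect the latest contents
            self.writer = None

    def request_read(self, client):
        self._recall_writer()
        self.readers.add(client)
        client.copy, client.writable = self.data, False

    def request_write(self, client):
        self._recall_writer()
        for reader in self.readers - {client}:
            reader.flush()                    # invalidate stale read copies
        self.readers.clear()
        self.writer = client
        client.copy, client.writable = self.data, True


server = SharedMemoryServer("v0")
a, b = Client("A"), Client("B")
server.request_read(a)
server.request_read(b)
server.request_write(a)   # B's read copy is invalidated
a.copy = "v1"             # A writes locally
server.request_read(b)    # the server recalls A's page, so B sees the new data
print(b.copy, a.copy)     # v1 None
```

Every transfer goes through the server, which keeps the sketch simple but makes the server a bottleneck.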

The Problems of External Memory Management

Types of Memory Failures

Threads may become blocked waiting for data supplied by another user task, which does not respond promptly

When the kernel wants to evict a page, what if the data manager (= pager) does not respond promptly (perhaps on purpose)?

What if a data manager responds to 'pager_data_request' with too many pages?

What if a data manager maliciously keeps changing the contents of an object that has already been read or copied by multiple tasks?

Deadlock

A data manager needs to read data from another manager, which in turn needs to read data from the former

Handling Memory Failures

Timeout mechanism can be used

After timeout, let the default pager, which is the trusted one, intercept the processing and finish it
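
The timeout idea can be sketched like this. It is a toy model of the fallback described in these notes, not of the kernel's real mechanism: `request_page`, the pager functions, and the timeout values are all hypothetical, and a `queue.Queue` again stands in for the reply port.

```python
import queue
import threading


def request_page(pager, default_pager, offset, timeout):
    """Ask pager for a page; fall back to default_pager if no reply arrives in time."""
    reply_port = queue.Queue()
    threading.Thread(target=pager, args=(offset, reply_port), daemon=True).start()
    try:
        return reply_port.get(timeout=timeout)
    except queue.Empty:
        return default_pager(offset)   # the trusted pager finishes the request


def unresponsive_pager(offset, reply_port):
    threading.Event().wait(60)         # never replies within the timeout


def well_behaved_pager(offset, reply_port):
    reply_port.put(f"user page {offset}")


def default_pager(offset):
    return f"default page {offset}"


slow = request_page(unresponsive_pager, default_pager, 0, timeout=0.1)
fast = request_page(well_behaved_pager, default_pager, 0, timeout=2.0)
print(slow, "|", fast)  # default page 0 | user page 0
```

The faulting thread is blocked for at most the timeout, whatever the user-level pager does.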

Multiprocessor Issues

As networks become fast enough, the distinction between the message-passing and shared-memory approaches to supporting multiple threads is becoming blurred

How convenient and self-serving this argument is for the Mach designers!

Applications

Emulating Operating System Environment

Copy-On-Reference Task Migration

Database Management: Camelot

AI Knowledge Bases: Agora