**PILOT: An operating system for a personal computer**
====================================================

*Some takeaway points*
----------------------

In a multi-user machine, fairness and protection need to be guaranteed.
Such a system has benefits, like:
  + the users are isolated from the mysterious underlying hardware
  + uniform services and facilities
  + the system can protect itself from users without making any assumption about what they do
  + more robust facilities (because the underlying structure is concealed)

Hence, there is a sharp boundary between the user application and the OS.
However, this boundary limits the openness and flexibility of the system.

For a personal computer, how can we design a better OS?
  --> We can trade fairness and protection for openness and performance, hence there is no sharp boundary between the user app and the OS.
  --> The system can serve the user apps better (with hints, and whatever)

There are lots of examples in Pilot that show this openness and the use of hints as a way to improve performance (the space interface, the maps, the databases).

However, there is one downside. Because of this openness, user apps can do erroneous things (directly access and modify some memory words), hence the system needs to provide some kind of robustness, i.e., recovery from crashes or resistance to misuse.
  + example: in Pilot, maps are just hints for fast lookup; the label is absolute
  + a scavenger is provided to ensure robustness :)

*Problem*
----------
How to design an OS for a personal computer?

*Solution*
----------
Emphasis not on sharing (= fair allocation of resources) or security, but on serving the PC user. Then how?
- Resource Allocation - efficiency rather than fairness
- Security - errors are more serious than maliciousness
- User convenience like PnP & automount is important ==> volume
- More aggressive hints from the user can be used ==> space
- More freedom for the user to use system resources ==> direct device access
- We can't expect hardware-level reliability for a PC. How about a more fault-tolerant file system?
  + each block is absolute and the inode becomes a hint (inode here means the mapping info; the actual info is stored in the *label* associated with each block)
  ==> RAID is uncommon, hence reliability at the OS level (thus the philosophy: live with failure rather than blame that failure --> software needs to be reliable)

*1. Introduction*
--------------
- Personal computer operating system
- Single-user system, supporting a single language: Mesa
- Close coupling to the Mesa programming language
  + Pilot is written in Mesa
  + Pilot is the powerful runtime support for Mesa
  + Pilot has Mesa's OO-like interfaces
- Mesa
  + Separation between interface and implementation -- early OO influence
  + Strong type checking
- Resource management: focus not on fairness but on effectiveness
- Defensive protection, not absolute
  + Claim: errors are more serious than maliciousness
  + Depends on Mesa type checking

*2. Pilot Interfaces*
------------------
- Two types of interfaces: public (for clients) & private (for internal use)
- Representative public interfaces: file, virtual memory, streams, network communication, etc.
- Each interface has named items
  + Four types of items: type, procedure, constant, error signal
  + Denoted as interface.item
  + For example: File.Capability, File.Create, File.maxPagePerFile, File.Unknown
  (remember, *type* is for high-level robustness: because apps/users have a sense of what it is, the system doesn't need to interpret/manage it)

# 2.1 Files
- File
  + Standard containers for data
  + Rather low-level file system, with the expectation of a higher-level file system on top of it
    > Flat-file structure (no directory hierarchy - ouch! will be provided by high-level software)
    > Few attributes: no creation time, owner, etc. (just a single user, who cares about the owner?)
  + pow(2,64) files, max file size is pow(2,32) bytes
  + File.Create( ) - creates a file and returns a capability
  + Capability: 64-bit unique id - unique across all Pilot machines
    > Works well with removable volumes (e.g. floppy disk)
  + Supports *only* memory-mapped I/O
    > GOOD? BAD?

  As an aside, the developers of Pilot wrote in retrospect:

  "Another basic question concerned the kind of access to the file system that Pilot would provide. One alternative was a simple read-write facility with which client programs transfer pages directly between virtual memory and files. The other alternative was a 'mapping' facility, whereby a portion of the file is made the backing store for a portion of virtual memory (for example, as in MULTICS). While this question did not inflame emotions the way the process question did, the proponents of each view felt that the two models were incompatible with each other, particularly those who were trying to convert programs from other systems based on the read-write model. Subsequently, the perception grew that perhaps the two approaches are duals of each other, and that a client program structured for one approach might have a natural counterpart of similar performance and complexity for the other approach. However, we never found a duality transformation to support this view.
  Finally, we realized that neither model excludes the other. In particular, code files and certain data files of limited size are better supported by the mapping model--the swapping characteristics are understood and address space management in virtual memory is more convenient than explicit reading and writing. Large data files with known, high performance access requirements, on the other hand, are better served by the read-write approach--these files are often larger than the virtual address space, and the complexity and overhead of buffer management is worthwhile to achieve the desired performance. Pilot now supports both approaches with consistent interfaces."

  + Few attributes: type, size, permanence, immutability
    ==> there is not even an owner attribute or file access rights. Why? Because for a single user we care about efficiency, not security
    > Size: adjustable in page-sized units (512 bytes)
    > Type: 16-bit tag, not interpreted within Pilot (again, remember this is for high-level robustness)
    > Permanence: temporary files can be reclaimed (e.g. scratch storage)
    > Immutability: permanently read-only file - never changed under any circumstances. Makes it possible to have multiple copies (= replicas) with the same file id
- Volume:
  + Represents a medium (e.g., magnetic disk) where files are stored
  + Very similar to a partition in DOS, but
    > a single volume spanning multiple physical media and
    > multiple volumes on a single physical medium
    are both possible
  + PnP & Automount?
    > Pilot knows about the coming and going of physical volumes and makes the corresponding logical volume accessible
    > Of course, when a logical volume spans multiple physical volumes, it becomes accessible only when all those physical volumes have arrived

# 2.2 Virtual Memory: Space Interface
- Underlying Virtual Memory System
  + linear virtual memory of up to pow(2,32) 16-bit words
  + *All computations run in the same address space*
    ==> again, here is one design choice that reflects the single user.
    There is no VM switch on a context switch (there is no local page table associated with each process)
  + each page has 3 flags:
    > referenced (a use bit for replacement, not a pin count)
    > written (dirty)
    > write-protected --> this provides limited protection (there is no read protection, hence anyone can read freely, and that is OK)
    (this means the page is the unit of protection, doesn't it?)
- Hierarchy of spaces: Pilot's Spaces on top of the underlying VM
  + Address space is partitioned into spaces
  + A space is like a segment in Unix
    > multiple pages per space
  + Space is a (public) interface
- Allocation of spaces: by Space.Create
  + The set of all spaces in Pilot forms a tree by containment
  + The root space corresponds to the whole virtual memory
  + A new space can only be created as a subspace of an existing space
- Mapping to file pages: by Space.Map -- remember that there is *only mmap* in Pilot
  + Map associates a space with a run of pages in a file
  + A page in VM may only be accessed if exactly one of the nested spaces containing it is mapped
  + The protection of the file is automatically propagated to the mapped space
- Swapping in and out: (again, a place relevant only to a single user)
  ==> *hinted* by Space.Activate, Space.Deactivate, and Space.Kill
  + The lowest (i.e., the smallest) space containing the missing page is swapped in or out
  + The programmer can use spaces to group pages whose access patterns are similar
  + Hints to the OS
    > Space.Activate - the space will be used soon, so please swap it in
    > Space.Deactivate - the space will not be used for a while
    > Space.Kill - the contents of the space are no longer of interest, so don't even bother swapping it out
- *Tight coupling between files and spaces*
  + Spaces are the only access path to the contents of files, and files are the only backing store for spaces
  + Good idea?
    Claim:
    > decouples buffer allocation from disk scheduling
    > easy and uniform protection: propagation of file protection to spaces
    > advice to the OS becomes possible: Activate, Deactivate, Kill
    > can simulate read/write on top of map (together with appropriate hints), but read/write can't do the reverse
    (good for small files, like code and data files of limited size? not good for very large files that do not fit in memory)
  + Space.ForceOut: like a blocking sync( ) in Unix

Retrospection of Pilot developers:

"The Pilot space is the unit of allocation, mapping, and swapping; spaces can be declared within other spaces, so that the set of all spaces forms a hierarchy according to the containment relation. This was a remarkably simple generalization, but it was *hard to implement and is not used by clients*. Clients have evolved a style in which almost all mapped spaces are subspaces of the primordial space (all of virtual memory) and only a few of these are further partitioned into subspaces for swapping control. The implementation requires such large data structures for each space that they have to be swappable, and only the very active items are cached in real memory. In the end, several caches were needed and a lot of resident code was written to manage them."

# 2.3 Streams and I/O Devices
- Three ways to access I/O devices: implicit, direct, indirect
- Direct
  + Pilot provides procedure calls, which are just (Mesa) wrapper functions around device-specific I/O operations
  + Reasonable, because Pilot is an OS for a PC! Hence no protection (you can do erroneous things, but not malicious ones)
- Implicit:
  + Direct access is not allowed for some devices, e.g., disk (why? chaos can happen if both the OS and the client access the disk)
  + For these devices, Pilot provides high-level abstractions for I/O: file, etc.
- Indirect: Stream
  + Direct access is very difficult for most devices
  + Pilot provides a Stream interface, which is pretty much like a Java I/O stream
  + A transducer converts an I/O device interface into a Stream interface
  + A filter converts a Stream interface into another Stream interface
  + A transducer and a series of filters form a pipeline, with the transducer attached to the I/O device

# 2.4 Communications
- Distinction between "tightly-coupled" processes and "loosely-coupled" processes
- Tight - should use shared memory to communicate (e.g., single-machine parallel program)
  ==> Mesa supports monitors and condition variables
- Loose - should use Pilot communication interfaces (e.g., print server)
- All machines in the "Pilot internet" are Pilot machines, even routers
- Protocol stack is pretty much like the TCP/IP stack
  + Layer 0 - link layer
  + Layer 1 - IP layer (connectionless)
  + Layer 2 - Transport layer (connection oriented)
- Interfaces
  + Socket - Layer 1 interface: supports datagram service
  + NetworkStream - Layer 2 interface: supports stream service
    > a transducer which converts a socket device into a Stream
    > NetworkStream.Create
    > NetworkStream.Listen - like listen in TCP
    > NetworkStream.Handle - like accept in TCP
    > NetworkStream.Delete - like close in TCP
      # actually no tear-down process
      # just deletes the allocated data structures
      # parties should agree when to terminate communication
      ==> again, the decision is very flexible for the client

# 2.5 Mesa Language Support
- Mesa language features: recursion, coroutines, concurrent processes, signals, etc.
- Procedure call: Pilot only handles the traps that occur when space for activation records is exhausted
- Coroutine: Pilot gets involved during initialization
- Concurrent processes: Pilot creates processes and handles their termination
- World-swap - actually two different machines exist at the same time in Pilot: one for normal execution and one for debugging
  + The two worlds differ in:
    > contents of memory
    > version of Pilot
    > accessible volumes
    > even microcode (code to run the hardware?)
  + Swap between the two worlds
    > save the context of the current world in a boot-file
    > resume the execution of the other world by reading its saved context

*NOTE*: you can do a world-swap because this is a single-user machine. It is not relevant on a time-sharing machine. A normal user does not care about kernel debugging!

*3. Implementation*
--------------------
- Pilot is composed of components, which in turn are composed of Mesa modules
- Hierarchical structure - they call it a "manager/kernel" structure, with the high-level manager on top of the kernel

# 3.1 Layering of the Storage System Implementation
- The storage system is comprised of the file system and VM
- Both systems maintain DBs which are too big to fit in memory. So the file system needs VM, and VM needs the file system and VM again. What a knot!
  ==> Solution?
  *Layering!*
- Kernel: swapper & filer
  + is small enough to fit in memory
  + provides basic but powerful enough functions for a full-fledged storage system
    > E.g.: when handling a page fault, the swapper calls the filer to read the on-disk page
  + handles the majority of applications' requests
- Manager: file manager & VM manager
  + implements more powerful functions using the functions provided by the swapper and filer (e.g.: file creation and deletion, traversing the hierarchy of subspaces)
- Interaction between kernel & manager:
  + Manager-to-kernel interaction is obvious: the manager uses the kernel's features
  + Kernel-to-manager interaction
    > The kernel works only with pages and/or special files resident in memory
      ==> This serves as a *cache of the managers*
    > When it can't work with the cache, it punts to the manager (no state info is retained). The manager fixes the problem, possibly using another function of the kernel, then the kernel retries the operation
    > Indirect recursion or circular dependency, huh? No, because the failed operation becomes the total responsibility of the manager.
      (i.e., on a failed kernel operation the manager is in charge and does not use that same kernel function again... hmmm, this may not be true; the solution is: make some pages never swapped out ...)

# 3.2 Cached Databases of the Virtual Memory Implementation
- Hierarchy (of the nested spaces)
  + is the DB that implements spaces
  + Maintained by the VM manager
  + Holds records per space: size, base page #, mapping info, etc.
  + Does not fit in memory
  + The swapper uses a resident "space cache" to track swapping info. What if info about a space is not in memory?
    --> hence the page fault --> interaction between kernel & manager as in 3.1
- Swap unit
  + the smallest set of pages transferred between primary memory and disk
  + corresponds to a "leaf" space
- Swap unit cache:
  + Maintained by the swapper
  + contains information about swap units: first page, length, state (mapped or not, swapped in or out, replacement algorithm, etc.)
  + is addressed/indexed by page
    > on a page fault for a page, swap in the swap unit containing the page
  + Logically a cache of the Projection
- Projection
  + Another swappable DB of all swap units
  + Maintained by the VM manager
  + used to update the swap unit cache
  + speeds up page faults which cannot be handled by the swapper
    (so there are 2 levels of storing the information about a swap unit entry: one at the swapper level in the swap unit cache, the other at the VM manager level in the Projection)
  + slows down space creation/deletion, since the Projection must be updated

Problem of "recursive" cache faults: when a manager has a cache miss, it often incurs a page fault of its own. The handling of that page fault must not incur a second cache fault, otherwise the fault episode will never end.
  ==> Solution? Make certain key records in the cache ineligible for replacement

Pilot developers in retrospect:
- the space hierarchy is simple and general, and gives clients a lot of freedom,
- but the implementation is very complex: it needs to keep the DB about spaces in sync between disk and memory ...
- and clients rarely use it; they tend to have one level (like normal use), hence this is costly ...

# 3.3 Process Implementation
- Implementation is split roughly equally among Mesa, the underlying machine, and Pilot
- Basic monitors and condition variables are implemented by Mesa and the machine
- Pilot's concurrency support is implemented using those monitors and condition variables

# 3.4 File System Robustness
- Each file page has a label -- a separate record that gives info about the page
  + Which file it belongs to, etc.
- The scavenger periodically scans entire volumes and constructs the maps
- A map is just a redundant fast-lookup database for a volume
  + Volume file map:
    > B-tree keyed on , which returns the device address of the page
  + Volume allocation map: stores the allocation status of pages
- Robustness
  + Pilot works with the maps
  + I/O devices compare the map records passed by Pilot against the page label
  + A mismatch between map record and page label indicates the need to scavenge
  + Any page damage just means the loss of that page. The remaining pages of the file are still OK! (great! like what Vijay is working on)
  + Luxurious options like RAID are not common for PCs. Hence reliability at the OS level. Great idea! (don't trust the hardware)
- High-level robustness
  + The scavenger can call back client software for application-specific checking, for example a DB integrity check
  + uses the Type attribute to determine which files should be processed by client-level scavengers
- Volumes of an old Pilot can be easily migrated to a newer Pilot
  + The new scavenger generates a new version of the map from the old volume structure
  + The new Pilot uses the newly generated map!

# 3.5 Communication Implementation
- Routers - software switches

*Evaluations*
-------------
- Understand the difference between a time-sharing system and a PC
- Debugger is neat - any error pops you into it auto-magically
- Powerful VM
- Can give advice to the OS
- Security - connected to a network, but the claim is that security is not a big deal. True?
- Mesa everywhere, Pilot everywhere
- Design is quite different for a PC versus a minicomputer - what does this say about Linux?
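
*Appendix: sketch of "label is absolute, map is a hint" (3.4)*
-------------
To make the robustness idea of 3.4 concrete, here is a minimal Python sketch, not Pilot's actual code. Every name in it (Page, Volume, file_map, scavenge, read) is hypothetical. It also assumes the map is keyed by a (file id, page number) pair, which these notes do not state. The point it illustrates: a read trusts the map only if the page label agrees, and a mismatch triggers a rebuild of the map from the labels alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical map key: (file_id, page_number). An assumption of this sketch.
Key = Tuple[int, int]


@dataclass
class Page:
    label: Optional[Key]  # the absolute truth about the page; None = free/damaged
    data: bytes = b""


class Volume:
    def __init__(self, pages: List[Page]) -> None:
        self.pages = pages
        self.file_map: Dict[Key, int] = {}  # hint only: key -> page address
        self.scavenges = 0
        self.scavenge()

    def scavenge(self) -> None:
        """Rebuild the map by scanning every label; the old map is ignored."""
        self.scavenges += 1
        self.file_map = {
            page.label: addr
            for addr, page in enumerate(self.pages)
            if page.label is not None
        }

    def read(self, file_id: int, page_no: int) -> bytes:
        key = (file_id, page_no)
        addr = self.file_map.get(key)
        hint_ok = (
            addr is not None
            and 0 <= addr < len(self.pages)
            and self.pages[addr].label == key  # compare hint against label
        )
        if not hint_ok:
            self.scavenge()  # hint missing or wrong: rebuild from labels
            addr = self.file_map.get(key)
            if addr is None:
                raise KeyError(key)  # this page really is lost
        return self.pages[addr].data


vol = Volume([Page((7, 0), b"a"), Page((7, 1), b"b"), Page(None)])
vol.file_map[(7, 1)] = 0      # corrupt the hint
recovered = vol.read(7, 1)    # label mismatch forces a scavenge, then succeeds
```

Note that damaging one page (clearing its label) only makes that page unreadable; the other pages of the same file are still found after the scavenge, which is the "loss of one page only" property claimed above.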