CS 736 Reviews - Spring 2016: The multikernel: a new OS architecture for scalable multicore systems


The multikernel: a new OS architecture for scalable multicore systems

Andrew Baumann, et al. The multikernel: a new OS architecture for scalable multicore systems. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pages 29-44.

Reviews due Thursday, 1/28 at 9 am.

Posted by Michael Swift on January 27, 2016 08:47 PM

Comments

Summary:

The paper proposes a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message passing. Building an OS with message passing rather than sequentially manipulating shared data structures offers tangible benefits: the ability to pipeline and batch messages encoding remote operations allows a single core to achieve greater throughput, reduces interconnect utilization, and naturally accommodates heterogeneous hardware.

Problem

As computer hardware diversifies, it is becoming increasingly difficult to tune a general-purpose OS design for a particular hardware model: the deployed hardware varies widely, and optimizations become obsolete after a few years when new hardware arrives. These optimizations involve trade-offs specific to the cache hierarchy, the memory consistency model, and the relative costs of local and remote cache access, and hence are not portable across hardware types.

Rethinking the OS structure as a distributed system of functional units that communicate via explicit messages, and viewing state as replicated instead of shared, will make the OS structure hardware neutral, scalable, and even portable to heterogeneous cores.

Even though current hardware cache-coherence protocols between CPUs ensure that the OS can continue to assume a single shared memory, networking problems (routing and congestion) are issues on large-scale multiprocessors and intra-machine interconnects. System software should therefore be able to adapt to the inter-core topology, which will differ between machines and become substantially more important for performance and message passing. Barrelfish targets this problem by adopting a system knowledge base, which the OS uses to understand the interconnect topology and decide how to communicate over it.

Contributions

Barrelfish makes the OS structure scalable and hardware neutral by implementing a message-passing scheme. The paper argues that message passing costs less than shared memory: with shared memory, the cost grows linearly with the number of threads and the number of modified cache lines. With message passing, the client threads issue a lightweight remote procedure call to a single server that performs updates on their behalf, hence the cost does not grow linearly. The inherent lack of scalability of shared memory, combined with the rate of hardware innovation, creates intractable software engineering problems for OS kernels. Barrelfish addresses this with the message-passing scheme.
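To make the client/server pattern described above concrete, here is a minimal sketch in Python. It is not Barrelfish code: threads stand in for cores, `queue.Queue` stands in for a message channel, and the names `server_loop`, `client`, and `run` are made up for illustration. The only point it shows is that a single server owns the state and clients ask it to perform updates on their behalf instead of touching the state themselves.

```python
import queue
import threading

STOP = object()  # sentinel telling the server to shut down


def server_loop(requests, state):
    # Only this thread ever reads or writes `state`, so no lock is needed.
    while True:
        msg = requests.get()
        if msg is STOP:
            break
        delta, reply = msg
        state["counter"] += delta
        reply.put(state["counter"])  # acknowledgement carries the new value


def client(requests, n_updates, results):
    reply = queue.Queue()  # private reply channel for this client
    last = None
    for _ in range(n_updates):
        requests.put((1, reply))  # lightweight "RPC": ask the server to add 1
        last = reply.get()        # wait for the acknowledgement
    results.append(last)


def run(num_clients=4, n_updates=100):
    requests = queue.Queue()
    state = {"counter": 0}
    srv = threading.Thread(target=server_loop, args=(requests, state))
    srv.start()
    results = []
    clients = [
        threading.Thread(target=client, args=(requests, n_updates, results))
        for _ in range(num_clients)
    ]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    requests.put(STOP)
    srv.join()
    return state["counter"]
```

In this sketch each client blocks on its reply; a split-phase client could instead post several requests before collecting any replies, which is the pipelining benefit the paper describes.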

Barrelfish divides the OS kernel into CPU drivers and monitors. CPU drivers are local to a core, and all inter-core communication is handled by the monitors. The CPU driver is still hardware specific; only the monitor, which handles inter-core communication, is processor agnostic. The monitors maintain replicated data structures, which are kept globally consistent by means of an agreement protocol. Barrelfish uses a variant of URPC for inter-core communication. For memory management it uses system calls that manipulate capabilities, which are user-level references to kernel objects or regions of memory. This removes dynamic memory allocation from the CPU drivers. The decentralization of memory management is meant to enhance scalability; however, I see no real benefit, as all cores must still keep their capability lists consistent.

I feel the System Knowledge Base is the biggest contribution of this paper. Maintaining knowledge of the underlying hardware and using it for topology-aware allocation of device drivers to cores and for NUMA-aware memory allocation is essential for present and future systems because of their increasing complexity and dynamic nature.

Evaluations

They take the most latency-critical case, a TLB shootdown (critical because it is part of the CPU pipeline and hence directly impacts processor performance), analyze it for Barrelfish, and compare it with the IPI mechanism used by Linux and Windows.

Barrelfish uses messages: the local monitor broadcasts invalidate messages and waits for replies. We would expect this to be slower than IPIs; however, it is not, because the SKB lets Barrelfish use a highly efficient communication topology. In my opinion, though, this comparison is not fully justified, since performance does not depend on message-passing time alone: the receiver should also accept the message immediately, as in the case of IPIs. Here, even though the message travels fast, the monitor might handle it only when "convenient".

The paper compares different message-passing protocols, namely broadcast (requests every other core), unicast (individual requests), multicast (forwards to the first core of each processor), and NUMA-aware multicast (the SKB is used to send requests to the highest-latency destinations first). We see that this communication mechanism outperforms Linux 2.6.26 and Windows Server 2008 R2 Beta. The message-passing mechanism is surely promising, as it provides a lot of flexibility (to optimize and use different communication mechanisms) and scalability with changing hardware designs. It will also be of great value for heterogeneous systems, as communication has now become independent of the other OS concerns due to the division of work between CPU drivers and monitors.
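As a rough illustration of how the unicast, multicast, and NUMA-aware multicast protocols differ, the sketch below builds the list of (sender, receiver) messages each one would issue. The topology format (`sockets` mapping a socket id to its cores, `latency` mapping a socket id to its distance from the initiator) and all function names are assumptions made for this example, not the paper's data structures or the SKB's actual interface.

```python
def unicast_plan(initiator, sockets):
    # The initiator sends one message to every other core itself.
    return [
        (initiator, core)
        for cores in sockets.values()
        for core in cores
        if core != initiator
    ]


def multicast_plan(initiator, sockets, order=None):
    # One message per remote socket goes to that socket's first core,
    # which then forwards it to its local neighbours.
    plan = []
    for sock in (order or sorted(sockets)):
        cores = [c for c in sockets[sock] if c != initiator]
        if not cores:
            continue
        if initiator in sockets[sock]:
            # Same socket as the initiator: reach neighbours directly.
            plan += [(initiator, c) for c in cores]
        else:
            leader, rest = cores[0], cores[1:]
            plan.append((initiator, leader))
            plan += [(leader, c) for c in rest]
    return plan


def numa_aware_plan(initiator, sockets, latency):
    # Same tree as multicast, but the farthest sockets are contacted first
    # so the slowest messages are in flight for the longest time.
    order = sorted(sockets, key=lambda s: latency[s], reverse=True)
    return multicast_plan(initiator, sockets, order)


# Hypothetical 3-socket, 6-core machine; core 0 initiates the shootdown.
sockets = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
latency = {0: 0, 1: 10, 2: 20}
```

On this made-up machine the initiator sends five messages under unicast but only three under multicast, and the NUMA-aware variant contacts socket 2 (the farthest) before the others.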

Confusions

What is the most widely used structure for multicore systems? In a multikernel approach with independent OS instances (like the CPU drivers here), if a new process wants to start executing, which OS instance schedules it? Is it a master-slave system? Is it controlled by the monitor?

The paper implements message passing using an area of shared memory as a channel. How would it happen in a real scenario?
Does this invalidate their evaluation?

What are the drawbacks of this approach?

Why has the process structure been changed?

The "agreement protocol" used by the monitors to keep state globally consistent will, in my opinion, be fairly complex to implement (compared to shared memory). This consistent state is very important for multikernels. Opinions on this?

Posted by: Vishakha Dhelia | January 28, 2016 08:57 AM

1. Summary
Computers are scaling up in terms of number of processors and cores, and are diversifying in terms of types of cores, memory hierarchy, ISA, interconnect topology, etc. Therefore, it is becoming tougher to optimize a general monolithic OS to run efficiently on such large-scale, diverse systems. The paper proposes embracing ideas from distributed systems when building an OS and presents the multikernel, which treats the system as a set of independent cores. A lightweight OS kernel runs on each core, optimized for that core's underlying hardware, and communicates with the rest of the multikernel, like a distributed system, over a network using message passing.

2. Problem
Modern-day systems are becoming increasingly diverse - different systems use very different memory hierarchies, interconnection networks, numbers of cores, and types of cores, and may even employ a diversity of cores/compute-units within a single system. General-purpose OS designs can only be designed to suit the common hardware case, so they become less efficient atop diverse systems. There is a need for the operating system to suit hardware diversity while remaining general purpose at higher levels.

Moreover, message-passing mechanisms are gaining more traction in multi-core systems as well - scalability bottlenecks in shared-memory mechanisms (coherence) point to a possible future in which shared memory is kept minimal, cores are non-coherent, and all communication is restricted to message passing alone.

The authors attempt to solve these issues by presenting an OS architecture for heterogeneous multi-core systems called the multikernel model. The OS is structured as a distributed system, optimized for and running on each core, communicating via messages and sharing no memory.

3. Contributions
The authors present the multikernel OS model, wherein inter-core communication is explicit, the OS structure is hardware neutral, and state is replicated across cores rather than shared.

Inter-core communication is performed via explicit message passing. Explicit communication allows the OS communication system to be optimized for the network topology of the system it is implemented on. The communication is decoupled from the scheduling of threads on cores (which is optimized to be hardware specific) and is optimized with only the network topology in mind. Advantages include modularity, isolation, split-phase execution, etc.

Making the OS structure hardware neutral is achieved by decoupling the distributed communication mechanism of the OS from its kernel implementation on each core. A system with diverse cores will have OS kernels, optimized on a per core basis, running on the core, while the communication within the distributed OS as a whole is oblivious to the hardware implementation. This provides two fold benefit - maximum optimization towards each core and maximum optimization towards the interconnect network.

The paper also introduces a new structure for a process in the multikernel - it is composed of dispatcher objects (one on each core on which the process might execute), each scheduled by the CPU driver on its core, and each dispatcher schedules the process’ threads.

The model is implemented as Barrelfish, a prototype OS that runs on AMD and Intel systems.

4. Evaluation
Barrelfish is evaluated on AMD and Intel systems to test baseline performance, scalability, and adaptability to different hardware. The authors perform a case study on TLB shootdown, comparing the latency of shootdown across multiple message-passing protocols as well as with Windows and Linux. With increasing numbers of cores, Barrelfish performs better due to better-tuned message passing. Compute-bound workloads (with large shared address spaces) show similar performance on Barrelfish and Linux, while IO-bound workloads perform better on Barrelfish.

5. Confusion
The paper discusses changing the process abstraction by introducing dispatcher objects, but it is unclear why this is key to their implementation. Is it only related to ease of communication between threads of the same address space on different cores?

The authors show that compute workloads perform the same on Barrelfish as on Linux - but was there any chance of the performance being different? Applications still share memory through coherence in both cases, and the influence of the OS on these benchmarks is small, since compute-oriented benchmarks make few system calls and the OS's role is mostly restricted to naive scheduling.

When inter-core communication is required, the thread on one core sends marshaled data across the network. For this to happen, is it necessary for the monitor process to be switched in? If so, inter-core communication would also involve the same context switching as inter-process (intra-core) communication would (which switches to the CPU driver and then to the next process), so why is inter-core cheaper?

Posted by: Gokul Subramanian Ravi | January 28, 2016 08:39 AM

1. Summary
The paper proposes to rethink the structure of OS as a distributed system for future multicore systems to scale with the increasing number of cores while dealing with the heterogeneity among the cores. It introduces a new structure for the OS - the multikernel model which views a machine as multiple independent cores communicating via message passing and assumes no inter-core sharing.

2. Problem
Systems are getting increasingly diverse in terms of the underlying hardware, the diversity among the cores in a single machine (instruction set architecture, performance characteristics, etc.), and the topology of the interconnects between these cores. Optimizing the OS data structures for the common hardware case may not be optimal, as it might fail to exploit the features or optimization opportunities provided by individual hardware. A monolithic kernel that assumes a single shared memory, uses locks to synchronize access to shared data structures, and relies on cache-coherence protocols between CPUs does not scale well with the increasing number of cores.

3. Contributions
The main contribution of the paper is the idea of the multikernel model, which rethinks an OS for the heterogeneous network of cores within a system as a distributed system of cores that assume no shared memory and communicate via message passing. Three key principles guide the design of a multikernel. i) Using explicit message passing for all inter-core communication instead of shared memory naturally supports heterogeneous cores, decouples request and response (so that the core issuing the request is not blocked and can do useful work), and is naturally modular. ii) Separating the structure of the OS from the hardware, with only the message transport mechanisms, CPU drivers, and device drivers being hardware dependent, makes it easier to adapt the OS to the changing performance characteristics of hardware and enables late binding of protocol implementation and message transport. iii) Replicating data structures such as PCBs on each core is more scalable because the load on system interconnects is reduced; it also naturally supports domains that do not have shared memory and supports changes to the running set of cores.
The second main contribution is the application of the multikernel model in an OS, Barrelfish, that runs on x86-64 processors. The OS instance on each core has a privileged-mode CPU driver that serially handles traps from user processes and interrupts from devices. All inter-core coordination is performed by monitors, which keep replicated data structures globally consistent, and inter-core communication uses a variant of user-level RPC.

4. Evaluation
The paper motivates the message-passing architecture with a microbenchmark comparing shared memory and message passing for scalable updates of shared state by multiple cores. The authors show how the multikernel scales with the number of cores compared to other OSes for the latency-sensitive unmap operation. The system is compared with Linux for compute-intensive as well as I/O-intensive workloads. These results show that Barrelfish scales well on homogeneous cores, on par with or better than existing monolithic OSes, but the paper does not evaluate how well the system performs with heterogeneous cores.

5. Confusion
How will abstractions like file systems be implemented on such an architecture of multi-kernel? I couldn’t quite understand the notion of scheduler activations mentioned in a number of places.

Posted by: Aishwarya Ganesan | January 28, 2016 08:15 AM

Summary
-------------
The authors in this paper propose a “multikernel” structure for an operating system. In this structure the system views the machine as a network of independent cores. Each core acts as a separate processing unit with its own replicated state, and the cores communicate via explicit message passing. This is analogous to a distributed system and makes the OS adaptable and scalable to evolving heterogeneous hardware. The authors also evaluate their system, Barrelfish, which is implemented along these lines, across workloads and obtain better or comparable performance to conventional OSes, with wider support for hardware heterogeneity, increased modularity, and the ability to reuse communication algorithms from distributed systems.

Problem
------------
-Hardware heterogeneity, in terms of instruction set architecture and of cores within a machine or between systems, was changing faster than the overlying software.
-Motivated by this scalability problem, the authors set out to rethink the structural organization of an operating system, with the goal of getting away from the point-to-point solutions for specific hardware that are prevalent in many operating systems and that had to change whenever new hardware emerged.
-As a result, the idea of a “multikernel” system was proposed, in which the OS is hardware neutral, communication is via message passing instead of conventional shared data, and the OS comprises a distributed system of functional units.

Contributions
-------------------
The multikernel model contributes mainly in the following three aspects:
[a] Explicit inter-core communication
-No memory is shared between the code running on each core; cores communicate via message channels.
-This allows the OS to leverage well-known network optimizations such as pipelining and batching.
-It also enables the OS to isolate and manage resources on heterogeneous cores, schedule jobs with the inter-core topology in mind, and use split-phase communication.
-Components are modular, which makes them easy to evolve and refine and robust to faults.

[b] OS structure is made hardware neutral
-The OS has only two hardware-dependent aspects, namely the message transport mechanisms (handled in monitors running in user space) and the interface to hardware (contained in CPU drivers residing in kernel space).
-This isolates the distributed communication algorithms from hardware implementation details.
-Late binding of protocol implementation and message transports is possible.

[c] State is viewed as replicated
-Each functional unit in a multikernel has a replica of the shared data.
-The data is accessed and updated as if it were a local replica.
-Consistency is ensured by exchanging messages.
-This helps preserve OS structure and algorithms even when the underlying hardware evolves, and can also handle hot-plugging processors or shutting down hardware.
-Spinlocks could aid in further optimization.
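The replication idea in [c] can be sketched as follows. This is a toy single-threaded simulation, not how Barrelfish monitors are written: the `Replica` class, its `inbox`, and `make_replicas` are hypothetical names, and the sketch deliberately ignores ordering of conflicting updates, which a real agreement protocol would have to handle.

```python
from collections import deque


class Replica:
    """Per-core copy of some OS state, kept in sync only by messages."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.table = {}       # local replica of the shared state
        self.inbox = deque()  # update messages received from other cores
        self.peers = []

    def update(self, key, value):
        # Apply locally, then tell every other replica about the change.
        self.table[key] = value
        for peer in self.peers:
            peer.inbox.append((key, value))

    def poll(self):
        # Drain pending messages and apply them to the local replica.
        applied = 0
        while self.inbox:
            key, value = self.inbox.popleft()
            self.table[key] = value
            applied += 1
        return applied


def make_replicas(n):
    replicas = [Replica(i) for i in range(n)]
    for r in replicas:
        r.peers = [p for p in replicas if p is not r]
    return replicas
```

Reads always hit the local `table`, so they never touch another core's memory; the replicas only converge once each core has polled its inbox.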

Barrelfish implements the above contributions by modularizing the OS into
-CPU drivers that run in kernel space and are local to a core. They are invoked via system calls, are specialized for the x86-64 architecture, enforce protection, perform authorization, time-slice processes, and mediate access to the core and its associated hardware.
-Monitors that run in user space, are single-core and schedulable, and collectively coordinate system-wide state. They maintain global consistency of replicated data structures such as memory allocation tables and address space mappings via agreement protocols. They are also responsible for interprocess communication setup, waking up blocked processes, and idling a core when no process is running on it.
-A collection of dispatcher objects per process, one on each core on which the process might execute; communication is between dispatchers, not cores.
-System Knowledge Base (SKB), which maintains knowledge of the underlying hardware: latency, bandwidth, interconnect topology, etc.

Evaluation
--------------
The authors evaluate Barrelfish, their implementation of the proposed “multikernel” approach, for performance and scalability using various compute and IO workloads.
[a] TLB shootdown is used to measure communication and coordination in the multiprocessor OS. Barrelfish's message-passing protocol outperforms the IPI-based mechanisms of Windows and Linux. In practice, however, Barrelfish is slower, because, being the first system of its kind, it carries large overheads that have not yet been optimized away.
[b] Barrelfish's message passing also consumes fewer cycles than Windows and Linux when changing memory ownership, which uses a two-phase commit.
[c] It also achieves better throughput and fewer cache misses, and avoids kernel crossings, during IP loopback compared to Linux.
[d] Compute-bound workloads (OpenMP and SPLASH-2 benchmarks) do not scale very well on either Barrelfish or Linux; however, Barrelfish here demonstrates its ability to support large, shared-address-space parallel code with little performance penalty, owing to the user-space threads in its design.
[e] A realistic I/O scenario of serving static and dynamic web content from a relational database is evaluated on Barrelfish. It performs better than Linux because the web server components run as separate processes, communication is via URPC, execution is entirely in user space, and kernel crossings are avoided.
These results, coupled with the ease with which a web server, network stack, drivers, and libraries could be ported to Barrelfish and the scope for further optimization, suggest that Barrelfish is a feasible alternative design to existing monolithic systems.
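For readers unfamiliar with the two-phase commit mentioned in [b], here is a generic textbook-style sketch of the protocol applied to a change of memory ownership. It is only an illustration under assumed names (`CoreMonitor`, `two_phase_commit`, and the `(region, new_owner)` proposal format are invented here) and does not reflect Barrelfish's capability code.

```python
class CoreMonitor:
    """Stand-in for one core's monitor holding a local ownership replica."""

    def __init__(self, core_id, busy_regions=()):
        self.core_id = core_id
        self.busy = set(busy_regions)  # regions this core cannot give up
        self.pending = set()           # regions with a prepared change
        self.owner = {}                # local replica: region -> owning core

    def prepare(self, proposal):
        region, _new_owner = proposal
        if region in self.busy:
            return False               # vote "no"
        self.pending.add(region)
        return True                    # vote "yes"

    def commit(self, proposal):
        region, new_owner = proposal
        self.pending.discard(region)
        self.owner[region] = new_owner

    def abort(self, proposal):
        region, _new_owner = proposal
        self.pending.discard(region)


def two_phase_commit(participants, proposal):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare(proposal) for p in participants]
    decision = all(votes)
    # Phase 2: everyone applies the same outcome.
    for p in participants:
        if decision:
            p.commit(proposal)
        else:
            p.abort(proposal)
    return decision
```

The change takes effect on every replica or on none, which is why the protocol needs two rounds of messages instead of the single round used for TLB shootdown.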

Confusion
--------------

1. Is the monitor responsible for load balancing or scheduling the processes across cores? How is this achieved?
2. Scheduling and granularity of dispatcher, dispatcher objects and their mapping with monitor and CPU drivers is not quite clear.

Posted by: Shruthi Racha | January 28, 2016 08:12 AM

Summary
To better support the increasingly heterogeneous nature of multi-processor systems, the paper proposes a new operating system architecture in which the machine is treated as a distributed system composed of multiple single-core machines networked together. Each core hosts a single OS instance consisting of hardware-agnostic message passing code layered on top of an architecture-specific "CPU driver."

Problem
The performance characteristics of individual cores in modern multiprocessor systems are increasingly non-uniform; contemporary multi-socket systems have replaced shared buses with networked hardware interconnects that possess varying topologies, and the internal layout of future multicore CPUs will similarly become more network-like. Contemporary OS architectures, however, still attempt to present the computer as a uniform set of cores on top of a shared memory. This simplistic view of hardware yields sub-optimal performance in certain use cases. For example, frequent use of a kernel data structure may lead to heavy contention for certain cache lines, inducing pipeline stalls in multiple cores as they each attempt to update the structure. Moreover, the design concerns needed to support new forms of multiprocessing cut across conventional kernels, leading to difficult code changes, as evidenced by the challenge of removing coarse-grained kernel locks in both Windows and Linux.
Contributions
The authors present the "Multikernel" architecture, in which a single instance of the OS runs on each core, and these instances communicate via a message-passing protocol. Rather than treating the kernel's state as being shared across cores, as in a conventional kernel, each core instead maintains a local replica of global OS structures, which it keeps up to date by communicating with the other OS instances. By making this message passing explicit, the authors improve performance by reducing the load on processor interconnects, and by allowing each processor to asynchronously update its local state, instead of stalling on cache misses. Moreover, by decoupling the OS's memory coherence model from that of the hardware, developers can tune their coherence protocol to fit the workload, and can uniformly support novel processors such as general purpose GPUs.
The authors present "Barrelfish," a specific implementation of the architecture. In Barrelfish, each OS instance has hardware-specific driver code, which runs with kernel privileges and functions as an exokernel, multiplexing CPU time and handling the hardware details of message passing. On top of this driver, the authors construct a user-mode monitor which handles message passing and maintaining the local replica of OS state.
Evaluation
The authors examine several TLB shootdown protocols, and show that when the protocol is correctly tuned for the hardware Barrelfish is running on, the protocol scales well as the number of processors increases. Using the tuned protocol, they then compare Barrelfish's performance in unmapping memory against Windows and Linux, and show that Barrelfish's performance scales markedly better as the number of cores increases.
They evaluate Barrelfish's performance on compute-intensive workloads against Linux, and show that performance is largely similar. They also compare the performance of Barrelfish and Linux for a web server and relational database, and Barrelfish outperforms Linux, since the majority of Barrelfish's functionality executes in userspace, reducing the number of kernel traps.
While these results compare a tuned version of Barrelfish against standard versions of Windows and Linux, making nuanced and certain conclusions difficult, they indicate Barrelfish has the potential to perform comparably well to conventional OSes in standard workloads, while performing better as core counts increase.
Confusion
I'm still unsure how Barrelfish's process model works, particularly with respect to the upcall interface and the time slicing performed by the CPU driver. Given that user libraries are supposed to provide more sophisticated scheduling policies, it sounds like something akin to cooperative multi-tasking, but that's not clear to me.

Posted by: Michael Vaughn | January 28, 2016 07:46 AM

1. Summary
The paper details the multikernel model that structures the OS as a distributed system of cores that communicate using messages and share no memory.
2. Problem
The multikernel operating system was developed to address the rapidly growing number of cores, which creates a scalability challenge, and the increasing diversity of computer hardware, since a single general-purpose OS poses serious challenges in design optimisation and implementation.
3. Contribution
The paper details the key features of the multikernel model. Inter-core communication is performed using explicit messages and no memory is shared between the code running on each core. This allows for efficient use of the interconnect and enables the OS to provide isolation and resource management on heterogeneous cores.
The OS structure is made hardware neutral enabling scalability in diverse systems and late binding of both protocol implementation and message transport. The required OS state is replicated and consistency is maintained by exchanging messages.
An implementation of the multikernel model, Barrelfish, is also described. The OS instance is factored into a privileged-mode CPU driver and a distinguished user-mode monitor process. A process is represented by a collection of dispatcher objects, one on each core on which it executes. Inter-core communication takes place through URPC, where a shared memory region is used to transfer cache-line-sized messages. For memory management, the capability system model is used. The System Knowledge Base maintains knowledge of the underlying hardware in first-order logic, allowing for query optimisation.
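The URPC transport described above (a shared memory region carrying cache-line-sized messages) can be pictured with the following single-process sketch. The 64-byte slot size, the use of the last byte of each slot as a ready flag, and the `UrpcChannel` class are all assumptions chosen for illustration; real URPC relies on the hardware's cache-coherence and memory-ordering guarantees, which plain Python cannot model.

```python
CACHE_LINE = 64            # assumed cache-line size in bytes
PAYLOAD = CACHE_LINE - 1   # last byte of each slot is the "ready" flag


class UrpcChannel:
    """One-directional ring of cache-line-sized message slots."""

    def __init__(self, slots=8):
        self.slots = slots
        self.buf = bytearray(slots * CACHE_LINE)  # stands in for shared memory
        self.send_pos = 0   # private to the sender
        self.recv_pos = 0   # private to the receiver

    def try_send(self, payload):
        if len(payload) > PAYLOAD:
            raise ValueError("payload must fit in one cache line")
        base = (self.send_pos % self.slots) * CACHE_LINE
        if self.buf[base + PAYLOAD] != 0:
            return False                 # slot not consumed yet: channel full
        self.buf[base:base + PAYLOAD] = payload.ljust(PAYLOAD, b"\0")
        self.buf[base + PAYLOAD] = 1     # flag is written last
        self.send_pos += 1
        return True

    def try_recv(self):
        base = (self.recv_pos % self.slots) * CACHE_LINE
        if self.buf[base + PAYLOAD] == 0:
            return None                  # nothing ready; receiver keeps polling
        data = bytes(self.buf[base:base + PAYLOAD])
        self.buf[base + PAYLOAD] = 0     # hand the slot back to the sender
        self.recv_pos += 1
        return data
```

Because the flag lives in the same line as the payload, a receiver polling the flag picks up the whole message with one line transfer in this model; `try_recv` returns the zero-padded payload, so callers would strip the padding or use a fixed message layout.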
4. Evaluation
Several evaluation settings were used to test whether the goals of the multikernel have been met.
The TLB shootdown is one of the simplest latency-critical operations that require global coordination. The message-based unmap protocols are Broadcast, Unicast, Multicast and NUMA-Aware Multicast. These message-based unmap operations in Barrelfish outperform the equivalent IPI-based mechanisms.
Though the two-phase commit protocol serialises more messages than TLB shootdown, Barrelfish achieves good scaling and performance using the multicast technique. IP loopback is used as a stress test for messaging, buffering and networking. Barrelfish achieves higher throughput, fewer cache misses and lower interconnect utilisation because sending packets as URPC messages avoids shared-memory buffers.
Workloads exercising shared memory, threads and scheduling were tested and demonstrated that Barrelfish can support large, shared-address-space applications with little performance penalty. The network throughput achieved was comparable to the Linux machines.
Performance evaluations demonstrate that Barrelfish performs reasonably on contemporary hardware and scales well on homogeneous cores.
5. Confusion
1. Dispatcher Model

Posted by: Nivetha Singara Vadivelu | January 28, 2016 07:35 AM

1. Summary
This paper discusses a new operating system architecture, the "Multikernel", for heterogeneous multi-core systems. It treats the system as a network of cores and uses message passing for inter-core communication, making the OS structure hardware neutral. A prototype of the design, Barrelfish, was implemented and its performance evaluated. The objective is to scale the OS across rapidly evolving hardware models, as it is not possible to tune a general-purpose OS design for every model and perform the relevant optimizations.

2. Problem
With increasing system interconnect diversity and core heterogeneity, it is no longer acceptable to tune traditional monolithic kernels for each hardware configuration for efficiency. Moreover, as the number of cores increases, cache coherency techniques become challenging, and updating shared memory requires kernel support and has scalability issues. The paper argues that the shared memory model is one of the most limiting factors in achieving scalability on future hardware.

3. Contributions
The authors propose viewing the system as a distributed system of cores that communicate through messages and do not share memory. With this model, knowledge from distributed systems can be applied to solve hardware challenges. The key contributions of the multikernel model are as follows. Inter-core communication is expressed as messages instead of a single shared memory, which gives the advantage of processing messages asynchronously. Separating the operating system structure from the hardware abstracts over heterogeneous hardware. Additionally, kernel state is maintained per core, and consistency is maintained through messages.
Barrelfish, a prototype, was implemented on the proposed design and evaluated against current operating systems, achieving comparable performance. The abstractions introduced in Barrelfish are CPU drivers, which are lightweight, core-local abstractions of the machine's cores, and monitors, which are hardware-agnostic user-mode processes for inter-core coordination.

4. Evaluations
To evaluate Barrelfish, a number of experiments with different message-passing protocols for TLB consistency, such as broadcast and multicast, were conducted to show the scalability achieved with an increasing number of cores. Different compute- and IO-intensive workloads are run, showing that Barrelfish performs as well as Linux. The authors also evaluated the latency of memory unmap operations, which remains constant with increasing cores. These evaluations show that Barrelfish performs reasonably well on current hardware and has far-reaching benefits for future hardware.

5. Confusion
The role of the dispatcher in inter-process communication in the multikernel architecture is not clear.

Posted by: Ankur Srivastava | January 28, 2016 06:55 AM

1. Summary
The Multikernel idea was motivated by the fast-evolving diversity in hardware and the need for an operating system that is neutral to the hardware architecture, so that it can scale to multi-core heterogeneous hardware. The operating system is approached as a distributed system of processes, wherein the machine is treated as a network of independent cores that use message passing as the IPC protocol instead of shared memory.

2. Problem
To make efficient use of modern hardware, the traditional OS needed to keep adapting to hardware changes through increasingly complex optimizations involving the cache hierarchy, the memory consistency model, and the relative costs of local and remote cache access. Systems were getting more diverse, and the basic structure of a shared-memory kernel had correctness and performance issues.

3. Contributions
The authors put forward three main design principles: make all inter-core communication explicit, with no memory shared between cores except for the message-passing channels; make the OS structure hardware-neutral, by introducing inter-core message passing and a CPU driver and monitor on every core; and view state as replicated instead of shared, reducing access latencies by keeping replicas of allocation tables and address space mappings globally consistent. The CPU drivers are single-threaded and non-preemptible. They register interrupts from other cores with their respective monitors and schedule the user-space dispatchers on the local core. Each has secure access to the core hardware, MMU, APIC, etc. The monitor is responsible for inter-core communication channels, for waking up blocked local processes, and for idling the core to save power when there are no processes running. The authors give a well-defined idea of how to handle common/shared resources such as main memory, I/O, etc.

4. Evaluation
The evaluations done on the Barrelfish OS are quite comprehensive. The TLB shootdown experiment uses various messaging protocols, and it clearly shows that NUMA-aware multicast works well for maintaining TLB consistency across all the cores. The IP loopback test supports the claim that user-level RPC achieves higher throughput, fewer cache misses and lower interconnect utilization than Linux, where shared memory causes more cache-coherence traffic. In IPC, the two round trips, in which the sender's write invalidates the line in the receiver's cache and the receiver then fetches the line from the sender's cache, are not an efficient implementation. Instead, a guard bit could be set in the shared memory while the message is being written and then unset when it is ready to be received by the receiver core. The lack of evaluation on heterogeneous cores leaves the claim that this design would work efficiently across cores with different instruction sets unsupported.
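The guard-bit suggestion above can be sketched as a single-producer, single-consumer ring in which each slot carries a flag word that the sender writes last. This is only an illustrative model, not Barrelfish code: `Channel`, `SLOT_WORDS`, `try_send` and `try_recv` are hypothetical names, and a real shared-memory version would also need memory barriers, which single-threaded Python does not model.

```python
from dataclasses import dataclass, field

SLOT_WORDS = 8  # hypothetical: one 64-byte cache line of 8-byte words


@dataclass
class Channel:
    """Single-producer, single-consumer ring of cache-line-sized slots."""
    slots: int = 4
    ring: list = field(default_factory=list)
    send_pos: int = 0
    recv_pos: int = 0

    def __post_init__(self):
        # Each slot holds payload words plus a final flag word (0 = empty).
        self.ring = [[0] * SLOT_WORDS for _ in range(self.slots)]

    def try_send(self, payload):
        slot = self.ring[self.send_pos % self.slots]
        if slot[-1] != 0 or not 1 <= len(payload) <= SLOT_WORDS - 1:
            return False          # slot not yet drained, or payload does not fit
        for i, word in enumerate(payload):
            slot[i] = word        # payload words first...
        slot[-1] = len(payload)   # ...flag word last, which publishes the message
        self.send_pos += 1
        return True

    def try_recv(self):
        slot = self.ring[self.recv_pos % self.slots]
        n = slot[-1]
        if n == 0:
            return None           # nothing published yet; the receiver keeps polling
        payload = slot[:n]
        slot[-1] = 0              # hand the slot back to the sender
        self.recv_pos += 1
        return payload
```

Because the receiver only looks at the flag word, it never observes a half-written message, and the sender never overwrites a slot the receiver has not drained.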

5. Confusion
How is the distribution of user-level tasks done with the individual core capabilities in mind? Is there a global scheduler that can leverage knowledge of all the cores in order to assign capabilities to the dispatcher objects for the most appropriate core?

Posted by: Sejal Chauhan | January 28, 2016 06:44 AM

1. Summary
Today’s computer hardware is trending toward more cores in increasingly diverse architectures. Traditional monolithic operating systems using a shared-memory model face the challenges of dynamic workloads and of optimizations that must target different hardware. The paper proposes the multikernel, with one possible implementation called Barrelfish, which handles the complexity by treating each core as an independent piece of hardware without inter-core sharing, thereby managing a multicore system as a distributed system of processes communicating using message passing.
2. Problem
The authors believe that a traditional monolithic operating system using the shared-memory model does not fit well in the current context of multi-core computer systems, because:

1. The hardware underlying the OS is becoming increasingly diverse and heterogeneous, with each processor having different performance trade-offs, so it is impossible to tune a single-kernel operating system to be optimal for every processor.
2. Even if the processors in the computer system are uniform, having the operating system deal with optimization for both individual hardware performance and the coordination of data sharing between processors makes the code structure very complex.
3. The shared-memory model does not scale well in multi-core systems and may not even work in some cases: the cost of hardware cache-coherence protocols on CPUs grows linearly with the number of cache lines, and various programmable peripheral devices like GPUs do not support cache coherence with the CPU.
3. Contributions
The paper provides a solution to the problems described above: a multikernel system based on three design principles:
1. All inter-core communication is done through explicit message passing, replacing implicit communication such as shared memory, which burdens the hardware with cache coherence. Memory sharing is reduced to a minimum, involving only the messaging channels between cores. This principle enables optimizations such as those used in networks, and allows asynchronous updates, which let the requesting thread continue to do useful work while the request is being processed. This may solve a performance problem in the shared-memory model, where cores have to wait on hardware-level cache updates, wasting a number of cycles linear in the number of cache lines.
2. The OS structure should be made hardware-neutral. The only two aspects of the OS that should be hardware-specific are the messaging transport mechanisms and the interface to hardware (CPUs and devices). In Barrelfish each OS instance is factored into a CPU driver, which handles the hardware-specific tasks, and a user-mode monitor, which handles the complex inter-core communication mechanism.
3. The global OS state across cores should be replicated instead of shared. Consistency is kept by sending messages, which benefits from the various optimizations implemented for the messaging mechanism. This method reduces the load on the system interconnect.
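The first and third principles can be illustrated with a toy model of replicated state kept consistent by split-phase messages. This is a minimal sketch under simplifying assumptions: `Core`, `update`, `poll` and `done` are hypothetical names, the inboxes are plain in-process queues, and concurrent conflicting updates (which a real system would order with an agreement protocol) are not handled.

```python
from collections import deque


class Core:
    """One core with a private replica and an inbox of pending messages."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.replica = {}      # private copy of some OS state, never shared
        self.inbox = deque()
        self.acks = {}         # request id -> acknowledgements still awaited

    def update(self, peers, req_id, key, value):
        """Apply locally, post the update to every peer, and return at once."""
        self.replica[key] = value
        self.acks[req_id] = len(peers)
        for peer in peers:
            peer.inbox.append(("update", self, req_id, key, value))

    def poll(self):
        """Handle every message currently queued for this core."""
        while self.inbox:
            kind, sender, req_id, key, value = self.inbox.popleft()
            if kind == "update":
                self.replica[key] = value
                sender.inbox.append(("ack", self, req_id, None, None))
            elif kind == "ack":
                self.acks[req_id] -= 1

    def done(self, req_id):
        """True once every peer has acknowledged the request."""
        return self.acks.get(req_id) == 0
```

The initiating core does not block inside `update`; it can keep doing other work and check `done` later, which is the asynchronous behaviour the first principle describes.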
4. Evaluation
The performance of Barrelfish is tested against Linux on various configurations of multi-core systems. The tests include TLB shootdown, messaging performance, and compute-bound and IO-bound workloads, to test the efficiency of the messaging mechanism and the scalability of the system. The results show that Barrelfish performs no worse than Linux, and better in many cases. In particular, Barrelfish shows the potential for a greater advantage as the number of cores grows.
5. Confusion
In section 2.4, how do the message-based RPC calls work to maintain cache coherency?

Posted by: Fujie Zhan | January 28, 2016 05:31 AM

1. Summary
The paper introduces the multikernel operating system, which views the underlying multicore architecture as a distributed system of processors that communicate through message passing. As a result, the multikernel scales well to a higher number of homogeneous and heterogeneous cores. The authors built Barrelfish as a prototype to demonstrate the viability of such a design from the engineering and performance perspectives.
2. Problem
The vast array of computer hardware and its rapid change pose serious challenges for the operating system with regard to both scalability and optimization. The current shared-memory model also does not scale well to multicore and heterogeneous-core configurations. Operating systems are forced to adopt increasingly complex optimizations which are not portable or future-proof. This requires a fundamental shift in how the underlying hardware is modelled by the operating system in one unifying shared and consistent image.
3. Contributions
The authors rethought the operating system model to be hardware neutral using concepts from distributed systems such as message passing and state replication instead of shared data.
Each core has an individual instance of a kernel running on it, called the CPU driver, and a corresponding user process called the monitor. All applications interact with the monitor to access hardware resources and communicate using remote procedure calls rather than system calls that need expensive context switches. The system also has a System Knowledge Base (SKB) containing information about the underlying hardware characteristics, allowing dynamic optimization of workloads and drivers across the various cores. In this way the authors apply concepts from different research areas, such as microkernels and distributed systems, in an innovative way to solve a different problem.
4. Evaluation
They implemented Barrelfish as a prototype multikernel based on the ideology of no memory sharing. Various micro benchmarks such as TLB shootdown as well as memory unmap were used to show how a message passing paradigm scaled for a large number of cores compared to the traditional shared memory paradigm. Real world application tests were used to establish that such a proof of concept operating system could perform reasonably well on commodity homogeneous hardware against optimized and mature shared memory operating systems. Complex use cases were not benchmarked and these may have uncovered new challenges. For example the tests used only a few device drivers (NICs etc) that benefitted greatly by being co located on one core. The message passing algorithms would be more thoroughly tested by running a benchmark that consumed many hardware resources simultaneously and needed large volumes of data to be moved between various cores such as a render farm.
5. Confusion
The paper does not present the virtual memory interface in sufficient detail, especially for threads spread across various cores.
They do not elaborate on how virtual memory is split among the cores, especially when some may have different architectures (little endian vs big endian).
The scheduling policy is not clear to me with the CPU driver responsible for time slicing and dispatching but the dispatcher also running a user level thread scheduler.

Posted by: Abhinav Mehra | January 28, 2016 03:14 AM

1. Summary
The paper proposes a new OS structure for heterogeneous multicore chips which treats the chip as a distributed system. The authors claim that the design is scalable and hardware agnostic, and that it delivers performance comparable to mature monolithic kernels.

2. Problem
The current OS structure is poorly suited to managing heterogeneous systems, which are expected to have diverse cores and also a larger number of cores in the future. The paper discusses a multikernel OS structure, where each kernel can be modified independently to address the differences in CPUs, interconnect topology, etc. Even though current OSes try to address this issue, the shared-memory model would fail to scale as the number of cores increases. The solution to this problem should also be hardware agnostic for wide deployment.

3. Contributions

The paper proposes a new OS structure called the multikernel model, which treats the multicore chip as a distributed system. The cores communicate using message passing and do not share any memory. This model is hardware agnostic and can be tweaked to get the best out of diverse cores.

The authors have also implemented the proposed model, called Barrelfish, and evaluated this design against current mature OS kernels. The kernel is split into two: (i) a privileged-mode CPU driver responsible for protection, time-slicing process execution and mediating access to the underlying hardware; (ii) a user-mode monitor process, which takes care of inter-core communication and coordinates to preserve the OS state across different kernels.

4. Evaluations
The authors evaluate Barrelfish using various micro-benchmarks. The TLB shootdown case study describes how different inter-core URPC communication protocols affect performance. It can be seen that protocols which are aware of the underlying hardware and network topology (the NUMA-aware multicast protocol) tend to scale better than broadcast ones. To stress-test the implementation, an IP loopback benchmark is used, which requires kernel interaction and shared-memory synchronization. The multikernel implementation performs better than Linux in this case. For other compute-bound benchmarks like the OpenMP integer sort, Barrelfish scales the same as Linux. The paper also evaluates the design for IO workloads. The authors have left out heterogeneity analysis, which is possible future work.
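A rough way to see why a topology-aware protocol scales better is to count how many messages must cross the inter-socket interconnect. The sketch below is a toy model with hypothetical function names, not the paper's implementation: `sockets` is an assumed list of core-id lists, `hops` is an assumed socket-to-socket distance table, and the farthest-first ordering is this sketch's reading of what "NUMA-aware" adds on top of plain multicast.

```python
def socket_of(sockets, core):
    """Index of the socket that contains the given core id."""
    return next(i for i, cores in enumerate(sockets) if core in cores)


def unicast_cross_socket(sockets, src):
    """The initiator messages every remote core directly."""
    home = socket_of(sockets, src)
    return sum(len(cores) for i, cores in enumerate(sockets) if i != home)


def multicast_cross_socket(sockets, src):
    """The initiator messages one aggregation core per remote socket,
    which forwards to its socket-local neighbours."""
    home = socket_of(sockets, src)
    return sum(1 for i in range(len(sockets)) if i != home)


def numa_aware_order(sockets, src, hops):
    """Remote sockets ordered farthest-first, so the longest-latency
    sends are started earliest."""
    home = socket_of(sockets, src)
    remote = [i for i in range(len(sockets)) if i != home]
    return sorted(remote, key=lambda i: hops[home][i], reverse=True)
```

For a 4-socket, 16-core machine this model gives 12 cross-socket messages for unicast but only 3 for multicast, with the remaining forwarding kept local to each socket.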

5. Confusions

The authors do not clearly state the role of the dispatcher unit. When the monitor process is overseeing inter-core communication, why do we need another unit for thread synchronization across various cores?

From the compute bound workloads, it can be seen that the design does not scale significantly better than existing designs. Would it still make sense to go for such enormous changes in the fundamental design?

Posted by: Bhardwaj Krishnamurthy | January 28, 2016 03:09 AM

Summary

The paper proposes a new multikernel OS architecture to address the increasing diversity and number of processor cores with varying performance characteristics. It treats the entire system as a distributed system of cores that communicate via message passing. As a proof of concept, a multikernel OS known as Barrelfish has been designed and evaluated on multicore systems.

Problem

As computers gain more processors, and a more diverse set of them, with multiple cores and varying performance characteristics, the current practice of building general-purpose commodity operating systems and tuning them for each hardware configuration becomes inefficient. Treating the collection of cores as a distributed system instead allows the OS to scale up or down dynamically using state-of-the-art concepts from distributed systems and networking. As more cores are added to the system, the number of cycles spent on cache coherence increases linearly in the current model, which is why the multikernel design adopts explicit message passing as one of its core principles.

Contributions

The main contribution of the paper is an OS architecture for multicore machines that treats the system as a distributed system of networked cores that communicate explicitly through message passing, and that is hardware neutral. It emphasizes using message passing instead of the existing cache-coherence protocol for replication and synchronization.

The paper introduces a multikernel OS called Barrelfish that follows the design principles of state replication, explicit message passing and hardware neutrality. It introduces the main components of the Barrelfish OS. The privileged-mode CPU driver manages the scheduling of its dispatcher objects and interprocess communication. User-mode monitors coordinate system-wide state, replication and coherence of memory allocation tables and other shared data structures across cores, but each monitor is also a user-space process that gets scheduled.

It also explains how memory and address spaces are managed by replicating hardware page tables, and how consistency is maintained through message passing in different scenarios.

Evaluation
The Barrelfish prototype has been evaluated to show that it meets the design goals.
The first case study is the TLB shootdown evaluation, which helps show that the topology-aware (NUMA-aware) multicast protocol scales well.

To evaluate messaging performance, the authors measured the two-phase commit used to ensure consistency while retyping capabilities, and an IP loopback test showed that Barrelfish had better throughput and fewer cache misses.
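For readers unfamiliar with two-phase commit, the idea can be sketched as follows. This is a generic illustration of the protocol with hypothetical names (`Monitor`, `prepare`, `finish`, and string region ids), not Barrelfish's capability code; a real version would exchange these calls as inter-core messages and handle concurrent requests.

```python
class Monitor:
    """Per-core replica of which memory regions have already been retyped."""

    def __init__(self, in_use=()):
        self.in_use = set(in_use)
        self.locked = set()    # regions reserved by an in-flight request

    def prepare(self, region):
        """Phase 1: vote yes only if the local replica sees no conflict."""
        ok = region not in self.in_use and region not in self.locked
        if ok:
            self.locked.add(region)
        return ok

    def finish(self, region, commit):
        """Phase 2: apply the global decision and release the reservation."""
        self.locked.discard(region)
        if commit:
            self.in_use.add(region)


def two_phase_commit(monitors, region):
    """Return True if every monitor voted yes and the retype was applied."""
    votes = [m.prepare(region) for m in monitors]   # phase 1: collect all votes
    decision = all(votes)
    for m in monitors:                              # phase 2: announce the outcome
        m.finish(region, decision)
    return decision
```

Either every replica records the retype or none does, which is what keeps the replicated capability state consistent across cores.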

Its performance was also measured for CPU-bound and I/O-bound workloads to show that it can perform comparably well.

Doubts

I feel that the authors do not make a strong case for moving the allocation and manipulation of page tables to user space.

What about fault tolerance and load balancing among the cores? Why do the authors not mention them? Not sure if I’m missing something!

Posted by: Siddharth Suresh | January 28, 2016 03:07 AM

summary~
Increasing processor core counts and hardware diversity present new challenges, such as optimization and scalability, for current OSes. This paper presents a new type of OS structure called the multikernel, which tries to cope with these challenges by incorporating ideas from distributed systems, like message passing, into the design of the OS structure. The authors also build a prototype, Barrelfish, to demonstrate their ideas, and evaluate its performance under various workloads. The results show that the multikernel structure performs well on contemporary hardware, and scales and adapts well to increasing core counts and new hardware.

problems~
The emergence of new hardware and new architectures makes it hard to apply general optimizations to a single OS image such that the OS adapts and scales well to the unique performance characteristics of the different hardware in a heterogeneous system. Even in a homogeneous system that consists only of CPUs, the shared-memory kernel cannot scale well with increasing core counts due to the overhead of the memory coherency protocol.

contributions~
In this paper the authors introduce the idea that these emerging heterogeneous systems, with new hardware and higher core counts, can be viewed as distributed systems, in that each piece of hardware in the system is like a node in a network and communicates via message passing. This view enables the OS designer to apply insights from distributed systems to solve the scalability and adaptivity problems. Specifically, the authors attempt to make inter-core communication explicit via message passing, make the OS structure hardware-neutral so it adapts to heterogeneous systems, and replace sharing with replication to improve performance.

evaluation~
The prototype was built and run on several platforms with different hardware configurations. They performed a case study of TLB shootdown latency to compare the performance of message-based communication across various protocols. The results show that the message-based mechanisms outperformed the IPI-based mechanisms in Linux 2.6.26 and Windows Server 2008 R2 Beta, Enterprise Edition.

confusion~
What is the minimum set of primitives that an instance must provide, and what should be left unspecified so that it can be tweaked later on for different hardware?

Posted by: Yudong Sun | January 28, 2016 02:58 AM

1. Summary
This paper proposes a new way to organize the OS on multicore systems, where each core runs its own copy of the OS independently and communicates via message passing just like in distributed systems.

2. Problem
The rapid evolution of multicore hardware requires complex changes to traditional OSes. Different architectures with heterogeneous cores can only be fully exploited by OSes specialized for them. In addition, communicating via shared memory managed by hardware is expensive.

3. Contributions
The multikernel model follows three design principles. 1) Cores should be independent and their communication is explicitly managed by software. 2) The structure of the OS is uniform over all kinds of node layouts on hardware. 3) Prefer replication over shared memory. These principles give the kernel more control over how cores coordinate with each other, which can make it more efficient and require fewer code changes.
This paper also presents a prototype system, Barrelfish, built on the multikernel model. Only minimal abstractions, as in an exokernel, are implemented in kernel-mode code called the CPU driver, and all other coordination work is handled by user-mode code called monitors. Global resources are managed by capabilities, and communication through shared memory is still supported for upper-layer applications.

4. Evaluation
The authors ran three categories of experiments. First, scalability and hardware topology awareness are shown by implementing TLB shootdown with 4 different protocols. They then measured messaging performance by comparing with Linux on IP loopback. The last category includes compute-intensive and IO-intensive workloads, showing that the multikernel architecture performs reasonably compared to classic OSes and provides similar programming interfaces. However, heterogeneous cores are not yet supported.

5. Confusion
This paper only discusses how inter-core communication is implemented using cache-coherent memory on their target hardware. But how can this generalize to all other architectures, especially when heterogeneous cores are involved? The routing optimization in the system knowledge base may not cover all topologies. These are actually essential to messaging performance.

Posted by: Xiangjin Wu | January 28, 2016 02:57 AM

1.Summary:
This paper describes the design of a new operating system architecture, the 'Multikernel', for multicore systems. The machine is viewed as a distributed system of cores, employing a message-passing model for better scalability and providing hardware neutrality, as opposed to a traditional OS. The authors have built a prototype of the system called Barrelfish and compared its performance with monolithic kernels.

2.Problem:
The disadvantages of traditional operating systems on top of heterogeneous multicore systems are as follows:
a.Designing a general purpose operating system with hardware optimizations is inefficient because of the diverse range of hardware designs.
b.Large number of cores in a machine leads to expensive cache coherency techniques, affecting performance.
c.Updating shared memory in a multi-core system is costly, has scalability issues and needs kernel-level code support.
Multikernel addresses these issues by having a general purpose OS that is hardware neutral and resolves the scalability issues using message passing mechanism as a mode of communication between cores.

3.Contributions:
Following are the key contributions:
a. Inter-core communication is in terms of messages, instead of a single shared memory with high update costs, with the advantage that messages can be pipelined and batched asynchronously.
b.Abstracting heterogeneous hardware behind messaging mechanisms such as a user-level RPC protocol using shared memory.
c.OS state is replicated across the cores and kept consistent with consistency techniques. This reduces synchronization overhead and access latency compared with a shared-state system.

4.Evaluations:
a.The authors evaluate the system by building a multikernel OS, Barrelfish, in which each OS instance consists of a CPU driver (kernel space) for same-core interprocess communication and a monitor (user space) for inter-core coordination, on top of homogeneous cores.
b.The system is evaluated with different message-passing protocols for TLB consistency, such as broadcast, multicast and unicast, showing how scalability is achieved even as the number of cores increases.
c. The authors also evaluate how the latency of memory unmap operations in Barrelfish remains constant despite an increasing number of cores, in contrast to Linux and Windows. Different compute- and IO-intensive workloads are run, showing that Barrelfish performs as well as or better than Linux, though these do not use the same libraries.
Overall, a good evaluation of the system has been presented, except that it could also have been done on heterogeneous cores, which are the essence of the design.

5.Confusion:
The definition and function of a dispatcher in the multikernel model is not very clear.

Posted by: Sharanya Devaraj | January 28, 2016 02:54 AM

Summary
This paper proposes the multikernel architecture as a solution to the scalability challenges faced by existing operating systems as the underlying computer hardware diversifies and rapidly evolves. Multikernels regard the OS as a distributed system of functional units implemented over a collection of independent cores with replicated state that interact only through message passing, thus allowing the OS structure to be hardware-neutral and scalable to a large number of diverse cores.

Problem
Most monolithic operating systems rely on kernel data structures shared across cores and achieve much of their efficiency by closely optimizing for the underlying hardware. However, with increasing system and interconnect diversity and core heterogeneity, it is becoming more difficult to create a general-purpose operating system that is statically optimized for future system configurations. The authors argue that the shared-memory model for the kernel is the most important factor limiting scalability on future hardware and hence needs careful reconsideration.

Contributions
The authors propose that a multicore system be visualized as a distributed system of cores, which communicate using messages and share no memory. With this so-called “multikernel” model now, one can apply knowledge and experience from building distributed systems to solve many of the future hardware challenges. The authors introduce three design principles for building such multikernel systems. First, all inter-core communication, including communication about changes to shared state, is to be done explicitly through messages. Secondly, excluding the message transport mechanism and hardware interface, almost all of the operating system is to be separated from the underlying hardware to ensure portability and easy maintenance. Thirdly, kernel state is now replicated at each core, instead of being shared across cores and global consistency is ensured through messages.

To show that the theoretical goals of a multikernel model are achievable in practice, the authors have developed a prototype system called Barrelfish, which achieves performance comparable to current operating systems while being adaptable and scalable to various heterogeneous hardware architectures. The key abstractions introduced in Barrelfish are CPU drivers and monitors. CPU drivers are lightweight, localized abstractions for machine cores that are tightly coupled with the hardware architecture, whereas monitors are hardware-agnostic user-mode processes that are responsible for inter-core coordination.

Evaluation
To evaluate whether Barrelfish achieves its goals of comparable performance, scalability and adaptability, the authors have conducted a number of experiments related to TLB shootdown, messaging performance, and various compute-bound and IO-bound workloads. Of the many TLB shootdown protocols evaluated for Barrelfish, the NUMA-aware multicast protocol is found to be more scalable and performant than the inter-processor interrupts used in Linux and Windows. From the experiments on various workloads, it can be concluded that, even as an unoptimized version, Barrelfish performs reasonably on current hardware, and the experiments suggest far-reaching benefits and portability for future hardware systems. However, one of the key things missing from the evaluation is a demonstration of Barrelfish's ability to run on heterogeneous cores.

Confusion
The abstraction of dispatcher objects for a process, and inter-process communication using dispatchers in Barrelfish, is not very clear.

Posted by: Saket Saurabh | January 28, 2016 02:33 AM

Summary
The paper presents multikernel, a general-purpose operating system architecture that attempts to address the issue of performance scalability on multi-core architectures by using explicit message passing to achieve inter-core memory communication. The multikernel design relied on replicated data, the message passing model, asynchronous (split-phase) operations and dynamic performance tuning based on the underlying hardware capabilities. Barrelfish, a prototype implementation of multikernel, was developed and tested to verify the viability of its design.

Problem
Operating Systems that mainly relied on using shared memory for inter-core communication on multi-core architectures could not scale their performance with growing number of cores due to the poor scaling of expensive cache coherency mechanisms. As multi-core architectures with higher number of cores increasingly began to resemble a distributed system, issues which were earlier reserved for networking now surfaced in inter-core communication. Another issue which these large, monolithic operating systems faced then was of their inability to support hardware diversity both from a compatibility and performance perspective.

Contributions
The authors proposed the scalable multikernel OS model, which views the underlying multi-core architecture as a distributed system of communicating cores and uses relevant concepts such as explicit message passing and state replication for sharing process state across cores. This is in contrast to the cache-coherency-dependent shared-memory model used by earlier systems. The multikernel was also designed to be hardware neutral so that it can accommodate optimizations introduced by advancements in hardware; however, the dynamic hardware topology/load discovery and tuning mechanism in the multikernel also allows it to make optimal use of its hardware resources, instead of being a mere commodity OS. The multikernel also uses pipelining and batching techniques from networking to resolve comparable problems in inter-core communication.

Evaluation
The Barrelfish prototype was an actual OS implemented to test the multikernel design. The Barrelfish implementation consisted of each core having an individual instance of the OS kernel running on it (CPU Driver) and the corresponding user interface (Monitor). An application can use URPC (user-level RPC) to send / receive messages from its corresponding dispatchers on other cores. Barrelfish also uses System Knowledge Base to dynamically discover underlying hardware and tune its operations accordingly.

Barrelfish was tested against the monolithic Linux and / or Windows systems on a variety of scenarios such as TLB shootdown, end-to-end unmap latency, IP loopback, compute bound workloads (based on certain OpenMP and SPLASH-2 benchmarks) and IO workloads. Barrelfish generally matched the corresponding Windows/Linux implementation performance in most tests, and even performed better in certain cases, thus proving that it can give reasonable performance compared to current commodity operating systems. However, the evaluation of Barrelfish was left incomplete as it was not tested on heterogeneous architectures.

Questions

What are point-solution data structures in OS kernels, and how does using them ensure maximum of one/two cache misses on specific architectures?
What is capability retyping?

Posted by: Shantanu Bhate | January 28, 2016 02:29 AM

Summary:
The paper argues for and presents a distributed-style operating system that scales to modern hardware, which has many cores and is heterogeneous. The idea is to run one instance of a rather lightweight kernel on each core and rely on message passing between the kernel instances to maintain consistency of state.

2. Problem:
The number of processor cores in commodity hardware has been on the rise and is expected to do so in the future. Hardware is also becoming increasingly heterogeneous with a slew of architectural variations like NUMA, interconnect topologies, application specific co-processors and the core itself. Traditional system software is struggling to scale with the hardware and is often not optimized for the underlying hardware. The paper proposes an OS architecture to better match the scaling and heterogeneity in the underlying hardware.

3. Contributions:
i) The paper presents the multikernel OS architecture which achieves the following goals:
Various instances of the kernel interact by explicit messages which can be both blocking and non-blocking
The OS architecture is as decoupled from the underlying hardware as possible except in cases where the knowledge is critical to the performance of the multikernel. This ensures portability to a variety of hardware.
Shared state in the traditional OS is replicated in each instance and the consistency is maintained by message passing.
ii) The authors have implemented Barrelfish, an OS that draws ideas from the multikernel model. Each instance of the OS has a CPU driver that performs privileged operations by interacting directly with the core underneath and a monitor that coordinates and maintains the system-wide information through inter-core communication.
iii) A process structure that has a dispatcher on each core to schedule jobs where the dispatchers on different cores can interact through messages.
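
The replicated-state contribution above can be sketched as follows. This is a deliberately simplified illustration: `CoreReplica` and `make_cores` are hypothetical names, and the sketch has no ordering or agreement protocol, which a real multikernel would need for conflicting updates.

```python
from collections import deque


class CoreReplica:
    """One per-core OS instance holding its own copy of the OS state."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.state = {}       # local replica, never read by other cores
        self.inbox = deque()  # update messages sent by other cores
        self.peers = []       # the other CoreReplica objects

    def update(self, key, value):
        """Apply an update locally, then notify every peer by message."""
        self.state[key] = value
        for peer in self.peers:
            peer.inbox.append((key, value))

    def drain(self):
        """Apply all pending messages; return how many were applied."""
        applied = 0
        while self.inbox:
            key, value = self.inbox.popleft()
            self.state[key] = value
            applied += 1
        return applied


def make_cores(n):
    """Build n replicas that all know about each other."""
    cores = [CoreReplica(i) for i in range(n)]
    for core in cores:
        core.peers = [p for p in cores if p is not core]
    return cores
```

A replica only changes when its own core processes a message, so there is a window in which replicas disagree; that window is what the agreement protocols in the paper are meant to manage.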

4. Evaluation:
The performance of Barrelfish matches that of the traditional OS for hardware with up to a few cores. Due to the message passing communication structure of Barrelfish, it performs much better than the monolithic OS for a large number of cores (4-32) and the performance grows steadily as the number of cores increases. This has been evaluated with micro-benchmarks and benchmarks that stress the IO peripherals, network stack and also just pure inter-core and inter-socket communication.

5. Confusion:
The Barrelfish seems to query the knowledge base to decide on the optimum length of the multicast tree at runtime. How can the network topology be inferred from the knowledge base? Also, why can’t this approach be taken in the monolithic OS to optimize the OS for the underlying hardware at runtime?

Posted by: Prashanth Balasubramanian | January 28, 2016 02:29 AM

Summary
The paper introduces a new OS architecture called the Multikernel. The model is designed for scalable multicore systems and structures the OS as a distributed system of cores, which can be locally optimized, communicating via explicit messaging and sharing no memory. Barrelfish, an implementation of the multikernel, is also presented in the paper and is evaluated for scalability and adaptability.
The Problem
Computer hardware is changing and diversifying at a rate much faster than that of software, in particular the OS. Commodity computer systems are now shifting towards a networked, heterogeneous and dynamic multicore model. This means that a general-purpose monolithic OS cannot be optimized at design or implementation time for any particular hardware configuration. Additionally, any hardware-specific optimization becomes obsolete with the arrival of new hardware in a few years. Moreover, scalability and adaptability also become challenging issues as the core counts keep increasing.
Contributions
All inter-core communication is performed through explicit messages. This shift from shared memory is a welcome change towards a possible non-cache-coherent memory in the near future.
The OS structure is hardware neutral: running the OS on hardware with new performance characteristics will not require extensive code refactoring, thereby improving portability.
With no shared memory, the multikernel architecture leads to a model of global OS state replicated across cores where consistency is maintained through agreement protocols. This results in improved system scalability due to reduced load on system interconnect, contention for memory and synchronization overhead.
The idea of maintaining a FOPL system knowledge base which is a rich repository of data on the underlying hardware gathered via hardware discovery, online measurements and pre-asserted facts, is really helpful as this can be exploited to implement several hardware specific optimizations.
Viewing of the OS as a distributed system is a novel idea which provides an opportunity for reusing the rich trove of algorithms developed for distributed systems and networks in solving the challenges posed by modern day hardware trends.
Evaluation
The paper presents a case study of TLB shootdown, which represents a worst-case comparison for a multikernel. The Barrelfish unmap operation outperforms the equivalent IPI-based mechanisms in Linux 2.6.26 and Windows Server 2008 R2 Beta, Enterprise Edition. However, there is a lot of scope for optimization in the implementation of the message dispatch loop in user-level thread packages. It is also shown that a two-phase commit operation for changing memory ownership or capability retyping requires fewer cycles than IPI-based TLB shootdown on Windows and Linux. IP loopback experiments, which can be a useful stress test of the networking subsystem of the OS, show that Barrelfish’s URPC approach performs better than the in-kernel shared-memory IP stack model for Linux. Also, both compute-bound workloads and IO workloads show competitive performance on Barrelfish, demonstrating evidence of scalability to a large number of cores. Thus we can conclude that although Barrelfish is a relatively unoptimised implementation of the multikernel, it still shows a lot of potential and delivers competitive performance when compared with existing mature monolithic kernels like Linux and Windows.
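
For readers unfamiliar with two-phase commit, the following is a minimal sketch of the general protocol, assuming in-process function calls stand in for inter-core messages. The names `Participant` and `two_phase_commit` are hypothetical, and this is not the paper's implementation.

```python
class Participant:
    """A core that votes on, and then applies or drops, a proposed change."""

    def __init__(self, core_id, can_commit=True):
        self.core_id = core_id
        self.can_commit = can_commit  # whether this core will vote yes
        self.pending = None           # operation prepared but not committed
        self.committed = []           # operations this core has applied

    def prepare(self, op):
        """Phase 1: record the operation and vote."""
        if not self.can_commit:
            return False
        self.pending = op
        return True

    def commit(self):
        """Phase 2 (success): apply the prepared operation."""
        self.committed.append(self.pending)
        self.pending = None

    def abort(self):
        """Phase 2 (failure): drop the prepared operation."""
        self.pending = None


def two_phase_commit(op, participants):
    """Return True if every participant voted yes and committed."""
    votes = [p.prepare(op) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The key property is that a single "no" vote leaves every core unchanged, which is what makes the protocol suitable for operations such as retyping that must be applied everywhere or nowhere.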
Confusions
I could not understand how a process is dissolved into the abstraction of a collection of dispatcher objects.
It is said that the CPU driver does the local time-slicing for user-space processes, but how are the processes assigned to the different cores in the first place?
How is the system knowledge base maintained? That is, is it replicated on each core, or is there a single shared copy (unlikely, as the multikernel argues against a shared memory model)?

Posted by: Amrita Roy Chowdhury | January 28, 2016 02:12 AM

1. Summary
This paper proposes a new OS structure for multi-core systems, the multikernel, that treats the machine as a network of independent cores with replicated state that communicate via message passing. This is in contrast to the prevailing shared-memory based structure. Learnings from distributed systems and networking can be leveraged in this new paradigm. The paper also discusses the motivation, design principles, implementation (Barrelfish), and evaluation via performance metrics.

2. Problem
• Current OS structure is tuned for a coherent shared memory with a limited number of homogeneous processors, and is poorly suited for adapting to the diversity and scale of future hardware.
• In commodity general purpose systems, the number of processor cores is growing, which in conjunction with the increasingly diverse mixes of cores, caches, interconnects leads to scalability, longevity, and optimization challenges.
• These challenges are mostly due to the structure - a shared memory kernel with lock-protected data structures.

3. Contribution
Primary contribution : The conceptualization of the OS as a distributed system of functional units (with OS instances) communicating via explicit messages - the multikernel model.
• Message based communication - enables networking optimizations such as pipelining and batching, and allows easier analysis.
• Global OS replicated across cores - helps scalability, and extensibility to peripherals and domains that do not share memory.
• Hardware-neutrality + provides a way to deal with cores with heterogeneous ISAs.
• The modularity of the OS can be exploited for intelligent resource management and job scheduling utilising the inter-core topology and other hardware knowledge. Learnable at run-time via a service known as the System Knowledge Base (same authors, prior publication).
• Barrelfish is factored into privileged-mode CPU drivers that are local to cores and perform low-level tasks, and user-mode monitors that encapsulate mechanism and policy and run an agreement protocol for consistency.

4. Evaluation

Notes:
• OS structure is hardware-neutral which disallows platform-specific trade-offs and optimization in the OS, but increases longevity, compatibility and scalability.
• Porting applications to this system should be fairly straight-forward.
• The OS programming style will formally move to the event-driven model. Application programs can continue to operate in the shared memory paradigm.
Benchmarks:
• Both large scale benchmarks and micro-benchmarks have been evaluated.
• TLB shootdown - Barrelfish's multicast (which allows parallelization) with SKB-enabled routing scales better than Windows/Linux using IPIs.
• Two Phase Commit and IP Loopback - Barrelfish outperforms Windows/Linux.
• Various Compute workloads - comparable performance.
• I/O workloads - comparable or Barrelfish better.

5. Confusion
Dispatchers, SKB's working / location.

Posted by: Adithya Bhat | January 28, 2016 02:05 AM

Summary
This paper introduces the concept of a Multi-kernel, a scalable OS model for modern multi-core systems. A multi-kernel OS resembles a distributed system with replicated state and a well defined interface for communication across cores.

Problem
Modern computers come in diverse architectures and often consist of multi-core dies on a single node. The design of OS data structures and synchronization mechanisms becomes closely tied to the underlying architecture. These optimizations may not be portable across heterogeneous cores on a single node. A second problem is that of sharing state across cores in a scalable manner. Relying blindly on shared memory can lead to bottlenecks due to scalability issues with cache coherence, so this calls for explicit communication mechanisms.

Contribution
A multikernel basically views the computer as a networked system of cores. This points to using ideas from distributed systems on a single computer. Firstly, each per-core kernel can be specialized to that core’s architecture while retaining a hardware-neutral structure across the whole OS. Secondly, communication is achieved through a message passing interface. With a well-defined interface, the underlying message passing mechanism can be optimized for the interconnect and cache architecture of that system. Thirdly, by reducing shared state, any update to an OS data structure can be done in a more controlled and scalable manner.

Evaluation
The authors evaluate the idea by building their own multikernel implementation called Barrelfish. Many OS activities such as memory management are managed from user space, with the kernel only mediating to authenticate and multiplex resources. One of the stress tests the authors present is that of TLB shootdowns, where they show how a carefully thought-out message passing implementation gives Barrelfish more scalability across multiple cores. They also evaluate compute-bound and I/O networking benchmarks, showing comparable performance to conventional Linux systems.

Confusion
This is regarding the end-to-end latency of TLB shootdowns when implemented with message passing. It’s not clear to me what happens if one of the cores doesn’t happen to be running the monitor when a TLB shootdown gets broadcast. The sender core would have to wait until the monitor gets scheduled in, making the latency much higher than that of an IPI.

Posted by: Brian Coutinho | January 28, 2016 01:52 AM

1. Summary
With general-purpose computing hardware evolving at a drastic pace and becoming ever more heterogeneous, a generic monolithic OS does not scale well to support the hardware. The paper presents a multikernel OS model whose modules are lightweight, flexible and localized, which enables easy hardware-specific optimizations. Using a prototype implementation, Barrelfish, they demonstrate how the model would perform and scale better than the conventional OS on future platforms.
2. Problem
General purpose processors have developed with lightning speed (for example, the increasing number of on-chip cores) and are becoming quite diverse (due to heterogeneous architectures). Thus it is becoming increasingly hard to develop an OS that performs optimally across platforms and over time. The shared memory kernel design with global OS structures is claimed to be a major culprit. On-chip interconnects are not only evolving, but becoming more crucial due to large sizes of chips. While reliance on cache coherence in the shared memory models is proving to be a major performance bottleneck, message passing is becoming more efficient. To keep up with all these changes, hardware-specific optimizations are required, an idea that does not go well with existing OSes. Thus a new OS design is required.
3. Contributions
The paper presents the multikernel OS model, which has 3 principles at its core: make all inter-core communication explicit, make the OS hardware neutral, and favor replication over shared memory. The main idea is to have a design that consists of OS modules that are local to each core (CPU drivers) and communicate via message passing (using monitors). The model describes new abstractions for process structure and IPC, and mechanisms for memory management and shared address space. They further propose a knowledge and policy engine (the system knowledge base, SKB) which allows the OS to use dynamic platform-specific optimizations. As a proof of concept, the model is implemented in the form of a new OS, Barrelfish. Finally, they conduct experiments to show performance no worse than a conventional OS, better scalability with the number of processor cores, and the ability to use optimizations to exploit hardware topologies.
4. Evaluation
Barrelfish is used on Intel and AMD CPU based multicore systems with varying configurations. They demonstrate, based on TLB shootdown case study, that message passing for sending out invalidates scales well, more so with platform awareness. They study commit protocol and IP loopback to show lower latency, higher throughput and lower interconnect traffic in Barrelfish. While compute intensive workloads perform similarly, IO bound benchmarks favor Barrelfish compared to Linux.
The paper establishes a strong motivation for its argument and verifies how conventional OSes scale poorly with the number of cores. It has done a good job of evaluating the idea using multiple criteria with Barrelfish. However, the performance of real workloads could have been studied with a more diverse set of benchmarks to make an even stronger case - NAS and SPLASH-2 don’t present anything interesting. Though the performance of primitives like two-phase commit and message passing in TLB shootdown shows promise, it is still not clear how big a difference they would actually make - how frequent are these operations in real workloads? The paper argues about support for heterogeneous architectures, but does no experiments to show how Barrelfish helps. Overall though, it does a good job of stirring substantial interest in the idea.
5. Confusion
The motivation behind having a hierarchical scheduling organization in process structure (with dispatchers) is not clear.
The paper claims that the main performance advantage is due to fewer context switches in Barrelfish compared to Linux (for example, in the webserver case study). It is not apparent why that is so. Details would help.

Posted by: Lokesh Jindal | January 28, 2016 01:44 AM

Summary:
This paper gives details about a new OS design paradigm that views the OS as a distributed system of nodes communicating via message passing, with the aim of improving performance and scalability as well as providing natural support for hardware heterogeneity. The authors also discuss Barrelfish, an implementation of their design, and evaluate it across many workloads. The results reported seem promising, as its performance is comparable to conventional OSs without any optimizations.

Problem:
With recent advances, commodity computers contain more cores and are of increasingly diverse architecture. As a result, it is no longer useful to tune a general purpose OS specifically to the underlying hardware. Another cause of concern is that existing OSs use shared-memory structures in the kernel. In this paper, the authors advocate restructuring the OS so that message passing is used to get rid of the restrictions imposed by shared memory.

Contributions:
The key contribution of the authors is that they structure the OS as a distributed system of cores that communicate via messages and do not share any memory. The proposed model, known as the multikernel model, is guided by 3 principles. Firstly, inter-core communication is made explicit via the use of message passing, which has several benefits like batching, pipelining and split-phase communication. Secondly, the OS structure is made HW neutral to the extent possible. This enables easy adaptation of the OS to future HW. Lastly, state is viewed as replicated instead of shared as in the conventional approach, making the OS more scalable as there is a reduction in contention for memory as well as synchronization overheads. Barrelfish is an implementation of these principles and it consists of 2 major components. The CPU drivers, which run in kernel mode, are responsible for communication with the underlying hardware. Monitors run in user mode and are responsible for the system-wide state as well as communication.
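
Split-phase communication, mentioned above, means that issuing a request and handling its reply are separate steps, so the sender can keep working in between. A minimal sketch under that assumption follows; the `Core` class and its `issue`, `serve` and `collect` methods are hypothetical names invented for this example.

```python
from collections import deque


class Core:
    """Toy core with separate queues for incoming requests and replies."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.requests = deque()  # (sender, payload) pairs waiting for service
        self.replies = deque()   # (responder_id, result) pairs sent back

    def issue(self, target, payload):
        """Phase 1: enqueue a request on the target and return at once."""
        target.requests.append((self, payload))

    def serve(self):
        """Handle queued requests, answering each one by message."""
        while self.requests:
            sender, payload = self.requests.popleft()
            sender.replies.append((self.core_id, "ack " + payload))

    def collect(self):
        """Phase 2: pick up whatever replies have arrived so far."""
        arrived = list(self.replies)
        self.replies.clear()
        return arrived
```

Because `issue` never waits, the sender can issue several requests back to back (pipelining) and pick up the replies whenever it next checks its queue.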

Evaluation:
The authors evaluate the performance of this model using Barrelfish. They measure the latency involved in the unmap operation and find that Windows and Linux performance is comparable to Barrelfish when there are a small number of cores. However, as the number of cores increases, Barrelfish scales better and outperforms the latter two. This result shows that the OS is indeed scalable. The authors also measure the performance of Barrelfish on a number of compute as well as IO workloads and conclude that Barrelfish is as good as Windows and Linux when it comes to performance. One must take into consideration that the version of Barrelfish used to obtain these results is not optimized and its performance is bound to increase even more, once optimized.

Confusion:
I am not quite clear on the concept of a dispatcher. It would be great if this could be discussed in some detail as a part of the lecture.

Posted by: Arjun Singhvi | January 28, 2016 01:30 AM

1. Summary
The paper presents a new OS called Barrelfish, a multikernel OS architecture that treats the machine as a network of independent cores that communicate via explicit message passing, i.e. it applies the ideas of distributed systems. There is no inter-core sharing at the lowest level. The paper also presents an evaluation showing performance comparable to a conventional OS.
2. Problem
The general purpose OS must perform well on diverse system designs and hence cannot be optimized for a particular hardware. Also, with the increasing number of cores, the cost of updating shared memory is increasing, the complexity of the interconnect grows and the cache coherence protocol becomes increasingly expensive.
The irony is that hardware is changing faster than software, and the effort required to evolve the OS to perform well on new hardware is becoming prohibitive. So why not restructure the OS such that it can efficiently exploit the next generation of hardware? Why not treat the OS as a distributed non-shared system, and employ sharing to optimize the model where appropriate?
3. Contributions
1. Separating the hardware-specific code in the OS into a separate module, so that adapting to new hardware takes less time.
2. Allowing core heterogeneity in the OS by treating the system as a network of independent cores.
3. Using explicit inter-core message passing to optimize sharing and reduce the cache coherency problem.
4. Maintaining a replicated global state of the system, as there is no shared data.
4. Evaluation
The authors were able to achieve improved performance in TLB shootdown by using multicast, where a URPC message is first sent to the first core of each processor, which forwards it to the other cores in the package. The unmapping of pages in Barrelfish seems to scale better as the number of cores increases, compared to Linux and Windows Server. The compute-bound workloads and IO workloads show comparable results to single-kernel systems. The authors themselves conclude that the evaluation did not address complex application workloads, scalability to much larger systems, or the ability to integrate heterogeneous cores.
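
The multicast scheme described above can be sketched as a small routing plan: the source sends one message per package, and the first core of each package forwards to its siblings. The function name `multicast_plan` and the core-to-package mapping are assumptions for illustration; in Barrelfish the topology would come from the SKB rather than a hand-written dictionary.

```python
from collections import defaultdict


def multicast_plan(source, core_to_package):
    """Return {sender: [receivers]} for a two-level multicast from source."""
    # Group cores by the package (socket) they belong to.
    by_package = defaultdict(list)
    for core in sorted(core_to_package):
        by_package[core_to_package[core]].append(core)

    plan = defaultdict(list)
    for cores in by_package.values():
        # The source leads its own package; elsewhere the first core does.
        leader = source if source in cores else cores[0]
        if leader != source:
            plan[source].append(leader)  # one message crosses packages
        for core in cores:
            if core != leader:
                plan[leader].append(core)  # forwarded inside the package
    return dict(plan)
```

With six cores spread over three packages, the source sends three messages instead of five, and the forwarding inside each remote package can proceed in parallel.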

5. Confusion
How is the SKB used at boot time for optimization? Is it updated as new hardware is added?

Posted by: Mushahid Alam | January 28, 2016 01:25 AM

Summary
The paper presents a new OS architecture, the multikernel, which aims to solve the problems of using monolithic operating systems on scalable and changing hardware architectures. It describes the design and implementation of one such prototype, Barrelfish, and also presents some performance analysis compared to traditional operating systems. The new architecture focuses on viewing the underlying hardware as a distributed system.

Problem
Over the years, hardware has been changing faster than system software. With traditional operating systems designed and optimized for certain specific hardware, there is a need to add similar support for newer hardware architectures. But because these systems are monolithic in structure, it becomes messy and difficult to incorporate changes for the diversifying hardware. Plus, operating system architectures that rely on shared memory and cache coherence do not scale well with the increase in the number of cores in a system.

Contributions
The paper presents a new design for operating systems, the multikernel, using principles of distributed systems. The first principle is to make inter-core communication explicit. This gives the OS more flexibility in using network optimizations and makes efficient use of interconnects. The second is to make the OS structure hardware agnostic. The multikernel model divides the OS into two components: user level (monitor) and kernel level (CPU driver). The monitor process is processor agnostic. This kind of architecture eases the use of diverse hardware architectures.
The third is to replace the use of shared memory with replication. This allows cores to have less contention over the data in cache and hence provides increased performance. It also helps in reducing the overhead caused by synchronization. Further, they present a prototype, Barrelfish, and provide the design and implementation for it. A performance comparison is made with traditional operating systems, and the measurements give an idea of how well this prototype scales and adapts to varying hardware without compromising performance.

Evaluation
The paper evaluates the performance of Barrelfish for some test cases like TLB shootdown, I/O workloads and compute workloads. The performance is also compared to Windows and Linux for TLB shootdown and compute workloads, and it shows how well Barrelfish scales up. They also evaluate it on multicore machines to show the hardware neutrality that the OS provides. The benchmarks used here are somewhat like micro-benchmarks, meaning they do not test extensively what a real OS might face. An example would be to have multiple processes with different workloads. This would test the multikernel thoroughly, and it would be interesting to see if the multikernel model still fares well compared to traditional operating systems that are already optimized for the current hardware.

Confusion
The paper mentions systems where heterogeneous cores are used. I am confused as to what scenarios lead us to use diverse cores in a single system.

Posted by: Akshay Kanfade | January 28, 2016 01:13 AM

1. Summary
This paper introduces the multikernel, an OS structure redesigned to view machines as a network of independent cores that communicate via message passing. The authors describe their design and evaluate their system against conventional operating systems.

2. Problem
Computer hardware is evolving at a rapid pace and general purpose operating systems are typically optimised for a target platform. Also, as the number of cores increase within a system, the issue of interconnect among them would be similar to those faced in the networking domain. This makes cache coherency expensive and shared memory abstraction difficult.

With rapidly evolving hardware and multicore systems becoming common, the fundamental OS primitives need to be redesigned to accommodate new changes without compromising on performance and scalability.

3. Contributions
The primary contributions of this paper are the design principles and abstractions:
* Hardware agnostic OS structure
Enables the system to be quickly ported to new hardware.
Increased scalability as hardware/architecture dependent aspects (interfaces, message transport, etc.) can be isolated.
* Explicit inter core communication
Enables efficient utilisation of the interconnect
* Replicated State
Reduces synchronization problems and load on the core.

Barrelfish was implemented using these principles and had a kernel-mode CPU Driver and user-level Monitor with well defined functionality. The CPU Driver worked closely with the hardware while the Monitor was responsible for maintaining state and coordination.

4. Evaluation
The Barrelfish multikernel is evaluated along a few dimensions:
* TLB Consistency:
The authors evaluate TLB consistency as a function of an increasing number of threads when entries are invalidated. The latencies scale well for non-NUMA machines but remained fairly constant for NUMA machines. Barrelfish outperforms both Linux and Windows Server when end to end unmap latency is measured.

* Message Passing
Results also demonstrate that message passing is slower than shared memory for low core counts but faster on higher core counts.

* Benchmarks
Barrelfish performed as well as Linux on OpenMP and SPLASH.
Barrelfish performed poorly on open Integer sort.

On the whole, Barrelfish has performed very well for a system in its infancy.

5. Confusion
What is stack ripping ??
How is load balanced across multiple cores of the system ??
How is consistency between replicated state maintained ??
SKB benefits and implementation ?

Posted by: Vinothkumar Siddharth | January 28, 2016 01:04 AM

1. Summary
The paper talks about using the ideas of distributed systems to create a new OS architecture called the multikernel. Their design supports heterogeneity in hardware, OS, cores etc. (i.e. it is more flexible). They also present a multikernel called Barrelfish and do all their experiments on it. It has performance comparable to a conventional OS.
2. Problem
Today's systems have diverse hardware configurations and processor cores. Most OSes have a system-specific design, and it is not easy for such systems to scale and support new designs. Other problems faced by conventional OSes are the cost involved in using shared memory and maintaining cache coherency, and the failure to consider network features, topology and problems.
3. Contributions
They propose a multikernel model with 3 main design principles: explicit inter-core communication (making use of network optimizations like pipelining, batching, modularity, asynchronous or split-phase communication etc), hardware-neutral structure (isolate distributed communication algorithm from hardware implementation) and use of replicated states (maintains consistency using messages, improves scalability etc). Their multikernel is called Barrelfish, which uses monitors to coordinate state and handle inter-core communication (which happens between dispatchers), and CPU drivers for security purposes. It also uses a system knowledge base to optimize communication.
4. Evaluation
The paper did a case study on TLB shootdown. Barrelfish scaled better (with an increasing number of cores, especially with the NUMA-aware multicast version of message passing). In general they show that its performance (using compute-bound and IO workloads) is comparable with a conventional OS, but there is scope for more optimization.
They did not run experiments with heterogeneous cores (would like to know whether results were as expected, and whether they had to make changes to their design).
5. Confusion
Don't completely understand the scheduling aspects of/ relationship between dispatchers and threads in the "process structure" subsection of the paper. An example would be nice

Posted by: Anubhavnidhi "Archie" Abhashkumar | January 28, 2016 01:02 AM

1. Summary
This paper is about a new kernel environment, the multikernel, which treats the OS as a distributed system of processes, each replicating its own state, that communicate via message passing, and which is hardware neutral in its structure. With the implementation of a working prototype, Barrelfish, the authors are able to show that the multikernel environment satisfies these goals without compromising performance on contemporary hardware.
2. Problem
A modern OS cannot be designed and implemented effectively to satisfy the needs of any particular hardware configuration, one of the reasons being the cache coherence protocol. It is complicated for a single kernel instance to support core heterogeneity because of different performance characteristics or different ISA-specific functions. Also, with a shared memory OS, the latency of updates increases linearly with the number of threads and modified cache lines.
3. Contributions
By making all inter-core communication explicit, networking optimizations like pipelining and batching can be used effectively on the system interconnect, treating the OS as a “distributed system” with isolation and resource management on heterogeneous cores. This structure imposes component communication only through a well-defined interface, which helps evolution and redefinition/porting. Hardware neutrality in the OS leads to a more stable code base with hardware-independent distributed communication algorithms. Due to replication of data structures across cores, system scalability is improved by reducing memory contention and synchronization overhead. It also introduces a new memory management technique based on system calls that manipulate capabilities, which are user-level references to kernel objects and regions of main memory; this leads to uniformity in maintaining general consistency mechanisms. With the introduction of the system knowledge base, it is possible to maintain a database of the underlying hardware, express optimization queries concisely, and select appropriate message transports for inter-core communication. Further enhancements point toward processor-specific data structures, a file system and limited sharing.
4. Evaluation
The authors, with the help of the Barrelfish prototype, evaluate the multikernel model with respect to baseline performance, scalability, adaptability and the message passing abstraction. The inter-core messaging mechanism performs best with the multicast TLB shootdown protocol, which exploits sharing of the L3 cache within each processor. This is augmented by precomputing an optimal route for every source core in the system with help from the SKB. The IP loopback stress test shows that Barrelfish achieves higher throughput by avoiding cache coherence traffic. It performed competitively on compute and IO bound workloads compared to single kernel systems, because it avoids kernel-user crossings by running processes entirely in user space and communicating over URPC. As the authors also summarize, this evaluation doesn’t address complex workloads and higher level abstractions like storage. It would be great if this research were continued beyond contemporary commodity hardware.
5. Confusion
How much burden is imposed in general on the application developer in this multikernel environment? Even if the OS is scalable enough to take heterogeneous cores into account, it is not clear how an application can exploit such a feature.

Posted by: Unmesh Phalak | January 28, 2016 12:37 AM

1. Summary
Computer systems are getting more and more complicated every day, with an increasing number of processor cores and increasing architectural complexity. Commodity operating systems are not able to scale up with these complex systems. This paper suggests embracing the networked nature of such systems. Using ideas from networking and distributed systems, the authors propose a new Multikernel OS architecture which addresses the scalability issues.

2. Problem
There is a rising trend toward higher core counts and greater hardware diversity in a variety of environments, from personal computing to data centers. A commodity operating system that shares data across cores has to be carefully designed to avoid the routing and congestion problems caused by cache-coherence protocols, and even then it cannot be guaranteed to work satisfactorily on systems using different synchronization schemes. Designing a commodity operating system to run (sub-optimally) on diverse systems makes it complex, so when a hardware vendor presents an optimization opportunity or creates a new bottleneck for the current design, adapting the operating system to the new environment becomes a non-trivial job.

3. Contributions
To motivate rethinking the OS as a distributed system, the authors show experimentally that message passing scales better than the shared-memory model as the number of cores and shared cache lines increases.

The paper also proposes a new Multikernel OS architecture based on the following three design principles:
* Let all inter-core communication be explicit
Explicit communication allows OS to use well known networking optimizations to efficiently use the interconnects and evaluate the performance using formal methods like queueing theory. Message passing allows for split-phase communication which decouples requests and responses and allows cores to do useful work or sleep while waiting for responses. This approach is also a basic requirement for spanning cores which are not cache coherent.

* Make OS structure hardware neutral
The authors suggest separating the OS structure from the hardware as much as possible, so that only two aspects of the OS are targeted at specific machine architectures: the messaging transport and the interface to hardware. This architecture makes it easy to port the OS to hardware with new performance characteristics.

* Replicate OS state across cores
Replication is required to support cores that do not share memory. Apart from that, replication improves system scalability by reducing the load on the interconnects, memory contention and synchronization overheads.
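
The split-phase communication mentioned under the first principle can be sketched roughly as below. This is only an illustrative toy, not Barrelfish code: the two "cores" are Python threads, the mailboxes are `queue.Queue` objects, and the names `core_server` and `split_phase_demo` are hypothetical.

```python
import queue
import threading


def core_server(inbox, outbox):
    """Hypothetical remote core: serves requests until it sees None."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown sentinel
            break
        req_id, value = msg
        # Doubling the value stands in for some remote OS operation.
        outbox.put((req_id, value * 2))


def split_phase_demo():
    to_remote, from_remote = queue.Queue(), queue.Queue()
    remote = threading.Thread(target=core_server, args=(to_remote, from_remote))
    remote.start()

    # Phase 1: issue several requests back to back without waiting.
    for req_id, value in enumerate([1, 2, 3]):
        to_remote.put((req_id, value))

    # The requesting core can do unrelated local work in the meantime.
    local_work = sum(range(10))

    # Phase 2: collect the responses later, matched by request id.
    replies = dict(from_remote.get() for _ in range(3))

    to_remote.put(None)
    remote.join()
    return local_work, replies
```

The point of the sketch is only that requests and responses are decoupled, so the sender is never blocked on a single remote operation.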

Using the above design principles, the authors implemented a multikernel OS named Barrelfish with the goals of good baseline performance, scalability with cores, adaptability to different hardware, exploiting the message-passing abstraction for performance, and modularity to support hardware topology awareness.

4. Evaluation
The authors have done an extensive evaluation of the Barrelfish OS to see if it meets the design goals.

Running the TLB shootdown evaluation with various messaging protocols, the authors show that NUMA-aware multicast works best for maintaining TLB consistency across cores. They also compare this protocol with Linux and Windows to verify that it scales better than the traditional approaches as the number of cores increases.
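
A rough sketch of how a two-level, NUMA-aware multicast tree could be built is given below. It assumes the source core sends one message per remote package to an aggregator core (farthest package first) and each aggregator forwards to the cores in its own package. The function name, the `packages` layout and the `latency` table are all hypothetical inputs for illustration, not the paper's data structures.

```python
def build_multicast_tree(source, packages, latency):
    """Return {sender: [receivers]} for a two-level multicast.

    packages: dict mapping package id -> list of core ids (assumed input)
    latency:  dict mapping package id -> cost from the source's package
    """
    src_pkg = next(p for p, cores in packages.items() if source in cores)

    # Contact the farthest packages first so the slowest messages start earliest.
    remote = sorted(
        (p for p in packages if p != src_pkg),
        key=lambda p: latency[p],
        reverse=True,
    )

    tree = {}
    aggregators = [packages[p][0] for p in remote]  # first core acts as aggregator
    local_peers = [c for c in packages[src_pkg] if c != source]
    tree[source] = aggregators + local_peers

    # Each aggregator forwards to the remaining cores of its own package.
    for p in remote:
        tree[packages[p][0]] = packages[p][1:]
    return tree
```

With three 4-core packages, the source sends 5 messages instead of the 11 that plain unicast would need; the rest are forwarded locally within each package.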

The paper also evaluates messaging performance using IP loopback tests and compares compute- and IO-bound workloads on Barrelfish and Linux to show that Barrelfish can support shared address-space parallel code without much performance penalty.

From the evaluation of the Barrelfish system we can conclude that the OS performs reasonably well on large-scale benchmarks and scales well on micro-benchmarks on contemporary hardware. However, more work needs to be done to evaluate the OS with storage and with complex applications and systems.

5. Confusion
I would like to know how the user-level process can gain privileges to allocate and manipulate page tables.

Posted by: Mihir Shete | January 28, 2016 12:33 AM

1.Summary
In this paper the authors propose a new OS structure, the multikernel, that aims to address several limitations of conventional shared-memory-kernel OSes. It treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate explicitly via message passing. The authors evaluate an implementation of this multikernel OS on multicore systems and argue that its performance is comparable with a conventional OS and hence that their approach is promising.

2. Problem
Commodity computer systems continue to grow in number of processor cores and are becoming increasingly diverse along several related dimensions such as instruction sets, caches, memory hierarchies, and IO configurations. The basic structure of a shared-memory kernel with data structures protected by locks, which conventional OSes employ, makes it difficult to exploit hardware optimizations. All these changes pose enormous challenges for maintaining OS portability, scalability and correctness, since it is no longer possible to tune a general-purpose OS for all hardware models. This motivates new OS techniques for multicore hardware, with a hardware-neutral structure that can exploit new features of computer systems that increasingly resemble networked systems.

3. Contributions
The major contributions of this paper, as the authors point out, are:

Multikernel model: The authors introduce their OS architecture for heterogeneous multicore machines, called multikernel model. The OS is structured as a distributed system of cores with three basic principles: make all inter-core communication explicit, make OS structure hardware neutral and view state as replicated instead of shared.

Barrelfish: the authors describe Barrelfish, an implementation of the multikernel model and their choices in the design.

Evaluation of the Barrelfish through several experiments and measurements along with other conventional OSes that explores the feasibility and the extent to which it achieves the goals listed above.

4. Evaluation
In line with the overall goals of the multikernel model, the evaluation of Barrelfish covers baseline performance, scalability with cores, adaptability to changing hardware, the message-passing abstraction for performance and exploitation of hardware topology awareness. Several experiments and measurements in the paper show that Barrelfish performs comparably with Windows and Linux on many of the dimensions mentioned above. Although this evaluation does not address complex application workloads, the reported performance of Barrelfish is impressive if we consider that Barrelfish is still a prototype, whereas enormous brainpower and resources have been poured into Windows and Linux. Further, the authors mention that porting applications to Barrelfish is straightforward, since it offers a user environment very similar to that of other OSes.

5. Confusion

The authors do not mention failure handling or recovery mechanisms, which are a very important aspect of any distributed system.

Their overall memory management mechanism is still very confusing. It is not clear why specifically they chose to implement it outside of the conventional processor memory management system. What is the coordinating piece here?

Posted by: Udip | January 28, 2016 12:22 AM

Summary

The paper describes a new OS design, the multikernel, that views the machine as a collection of networked cores. The OS is structured as a distributed system of cores where communication happens via message passing and there is no memory sharing at the inter-core level.

Problem

Today's computer systems contain more processor cores and have diversified architectures. Scaling existing OSes to suit this diversified hardware is becoming difficult. The usual approach today is to optimize the OS for a specific hardware platform and its requirements. Existing OSes are tuned for a shared-memory model that doesn't scale well as the number of cores increases. So there is a need to rethink the OS independently of the hardware, such that future hardware designs don't affect the basic premise of the OS and scaling is inherited automatically.

Contribution

The most significant contribution of this paper is to rethink the OS as a distributed system of cores that communicate by message passing. The overall multikernel design is built around the following 3 principles:

1. Inter-core communication is explicit so that the knowledge of state sharing is exposed thereby helping in more efficient use of the interconnect. This also provides isolation and better resource management on diversified hardware.
2. Separate the OS from the hardware so that support for future hardware comes naturally. Only the message transport mechanism and the hardware interface (CPU and devices), which are inherently hardware specific, need to be kept apart from the rest of the OS structure.
3. Maintain state using replication instead of sharing the state so that system can scale due to improved locality, decreased load on system interconnect, reduced overhead for synchronization and contention for memory.
Using the above principles, the authors present Barrelfish as one way to build a multikernel. The OS instance on each core is split into a privileged-mode CPU driver and a distinguished user-mode monitor process whose responsibility is to handle inter-core coordination and maintain global state. They also discuss how memory management and a shared address space are achieved in Barrelfish.
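
To make the third principle concrete, here is a minimal sketch of state that is replicated per core and updated only through messages. It is a toy under strong assumptions: the class name `CoreReplica` and its methods are hypothetical, and it ignores the ordering of concurrent updates, which a real system would handle with an agreement protocol.

```python
from collections import deque


class CoreReplica:
    """Hypothetical per-core replica of some OS state."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.state = {}       # this core's private copy
        self.inbox = deque()  # messages from other cores

    def local_update(self, key, value, peers):
        # Apply locally, then tell every peer instead of writing shared memory.
        self.state[key] = value
        for peer in peers:
            peer.inbox.append((key, value))

    def drain(self):
        # Apply pending updates received from other cores.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.state[key] = value
```

Each core only ever reads and writes its own `state`, so there is no lock on a shared structure; the cost is that replicas are briefly out of date until `drain` runs.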

Evaluation

I really like the evaluation section, where each claim is backed up by experiments using microbenchmarks. They measure TLB shootdown, messaging performance, I/O workloads, network throughput and compute-bound workloads. Barrelfish performed reasonably well even though it is lightweight and young compared to Linux and Windows, which have been heavily optimized for current hardware.

Confusion

How does efficient scheduling happen in a heterogeneous environment? Wouldn't heterogeneity be a curse, since a cross dependency on a slower node can hurt performance?

Posted by: Yuvraj | January 28, 2016 12:09 AM

1. Summary: This paper describes an OS model for multicore systems, called the multikernel, which proposes lightweight OS instances on each individual core and uses message passing to interact and maintain shared structures. The authors also describe Barrelfish, an initial unoptimized implementation of the multikernel model, and show that it performs as well as current OSes without any optimizations, and hence has a lot of potential.

2. Problem: The authors list multiple problems with current OS implementations. First, because of the diversity of hardware available in the market and the complex interactions between structures in current OS implementations, it is becoming increasingly difficult to optimize an OS for a particular hardware platform. As a result, hardware is updated faster than OSes. Second, the authors realize that future architectures will contain more, and more diverse, computing platforms. So it is important to decouple some basic functionality from the OS, which would present a common interface to the rest of the system, thus making the OS ‘hardware neutral’. Third, the authors realize that if the OS itself uses shared memory to implement its structures, it wastes a lot of time on overheads such as synchronization and resource contention. So they propose to replicate the resources on different cores. Moreover, with each resource now sitting close to its corresponding core, latency can be reduced.

3. Contribution: One of the key contributions of the paper is a model of an OS that can easily scale with the number of computing platforms. To do this, the authors propose running an OS instance on each core. Instead of sharing resources such as data structures (like process run queues) or memory (like a single page table for a process), they propose replicating them. With more resources at hand, traditional systems aimed at conserving or efficiently sharing resources need to be updated, and replication might just be the answer. It is still important to keep the replicas consistent, so the authors present a message-passing mechanism for that. They realize that this may lead to contention for communication, and so propose techniques like multicast, pipelining and batching of messages to optimize the communication. Another important contribution is their view of making the OS hardware neutral. The CPU driver is the part that actually sits on the hardware and encapsulates the basic functionality needed. It is single-threaded and non-preemptive, and can be easily optimized and ported to different hardware platforms. The other functionality of a conventional OS is provided by monitors, which sit on top of the CPU drivers. The authors also say that cache coherence protocols do not scale well, and they suggest as future work that these protocols could be removed, leaving it up to the OS to ensure consistency.

4. Evaluation: To evaluate their model, they use an initial, unoptimized implementation called Barrelfish. They use the kernel-level operation of TLB shootdown to show the performance of different message-passing protocols. They show that hardware-aware protocols like NUMA-aware multicast perform better. They also show that page unmapping on Barrelfish outperforms Linux and Windows beyond 14 cores on 8×4-core AMD systems. They also run some compute-bound and IO-bound workloads on AMD systems and the results look promising. Better performance is expected from an optimized Barrelfish implementation. However, the benchmarks they chose do not represent the mix of workloads that an actual OS would see. They recognize this issue and point to it as future work.

5. Confusion: They do not say much about the SKB or how it computes the optimizations. Is it something like a BIOS, which would run at boot and discover the hardware details? They also do not discuss the different consistency models they use in Barrelfish. If the ultimate aim is to build a distributed system out of the cores, what would happen if a new computing resource or core, e.g. a GPU, is added later? Do they consider this case? Would re-running the SKB solve the issue? Also, regarding the claim that cache-coherence protocols are not scalable: directory-based cache coherence implementations exist that are scalable. How do they explain that?

Posted by: Mohit | January 28, 2016 12:08 AM

1. Summary

The paper presents the design of an operating system with multiple kernels running on multi-core hardware such that each kernel process runs on a particular core. The authors believe that machines of the future will primarily be multi-core and that cores can be heterogeneous. So they introduce OS abstractions inspired by distributed systems, which would scale with the number of cores and the heterogeneity of hardware.

2. Problem

Computer hardware is changing and diversifying at a very high rate, leading to substantial scalability and correctness challenges for OS designers. In particular, existing operating systems have had to undergo huge code changes to support new and diverse hardware. Many optimizations in an OS involve trade-offs specific to hardware parameters and thus are not portable. The authors wish to remove this strong dependency of OS functionality on the underlying hardware, so that the OS design performs well with any future hardware. The authors also want to redesign the OS functionality such that it scales in performance with the number of cores.

3. Contribution

The authors attribute the current problem to the basic structure of a shared-memory kernel protected by locks. They show that the cost of updating shared state scales better with message passing than with shared memory. The main contribution of the paper is the design of an OS that uses explicit communication between cores, is hardware neutral and follows a state replication model instead of a shared-state model. The design is inspired by thinking about the structure of the OS as a distributed system of kernels that communicate by messages to maintain a consistent state. The authors also present a multikernel prototype, Barrelfish, which explores the performance of the proposed model. This OS structure reduces contention for memory and synchronization compared to the shared-state model.

4. Evaluation

The Barrelfish implementation uses a separate CPU driver and monitor process on each core to provide OS functionality. This leads to a lot of context switches and overhead. However, this was just to test the idea of a distributed OS communicating through messages. The prototype was evaluated on various workloads such as TLB shootdown (messages used in Barrelfish as compared to shared memory in others), end-to-end unmap latency, and IO- and compute-based workloads. Barrelfish performed reasonably well compared to Linux, despite not being an optimized implementation.

5. Confusion

Will the distributed nature guarantee all kernel processes to be in consistent state?

Posted by: Anshul Purohit | January 27, 2016 11:49 PM

1. Summary
The paper discusses the multikernel, which provides a generic solution independent of the underlying hardware using ideas from distributed computing and networking, with message passing instead of shared data. Compared to a traditional OS, it does not perform much worse, since it does not rely on expensive cache coherency and scales well with the number of cores. The authors implemented a working prototype, Barrelfish, which supported their theoretical ideas experimentally to a certain extent.
2. Problem
There is no OS that scales across heterogeneous cores; existing systems instead try to address the issue with complicated optimizations. Since there is a myriad of hardware, optimizing for each platform is a point solution and complicates matters in the long run. Message-passing-based solutions existed, but they were also highly optimized for a particular set of architecture choices.
3. Contributions
The authors came up with a general-purpose solution, which had a real-world implementation in the form of Barrelfish. It treated the hardware as a neutral entity and used message passing as its core means of communication. Rather than sharing data, they insisted on replicating the data. This solved costly cache coherency issues. By further pipelining and batching requests, they were able to optimize it even further.
Each core has its own kernel instance running on it, the CPU driver, which interacts with user-level processes using the ‘monitor’ as an abstraction. Any user-level application that needs to interact with the hardware sends a URPC to its dispatcher peers. These abstractions ensured that the design was generic for multi-core and heterogeneous systems. They also introduced the concept of the SKB, which would study the individual architecture and provide dynamic optimizations for queries, e.g., by studying the tree at runtime, it computed an optimal route. They also discussed further optimizations for cases where applications benefit from shared memory, although Barrelfish itself is based on strictly no data sharing.
4. Evaluation
They evaluated their ideas by running a set of CPU and I/O intensive workloads on Barrelfish and then compared it against Linux and Windows. They were able to demonstrate that multi-kernel can be a viable alternative for multi-core system as a general OS. However, they did not do enough to prove the claim on a heterogeneous multi-core system.
5. Confusion
- The authors did not discuss how to visualize the single virtual address space on multiple cores i.e., how will processes’ address space be split across cores.
- Why did they not test their implementation on a heterogeneous architecture?

Posted by: Vikas Goel | January 27, 2016 11:30 PM

Summary
Advances in hardware lead the authors to redesign operating systems so that they are not hardware-driven. They propose a new design, the multikernel, that treats a heterogeneous multicore system as a distributed shared-nothing network of cores that communicate through explicit message passing, thus separating the OS from the hardware as much as possible.

Problem
A conventional shared-memory kernel would work on a multi-core machine, but there would be contention on shared memory, complexity in providing cache coherence, and network problems of congestion and routing. It is difficult to build a single general-purpose OS that caters to all hardware changes and to heterogeneous cores.

Contributions
They apply proven techniques from distributed shared-nothing systems and design the system to be a network of cores, called the multi-kernel which: makes inter-core communication explicit through message passing [allows optimizations such as pipelining and batching, resource management, split-phase communication and modularity], makes OS structure adaptable to hardware changes by allowing flexibility to change due to the decoupling of distributed algorithms with h/w implementation, and finally requires the OS state to be replicated across cores that is globally consistent, which in turn achieves scalability.
They implement this design in the form of Barrelfish, which factors the OS on each core into a CPU driver (a specialized, exokernel-like component per core) and a monitor process (which allows inter-core communication through URPC for exchanging OS state and resources), and uses capabilities to manage memory (complex but uniform).

Evaluation
Performance- measured in TLB shootdown experiment, Barrelfish uses multicast URPC messages and capability retyping and it outperforms Linux/Windows in latency vs cores.
Scalability- measured in compute-bound workloads, Barrelfish is comparable.
Adaptability- only for homogeneous cores, but not tested on heterogeneous cores.

Comments
While the paper clearly portrays the benefits of the multi-kernel design, it is very repetitive (motivation and goal). It provides a tangible implementation in the form of Barrelfish, which might be shaky in a few areas, but is encouraging.

Confusion
I am still unconvinced by the thread~dispatcher model. Does 1 thread belong to 1 dispatcher object, and can multiple threads be present in such a dispatcher object, which then communicates with other cores for resources? Would a thread move to another dispatcher object when it needs to execute on a different core?

Posted by: Tithy Sahu | January 27, 2016 11:28 PM

1. Summary
This paper introduces the multikernel, an OS architecture redesigned to improve OS scalability and support future heterogeneous systems. The authors further discuss Barrelfish, an OS that adheres to the principles of the multikernel. It is evaluated on a range of workloads and compared to conventional OSes.

2. Problem
The authors claim that managing a monolithic OS is difficult. Optimizing the OS for specific hardware is a futile effort given that hardware is changing at a very fast rate. The added diversity in current hardware systems makes sharing a single OS kernel instance across such heterogeneous systems less efficient. Finally, coherence based sharing of protected kernel data structures across multiple cores is costly and inherently unscalable. In order to manage these issues and with the belief that future hardware will exhibit network effects inside chips which a system software should adapt to, the authors propose a rethink of OS design.

3. Contribution
The main contribution of this paper is the multikernel philosophy of explicit inter-core communication via message passing, hardware neutrality and replication of kernel state. Barrelfish, the authors' version of such an OS, highlights the power of this philosophy. By breaking a monolithic OS into abstractions of CPU drivers, monitors and inter-core communication, the authors ensure that hardware-specific optimizations are implemented in relatively lightweight CPU drivers and that interconnect optimizations are captured by the module handling inter-core communication. Most of the OS functionality is transferred to hardware-neutral monitors. This makes managing an OS easier. CPU drivers can be easily optimized for each kind of core in a system without disturbing the monitor. The authors are also the first to highlight how explicit message passing can perform better than sharing data structures through cache coherence mechanisms. This performance could be further improved by optimizing message passing based on network topology.

4. Evaluation
The authors evaluate the performance of a multikernel OS by developing an OS that runs on various multicore Intel and AMD chips. TLB shootdown is used to illustrate the performance of Barrelfish's message-passing protocol compared to the inter-processor-interrupt-based approaches found in Linux and Windows Server. The systems perform comparably for small numbers of cores, while Barrelfish outperforms the others on systems with more than 14 cores. The performance of several compute-bound and IO workloads is similar on both systems. However, according to the authors, the performance of Barrelfish would improve once it is optimized as much as commercial OSes.

5. Questions
1. I did not understand the need for a different process structure in Barrelfish.
2. Why did they experiment with moving Virtual Memory management outside the CPU Driver?

Posted by: Urmish Thakker | January 27, 2016 11:25 PM

1. summary
This paper presents the design of a multikernel operating system in which the overall structure of the OS is independent of hardware changes and hardware evolution, all inter-core communication is explicit and the state of the system is replicated and distributed across the cores. Thus the OS is analogous to a distributed system. The paper also discusses Barrelfish, which is an implementation of these design principles.
2. Problem
The nature of hardware is constantly evolving, but a general-purpose operating system cannot be customized for one particular hardware platform because the hardware deployed on different systems can vary greatly. Therefore optimizations and trade-offs introduced by new innovations in hardware technology cannot be incorporated into operating systems without making major changes. Additionally, the number of cores and the complexity of interconnects are increasing. This leads to scalability and correctness problems for OS designers.
3. Contributions
Firstly, all inter-core communication is through explicit messages and not shared memory. This enables the system to use batching and pipelining to improve performance, improves isolation and resource management for heterogeneous cores, decouples requests from responses through split-phase communication, and makes the communication easier for humans to reason about. Secondly, it isolates machine-architecture-specific code in the message transport mechanism and the direct interface to the hardware, so hardware-specific optimizations can be made without cross-cutting code changes. Lastly, the system state is maintained as replicas across cores, which are kept consistent using explicit messages. This improves scalability and reduces load on the interconnect and memory. The above principles have been implemented in Barrelfish, wherein the CPU driver is optimized for the x86-64 architecture and shares no state with other cores, and the monitors collectively maintain the system-wide global state by means of an agreement protocol.
4. Evaluation
The multikernel OS performs comparably to Linux and Windows, which have been optimized for specific hardware. Barrelfish uses a two-level multicast tree for TLB shootdown messages, which scales significantly better than unicast or broadcast. Barrelfish implements a NUMA-aware multicast protocol that scales well to 32 cores. The message-based unmap operations outperform the IPI-based mechanisms in Linux and Windows. Further, using the same multicast techniques, Barrelfish achieves good scaling and performance when changing memory ownership using capability retyping. Lastly, using the SPLASH-2 and OpenMP benchmarks, the authors show that Barrelfish can support large shared-address-space parallel code. The authors provide sufficient quantitative data to conclude that Barrelfish performs comparably to existing systems, so their evaluation was appropriate.
5. Confusion
What is stack ripping in the context of message passing? What is the concept of capability retype and revoke in the context of memory management?

Posted by: shreya kamath | January 27, 2016 11:19 PM

1. summary
As core counts on machines increase and different versions of architectures are released, the OS needs to be modified to support the increase in resources as well as architectural differences. As a result, the paper introduces a multikernel that optimizes for each architecture specifically and utilizes the processors through a distributed systems model to increase performance.
2. Problem
One of the biggest problems is that new architectures are coming out rapidly, even if they are just a version update of a previous architecture. However, we continue to have a general purpose operating system that does not really take advantage of the uniqueness of each architecture.
On top of that, as the number of cores increase on a machine, it would be intuitive to think that performance will improve, but because of cache coherency needed amongst the cores, and based on the cache coherency protocol in place, some multi-process operations might lose performance because cache lines have to be copied across cores.
3. Contributions
As a result, the authors propose a new kernel design, where the kernel is optimized for each architecture to take full advantage of each core’s capabilities. Furthermore, the kernel will treat each core as a node in a distributed system and use message passing to communicate data between cores to avoid the cache coherency protocol.
In their design, inter-core communication is explicitly done, every global data that keeps the state of the system will be replicated across cores, and the OS at the higher level will be hardware-neutral.
The authors tested the performance of their system in scenarios like TLB shootdown, but they also admit, which I find interesting, that their system is really small and lightweight so it cannot be compared too much with current systems.
4. Evaluation
One of the things that I thought was missing is any mention of current OS algorithms for scheduling processes on ccNUMA systems. Usually the OS might try to schedule related processes close to each other, perhaps to share caches or reduce the transfer time of cache lines, but the authors did not really mention whether any such algorithms were in use when they performed their messages vs. shared memory experiment.
5. Confusion
I am not familiar at all with distributed systems, but I assume there are some drawbacks with their protocols. How will those drawbacks affect the multikernel design?
I wonder if you turn off the cache coherency protocol so that, even though each core communicates explicitly with other cores, no polling or broadcasts occur which could waste cycles.

Posted by: Arman Shanjani | January 27, 2016 11:07 PM

1. Summary
This paper describes a new operating system that treats a machine like a network and relies upon message passing for all its communication. The authors view state as replicated instead of shared and claim to make all inter-core communication explicit and make the OS structure hardware-neutral.

2. Problem
With the rise of multicore architectures, single computers increasingly resemble distributed systems. This is particularly true of machines with heterogeneous components, such as GPGPUs or even multiple CPUs with different ISAs. Operating systems originally designed for single-core computers struggle to run such machines efficiently. While operating systems have been designed for distributed systems, they do not fit well either, since they were designed with uses in mind that differ greatly from those of a typical single computer. Also, architectures have become increasingly diverse, which increases the challenge of porting existing operating systems to those architectures.

3. Contributions
This paper contributes a system in which every CPU is managed independently. This allows the driver for every resource to be specialized to that resource. These independent components then communicate via message passing. This allows improved asynchronous communication. They implement their system for the x86-64 platform using a driver and a monitor for each CPU, where the user-space monitor does all the inter-core coordination. The implementation also uses a system knowledge base, which uses a fragment of first-order logic to maintain knowledge of the underlying hardware.

4. Evaluation
The authors evaluate their work on one detailed case study, on macro-benchmarks of a web server and some compute-bound programs, and on microbenchmarks that test the performance of various system primitives. They seek a system with good adaptability to different hardware, but are unable to evaluate this much. Their benchmarks seem to indicate good results and be appropriate for their tool.

5. Confusion
I was confused in section 4.2 by the term “monolithic microkernel”: this sounds like an oxymoron, since a microkernel is the opposite of a monolithic system. I also desire a better understanding of their discussion of memory management.

Posted by: Stephen Lee | January 27, 2016 09:46 PM

1. Summary
Multicore systems need the OS to move toward message passing rather than shared memory, because message passing is efficient on distributed, network-connected systems. Coherency is maintained by replicating data in each core’s cache and by a single-address-space virtual memory method that keeps the same page table across cores.

2. Problem
Client-server workloads require the operating system to change, because it is impossible to optimize for every workload of a distributed system; for example, acquiring locks incurs high latency on a multicore system.
Hardware-specific optimization is not practical because there are many different types of hardware and it is not predictable which hardware will be used in the future, so the OS design itself must change to remove the bottlenecks of current operating systems.
The diversity of systems and interconnects makes a shared-memory policy a poor fit, because interconnect topologies such as rings or meshes generate a lot of traffic.

3. Contributions
To deal with multicore workloads, the paper implements the multikernel, a new OS structure that builds OS functionality as a distributed system using explicit message-passing communication rather than shared-data communication, for better performance.
Shared-data communication is inefficient in terms of latency, which comes from moving shared data through the cache hierarchy, while message passing is efficient with lightweight remote procedure calls.
Replication is used in place of locks to keep state consistent across the cores of a multicore system. When data on one core is updated, the replicas on the other cores are updated and maintained by exchanging messages.
The hierarchy is constructed as follows: the CPU driver sits directly on the hardware at the very bottom, and the monitors are placed between the CPU driver and applications. The inter-core communication mechanism available is cache-coherent memory.
The paper introduces methods for managing shared resources, such as memory and updates to a global page table by each kernel.
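As a rough illustration of the replication idea described above, here is a minimal Python sketch. It is not Barrelfish code: the `Core` class, the `inbox` queue standing in for a message channel, and the key and value names are all assumptions made for the example, and it ignores ordering and agreement between concurrent updates.

```python
from collections import deque

class Core:
    """One core holding a private replica of some OS state (hypothetical model)."""
    def __init__(self, core_id):
        self.core_id = core_id
        self.replica = {}      # local copy, never read directly by other cores
        self.inbox = deque()   # stands in for a per-core message channel
        self.peers = []

    def update(self, key, value):
        # Apply the change locally, then send it to every other core.
        self.replica[key] = value
        for peer in self.peers:
            peer.inbox.append((key, value))

    def drain(self):
        # Apply all pending update messages to the local replica.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.replica[key] = value

def make_cores(n):
    cores = [Core(i) for i in range(n)]
    for c in cores:
        c.peers = [p for p in cores if p is not c]
    return cores

cores = make_cores(4)
cores[0].update("page_0x1000", "unmapped")
for c in cores:
    c.drain()
```

After every core drains its inbox, all four replicas hold the same entry, and no core ever touched another core's dictionary.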

4. Evaluation
The TLB shootdown case study, in which the TLBs on all cores are updated after a page mapping changes, shows that 2-level multicast performs better than broadcast and unicast. Unmap latency stays stable in Barrelfish as the number of cores increases. Barrelfish also shows better performance in the operation of acquiring memory ownership and higher throughput in IP loopback.

5. Confusion
The shared memory space is used as a channel in this paper. Which of the connected cores owns and handles the shared memory space?
Is it easy for users to write code for virtual memory management? Is there any merit in doing this at user level rather than having the CPU handle it itself?

Posted by: Choungki Song | January 27, 2016 09:18 PM

1. Summary
This paper describes a multi-kernel system as well as the implementation of one, named Barrelfish. The basic idea is that the system treats a machine as a network of independent cores, assumes no inter-core sharing at the lowest level and moves traditional OS functionality to a distributed system of processes that communicate via message passing.

2. Problem
As hardware becomes increasingly heterogeneous, there are an increasing number of tradeoffs regarding hardware specifics such as cache hierarchy and memory consistency. Optimizations are often specific to a particular implementation and thus are not portable across system designs.

3. Contributions
(1) All inter-core communication is performed using explicit messages. The OS can then provide isolation and resource management to heterogeneous cores. This also allows for “split-phase”, in which a request is sent and then operation immediately continues without wasting time. There is the expectation that a reply will arrive at some point in the future.
(2) The OS should be divorced from the hardware as much as is possible. There is an interface to hardware (CPU/devices) and message transport mechanisms, but all else should be hardware-agnostic. As a result, distributed communication algorithms may be separated from the hardware implementation itself. Finally, protocol and message transport implementation can be tailored to on-the-fly observations of the workload and topology.
(3) Any potential shared state in a multikernel is accessed and updated as if it were a local replica. Consistency is maintained via message sharing. System scalability is improved by a reduced load on the system interconnects, and synchronization overhead is lowered. Finally, changes can be easily applied across cores by using the replication framework.
(4) Developments in distributed networks can be applied given the decentralized nature of the multi-kernel operating system.
(5) Kernel-level code and complexity should be minimized as much as possible, with all virtual memory management performed by USER-LEVEL code via system calls that manipulate “capabilities”. These are user-level references to kernel objects or regions of physical memory. Retype and revoke operations control capabilities to different regions. Capabilities are complex, but they bring uniformity to the implementation. This is important in the case of page mapping/remapping, which requires that sort of coordination.
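To make the “split-phase” idea in point (1) more concrete, here is a small Python sketch using `asyncio`. It only models the pattern of sending a request, continuing with local work, and collecting the reply later; the `remote_core` coroutine and the request name are assumptions for illustration and do not correspond to any Barrelfish interface.

```python
import asyncio

async def remote_core(request):
    # Stands in for another core servicing the request (hypothetical).
    await asyncio.sleep(0.01)
    return f"ack:{request}"

async def main():
    log = []
    # Phase 1: issue the request and keep going without blocking.
    pending = asyncio.create_task(remote_core("unmap"))
    log.append("request sent")
    log.append("local work done")   # useful work overlaps the remote operation
    # Phase 2: collect the reply only when it is actually needed.
    reply = await pending
    log.append(reply)
    return log

log = asyncio.run(main())
```

The local work is recorded before the reply arrives, which is the point of split-phase operation: the sender does not stall while the remote side is busy.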

4. Evaluation
They presented a main case study featuring TLB shootdown as well as comparisons of specific performance/workloads.
(1) TLB shootdown: in Linux/Windows, inter-processor interrupts are used to trap cores. Cores then write to a shared variable, invalidate the TLB, and continue. The main cost is in the cost of the trap (800 cycles). In Barrelfish, messages are used instead. The local monitor broadcasts invalidate messages to others and waits for replies. In particular, knowledge of the hardware is used to optimize the protocol. The system knowledge base is used to construct a multicast tree. There is high unmap latency, though, with the overhead of LRPC and scheduling effects in the monitor. However, the complete process “quickly outperforms” Linux and Windows systems. Optimizations in the user-level threads package may further improve performance.
(2) Messaging performance: achieves high throughput and fewer cache misses due to the avoidance of shared memory in URPC messages. There is less overhead as compared to the shared-memory network stacks that Linux and Windows use.
(3) Compute-bound workloads: has more or less identical performance to Linux.
(4) IO workloads: Performs identically in terms of network throughput, but in dynamic/static web service, Barrelfish performs better due to running almost entirely in user-space and avoiding cross-kernel communication.
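The multicast tree mentioned in (1) can be sketched roughly as follows. This is a sequential Python simulation under simplifying assumptions: the package layout, the choice of the first core in each package as the aggregator, and the function names are all hypothetical, and real message delivery and acknowledgement timing are not modeled.

```python
def build_multicast_tree(cores_by_package):
    """Map each package's first core (the aggregator) to the rest of its cores."""
    return {cores[0]: cores[1:] for cores in cores_by_package.values()}

def shootdown(tree, tlbs, page):
    """Invalidate `page` in every TLB; return how many messages the initiator sent."""
    root_messages = 0
    acks = 0
    for aggregator, leaves in tree.items():
        root_messages += 1              # one message per package from the initiator
        tlbs[aggregator].discard(page)
        for leaf in leaves:             # aggregator forwards within its own package
            tlbs[leaf].discard(page)
        acks += 1                       # aggregator replies once its package is done
    assert acks == len(tree)            # initiator waits for one ack per package
    return root_messages

# Assumed topology: two packages of four cores each.
packages = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
tlbs = {core: {"0x1000", "0x2000"} for cores in packages.values() for core in cores}
tree = build_multicast_tree(packages)
sent = shootdown(tree, tlbs, "0x1000")
```

With eight cores in two packages, the initiator sends two messages instead of one per core, while every TLB still drops the stale entry.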

5. Confusion
How exactly is consistency maintained between all states/cores, such as that required for a shared message space? It just says “messaging protocol” which is a bit vague. Also, how exactly does scheduling for a process on a single core work? How do dispatchers communicate with monitors?

Posted by: En-Ui Annie Lin | January 27, 2016 09:14 PM
