
Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism.

Thomas Anderson, Brian Bershad, Edward Lazowska, and Henry Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Trans. on Computer Systems 10(1), February 1992, pp. 53-79.

Review due Tuesday, 3/10.

Comments

1. Summary
    This paper discusses the performance limitations of kernel-level threads, the limitations of current user-level thread packages, and the lack of kernel support for user-level threads in the multiprocessor operating systems of the time.
    It describes the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads with the performance and flexibility of user-level threads.

2. Motivation
    The authors state that neither user-level threads nor kernel-level threads provide satisfactory performance.
    While user-level threads are flexible and require no kernel intervention, giving excellent performance, in a "real-world" operating system environment they may perform poorly due to multiprogramming, I/O, and page faults. Kernel-level threads have system-level support but are too heavyweight because they involve calls into the kernel. Their performance therefore takes a hit and is slower than that of user-level threads, and any enhancements have to be implemented in the kernel, which adds complexity.
The paper describes a kernel interface and a user-level thread package that together combine the functionality of kernel threads with the performance and flexibility of user-level threads.

3. Contributions
    At the core of this work is the authors' assertion that kernel threads are the wrong abstraction on which to support user-level parallelism.
    The paper provides kernel support for user-level threads in contemporary multiprocessor operating systems, improving the performance of parallel programs by achieving the benefits of kernel and user-level threads at once.
    This is achieved through the use of scheduler activations. Each application is provided with a virtual multiprocessor: it knows which processors have been allocated to it and controls which threads run on those processors, while the OS retains control over the allocation of processors to address spaces. The kernel notifies the address-space thread scheduler of kernel events affecting the address space (a change in the number of processors, I/O, page faults, etc.), allowing the application to be aware of its scheduling state. In return, the thread system in each address space notifies the kernel of the operations that can affect its processor allocation decisions. A scheduler activation thus serves as a vessel (execution context) for running user-level threads and provides space for saving the context of its user-level thread when that thread is stopped by the kernel. This gives control over scheduling to the application.
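To make the notification interface concrete, the kernel-to-user upcall points described above can be sketched as C prototypes roughly as follows; the names and types here are paraphrased from the paper's description for illustration and are not its literal interface.

    /* Illustrative sketch of the kernel-to-user upcall points; names and
     * types are paraphrased from the paper's description, not a real API. */

    typedef int processor_t;                 /* physical processor number        */
    typedef int activation_t;                /* scheduler activation identifier  */
    struct machine_state;                    /* saved registers of a user thread */

    /* A processor has been added to this address space; run a thread on it. */
    void upcall_add_processor(processor_t cpu);

    /* A processor was preempted; the preempted activation's user-thread state
     * is handed up so the user-level scheduler can requeue that thread.      */
    void upcall_processor_preempted(activation_t preempted,
                                    struct machine_state *state);

    /* The thread running in this activation blocked in the kernel (e.g. I/O);
     * the processor can now run another user-level thread.                   */
    void upcall_activation_blocked(activation_t blocked);

    /* A previously blocked activation can continue; its thread's saved state
     * is handed up so it can be resumed or placed on a ready list.           */
    void upcall_activation_unblocked(activation_t unblocked,
                                     struct machine_state *state);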

4. Evaluation
Topaz thread management routines were modified to implement scheduler activations, and FastThreads was modified to add support for them. The latencies of two thread operations are measured: Null Fork and Signal-Wait. While the new system performs far better than kernel threads, it is slightly slower than the original FastThreads package due to the overhead of checking for critical sections. The upcall performance is about a factor of five worse than Topaz threads. The authors state that a limitation of the experiments is the small number of processors, which cannot mimic large parallel applications or higher multiprogramming levels.

5. Confusions
Will exposing processor allocation information to the application add complexity that the application otherwise should not have to care about?

Summary:

This paper demonstrates how the kernel of a multithreaded system could be built to support user-level threads while retaining their performance benefits. The paper is based on the authors' assertion that user-level threads that execute in the context of kernel threads face system integration issues, while kernel threads are too heavyweight and suffer from poor performance.

Problem:

As pointed out in the summary, the main issue that this paper attempts to solve is thread management at the user level. Reading the paper's introductory section, the authors do make a good case for user-level threads.

Contributions:

The biggest contribution of this paper is a thread management system that combines the benefits of user level thread management (flexibility and better performance) with those of kernel threads (better system integration and kernel support). They designed their system so that processes become processor aware while the kernel handles the allocation of processors to processes. In addition, the kernel now explicitly communicates events like blocking and preemption to the user thread management system.

They designed a new context for thread execution, the scheduler activation. The scheduler activation acts as the interface between the kernel and a user-level thread. They ensure that there are as many scheduler activations as there are available processors and that processors are not lost whenever a user thread blocks. They ensure this by saving a blocked thread's state and context switching between user-level threads.

One of the interesting aspects of this paper is the view of a scheduler activation as containing both a user-level stack and a kernel stack that threads can use to call into the kernel.

They provided a clean interface that allows a process to request more processors or give up idle processors. This also allowed them to implement a system that reduces the priority of processes that run for a long time.
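As a rough sketch of that interface, the two notifications an address space sends to the kernel could look like the following in C; the function names are invented for illustration and the policy shown is a simplification.

    /* Hypothetical prototypes for the two address-space-to-kernel
     * notifications described in the paper; names are invented here. */
    void sys_add_more_processors(int how_many);  /* more runnable threads than processors  */
    void sys_this_processor_is_idle(void);       /* nothing to run; kernel may reassign it */

    /* Simplified user-level policy for when to make these calls. */
    void reconsider_processor_needs(int runnable_threads, int my_processors)
    {
        if (runnable_threads > my_processors)
            sys_add_more_processors(runnable_threads - my_processors);
        else if (runnable_threads < my_processors)
            sys_this_processor_is_idle();        /* give back a processor we cannot use */
    }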

Evaluation:

The authors compare their performance with a user-level thread package built on kernel threads (the original FastThreads on Topaz) and with pure kernel threads (the original Topaz threads). Their system doesn't really achieve better thread-operation latency and has worse performance in some cases, particularly tests that need a larger number of upcalls. They attribute this to the fact that the baseline was implemented in carefully tuned assembly while they use a higher-level language.

Their results do show the potential performance benefits of kernel support for user-level threads, so it would have been interesting if this paper had a more optimized implementation.

Confusion:
I don't understand how this system handles priority. In section 3.2, they say that a process can signal that it has more processors than work and give up a processor knowing that it would be reassigned a new processor due to the priority mechanism. However, they also say that a process needs to wait until a processor becomes idle. It doesn't seem like there's a priority mechanism in place.

How does their method of creating a copy of a critical region handle cases in which the second thread blocks in the critical region as well?

Summary

This paper argues that neither approach to threads, i.e. threads supported by the operating system kernel or threads supported by user-level library code in the application address space, has been fully satisfactory. The authors argue that, on the one hand, the performance of kernel threads is inherently worse than that of user-level threads, but, on the other hand, user-level threads in real-world multiprocessor operating systems can exhibit poor performance or even incorrect behavior. They propose the design of a new kernel interface and user-level thread package that provides the same functionality as kernel threads without losing the performance and flexibility advantages of user-level threads.

Problem

With user-level threads, the scheduling of threads within a process happens at user level. This cannot use the physical processors efficiently, for example when a blocking call happens (due to I/O or page faults). Kernel threads are better than user-level threads in this respect: because all threads are scheduled by the kernel, when a blocking call happens the kernel can perform a context switch and run another thread on the now-idle processor. But the performance of kernel threads is poor due to the overhead of trapping into the kernel for thread operations. Thus user-level threads perform best when the application is uniprogrammed and does no I/O, while kernel threads work better when there is multiprogramming and interaction with system services. Both techniques are therefore limited.

Contributions

The design of a careful interface for communication between the kernel and user level, such that the mechanism of allocating processors to a user-level process is handled by the kernel, while the policy of scheduling threads on the available processors is the responsibility of the user-level scheduler.
The mechanism for communication between the kernel processor allocator and the user-level thread scheduler is the scheduler activation.

- Through scheduler activations, the kernel notifies the user-level address space of events affecting that process, such as adding a processor.

- The thread system in each address space notifies the kernel of the subset of user-level thread operations that can affect processor allocation decisions. For the majority of operations we therefore get good performance, because they do not need to be reflected to the kernel.

Evaluation

The authors have implemented a prototype of the design on the DEC SRC Firefly multiprocessor workstation. They modified the kernel threads of the Topaz operating system and FastThreads, a user-level thread system. They found that for workloads that did not involve much kernel intervention, the proposed implementation performs similarly to the original FastThreads. But when there is frequent paging and I/O, FastThreads with scheduler activations performs about 1.5x better than the original FastThreads.

Confusions

As scheduling of threads is done by each individual address space and not by the kernel, how does preemption work when switching user-level threads within a process?
The debugging considerations section needs more discussion.

Summary:
This paper analyzes and evaluates the dilemma of supporting threads in the operating system kernel versus in user-level library code for parallel programming, and suggests a hybrid solution to the problem.

Problem:
Supporting threads at either the user or the kernel level has its own problems. User-level threads have system integration problems due to the mapping mismatch between physical and virtual processors, while kernel threads are too heavyweight for use in many parallel programs.

Solution:
The solution proposed in the paper is an integrated solution of a new kernel interface and user-level thread package that provides the performance of user-level threads and the functionality of kernel threads. The solution has the following components:
- Virtual multiprocessors mapped to application address spaces: each application knows which processors it has been allocated and has complete control over which threads run on them.
- Scheduler activations: vector control from the kernel to the address-space thread scheduler on a kernel event. A scheduler activation is responsible for running user-level threads in exactly the same way a kernel thread would, for notifying the user-level thread system of kernel events, and for providing space in the kernel for saving the processor context of the activation's current user-level thread.
- Critical sections: if a preempted or blocked thread is executing in a critical section, it is temporarily continued, via a duplicated copy of the critical-section code, until it exits the critical section.

Evaluation:
The design is implemented by modifying Topaz, the native OS for the DEC Firefly multiprocessor workstation, and FastThreads, a user-level thread package. The performance numbers do not overwhelmingly support migration from existing thread implementations; it is mainly in I/O-heavy applications that scheduler activations show a clear speedup over the original FastThreads.

Concerns/Learnings:
- Can we talk in more detail about how the kernel is able to maintain the invariant that there is always the right number of scheduler activations running, corresponding to the processors assigned to the application?
- What will happen if a malicious application spins forever in a critical section and thus never relinquishes control? How is the temporary time to let the thread continue decided, and correspondingly, how many copies need to be created?

1. summary
This paper proposes a system for user-level management of threads. The approach is to provide each address space with a virtual multiprocessor (an abstraction of a dedicated physical machine). The kernel allocates processors to the address space, while the user-level thread system is responsible for scheduling threads on the allocated processors. The kernel notifies the user-level thread scheduler of every event that affects it, and the address space notifies the kernel of events that affect processor allocation. The scheduler activation is the execution context used to realize these concepts.
2. Problem
Threads can be supported at either user level or kernel level. User-level threads, as they require no kernel intervention, have better performance and are flexible (customizable to the application's needs). Due to the lack of system integration, however, user-level threads perform poorly during I/O and page faults. Kernel threads, on the other hand, avoid the system integration issues (as the kernel schedules them directly), but they have worse performance because they require kernel intervention for thread management operations. They are also not as flexible as user-level threads, since they are common to all applications. This paper addresses these issues by building a kernel interface and user-level thread package that take the best of both worlds.
3. Contributions
Vectoring of kernel events to the thread scheduler -
Scheduler activations are used for communication between the kernel processor allocator and the user-level thread system. When the kernel needs to deliver an event, it creates a scheduler activation, assigns it to a processor, and upcalls into the address space (events such as adding a processor, preempting a processor, or a scheduler activation blocking/unblocking).
Notifying the kernel of user-level events -
The address space notifies the kernel when there are more runnable threads than processors or more processors than runnable threads. These notifications help make the processor allocation policies efficient. To keep user programs honest, the processor allocator favors address spaces that use fewer processors.
Critical sections -
User-level threads could be executing in critical sections when they are blocked or preempted, which can cause poor performance or, in some cases, deadlock. A recovery-based solution is proposed: when an upcall notifies the user-level thread system that a thread has been preempted or blocked, the thread system checks whether that thread was executing in a critical section. If so, the thread is continued until it completes the critical section.
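A simplified sketch of that recovery path is shown below. It uses a per-thread flag instead of the paper's actual technique of comparing the preempted program counter against a duplicated copy of the critical-section code, and the helpers (switch_to, ready_enqueue, run_next_thread) are hypothetical.

    /* Simplified sketch of critical-section recovery on an upcall; a flag
     * stands in for the paper's check of the PC against copied code. */

    struct uthread {
        int   in_critical_section;   /* set/cleared around lock acquire/release */
        void *saved_context;         /* machine state handed up by the kernel   */
    };

    /* assumed helpers of the user-level thread library (not defined here) */
    void switch_to(struct uthread *t);       /* user-level context switch to t     */
    void ready_enqueue(struct uthread *t);   /* put t back on a ready list         */
    void run_next_thread(void);              /* pick and run the next ready thread */

    void handle_preempted_or_unblocked(struct uthread *t)
    {
        if (t->in_critical_section) {
            /* Temporarily resume the thread so it can finish the critical
             * section and release its lock; it yields back afterwards.     */
            switch_to(t);
        }
        ready_enqueue(t);        /* thread now holds no locks */
        run_next_thread();       /* use this processor for the best ready thread */
    }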
4. Evaluation
An implementation and evaluation of the proposed system is presented. The evaluation results support their claim that the system doesn't add much overhead when FastThreads on Topaz threads is compared with FastThreads on scheduler activations. User-level thread latencies are almost identical, with a small increase when threads require kernel involvement. For I/O-heavy workloads, FastThreads on scheduler activations outperforms its counterpart.
5. Confusion
For better clarity, it would be great if you could explain the life cycle of a scheduler activation under different scenarios.

Summary:
The authors present the design and implementation of a new framework which supports user-level threads while also achieving the functionality previously available only with kernel threads. The paper aims to eliminate the situation where kernel and user threads operate oblivious to each other, by introducing a scheduler activation interface to improve cooperation and scheduling when certain events occur.

Problem:
The core problem addressed here is that although kernel threads provide functionality, they are heavyweight and do not achieve the performance typically available with user-level threads. But since user-level threads do not have a global view of the system, they can lead to under-utilization of resources, leaving a processor idle when a thread is blocked on I/O. The paper addresses this concern by providing a framework that achieves performance as well as flexibility, introducing a cooperative mechanism between kernel space and user space for thread management.

Contribution:
The one-to-one mapping of user-level threads onto virtual processors created issues when a thread was blocked on I/O. To promote a notification system, the authors present the scheduler activation, which acts like a vessel maintaining an execution context for a user-level thread even if it is blocked. The kernel, on its part, creates a virtual multiprocessor abstraction for user threads, which it maps onto physical processors. The user-level threads are free to use their own scheduling policy for the virtual CPUs. A scheduler activation informs the user level of a kernel event via an upcall, and the user-level thread system notifies the kernel if the application needs more processors. This establishes two-way communication between the kernel and user-level threads, which helps maintain the current execution state of the system across layers and use that information for better scheduling decisions.


Evaluation:
The implementation goal is achieved by making changes to the Topaz OS and to a user-level threads package known as FastThreads. For regular workloads that do not interact much with the kernel, the implementation performs on par with the original FastThreads. For workloads involving kernel intervention, the proposed implementation with scheduler activations performs better than the original FastThreads implementation.


Confusion:
With a high flow of information between the kernel and user space, I am confused as to why it does not affect latency at all.

Summary :
This paper addresses the dilemma of balancing the flexibility and performance of user-level threads with kernel support. It first highlights the exclusive advantages of each style of thread handling. It then proposes a variant of a user-level thread library which communicates effectively with the kernel thus providing the best of both worlds.

Problem :
The performance of kernel threads is inherently worse than that of user-level threads, and managing threads at the user level is essential for high-performance computing. The lack of support in modern multiprocessor operating system kernels for user-level threads is the major problem. A new kernel interface is proposed on which to base user-level threads without compromising their performance and flexibility advantages.

Contributions :
1. It is shown using empirical evidence that the lack of system integration exhibited by user-level threads is not inherent in user-level threads but is a consequence of inadequate kernel support.
2. The explicit problems with pure kernel threads are the cost of accessing thread management operations and the cost of generality for applications which do not need all the features offered.
3. It is stated boldly that kernel threads are the wrong abstraction for supporting user-level thread management, because kernel threads block, resume, and are preempted without notification to the user level, and they are scheduled obliviously with respect to the user-level thread state.
4. The solution is to use scheduler activations. The kernel’s role is to vector events to the appropriate thread scheduler rather than to interpret these events on its own.
5. A scheduler activation notifies the user-level thread system of a kernel event and it provides space in the kernel for saving the processor context of the user-level thread when the thread is stopped by the kernel.
6. For communication from the user side toward the kernel, the user-level thread system need not tell the kernel about every thread operation. Only the small subset that can affect the kernel's processor allocation decisions needs to be passed across. This includes just two kinds of messages: the first is a notification to add more processors, and the second states that a processor is idle.
7. The implementation uses recovery instead of prevention to deal with preemption while a user thread is holding a lock. When an upcall informs the user-level thread system that a thread has been preempted or unblocked, the thread system checks if the thread was executing in a critical section.
8. The implementation is based a modification of Topaz which is the native operating system for the DEC SRC Firefly multiprocessor workstation and FastThreads which is a user-level thread package.
9. For processor allocation, a space-sharing policy that respects priorities and guarantees that no processor idles if there is work to do is used. The thread scheduling policy uses the default FastThreads policy of per-processor ready lists. Post-processing of compiler-generated assembly is used to detect lock-holding scenarios. Discarded scheduler activations are cached for future use. An interface is provided to debug the user-level thread system and the application code running on top of the thread management code.

Evaluation :
Two benchmarks, Null Fork and Signal-Wait, are used, and a comparison is made with FastThreads and Topaz threads. While the system is significantly better than kernel threads, the slight degradation in performance compared with the original FastThreads can be attributed to the extra checking done for critical sections, etc. For application speedup, the speedup scales with the number of processors due to effective processor allocation.

Summary :
The paper discusses the design, implementation, and performance of a system that combines a kernel interface and a user-level thread system to provide the functionality of kernel-level threads along with the flexibility and performance of user-level threads.

Problem :
User-level threads provide high performance and flexibility, while kernel-level threads are required to provide the functionality of processor allocation and I/O but are quite heavyweight. The problem is to combine the kernel interface with a user-level thread system in such a way that the kernel can perform processor allocation using user-level information and the user level can manage parallelism using information about kernel events.

Contributions :
1. Scheduler activation is the mechanism used by the kernel to vector control to the address space on kernel events. On a kernel event, the kernel creates a scheduler activation and upcalls into the application address space.
2. The processor allocation to application address spaces is done by the kernel and the thread scheduling is done by the application address space.
3. Kernel events are notified to the user level thread scheduler and the address space sends information to the kernel about those events that impact the processor allocation decisions.
4. The kernel events for which the kernel creates a scheduler activation, assigns it to a processor, and upcalls include a change in the number of processors or a thread blocking in the kernel.
5. The user application address space notifies the kernel when there are more runnable threads than processors or more processors than runnable threads. A multilevel feedback mechanism is used to ensure a proportional share of resources.
6. User threads that are preempted or blocked are checked to see whether they were in a critical section. Deadlock is prevented by continuing such a thread temporarily, using a user-level context switch, until it exits the critical section.
7. Processor allocation policy space shares processors respecting priorities and guarantees that no processor is idle when there is processing to be done.
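A minimal sketch of such a space-sharing allocator is given below; it is only meant to illustrate "divide processors evenly, never give a space more than it asked for, and leave none idle while someone still wants one". It ignores priorities, and all names are invented rather than taken from the paper.

    /* Illustrative space-sharing allocator (priorities ignored): hand out
     * 'total' processors, never exceeding a space's request, and keep
     * redistributing leftovers while some space still wants more.        */
    void allocate_processors(const int requested[], int granted[],
                             int n_spaces, int total)
    {
        int i, remaining = total, want = n_spaces;

        for (i = 0; i < n_spaces; i++)
            granted[i] = 0;

        while (remaining > 0 && want > 0) {
            int share = remaining / want;    /* even share among spaces still wanting more */
            if (share == 0)
                share = 1;
            want = 0;
            for (i = 0; i < n_spaces && remaining > 0; i++) {
                int give = requested[i] - granted[i];
                if (give > share)
                    give = share;
                granted[i] += give;
                remaining  -= give;
                if (granted[i] < requested[i])
                    want++;                  /* this space could still use another processor */
            }
        }
        /* any processors still remaining are wanted by nobody and stay idle */
    }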

Evaluation :
The performance evaluation is done on two benchmarks, Null Fork and Signal-Wait, and is compared with FastThreads and Topaz threads. Thread operation performance is much better than kernel threads but close to FastThreads, with a slight degradation due to the extra checking involved for critical sections, etc. Upcall performance is worse: the signal-wait time is a factor of five worse than pure kernel threads. The speedup is seen to scale with the number of processors due to effective processor allocation.
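For reference, the two microbenchmarks time roughly the operations sketched below. This version uses POSIX threads purely to illustrate what is being measured; the paper times the equivalent FastThreads and Topaz primitives, so the names and absolute numbers here are not comparable.

    /* Rough illustration of what the Null Fork and Signal-Wait benchmarks
     * measure, using POSIX threads; the paper times FastThreads/Topaz
     * primitives instead, so absolute numbers are not comparable.        */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static long elapsed_ns(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
    }

    static void *null_proc(void *arg) { return arg; }   /* the "null" procedure */

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
    static int flag = 0;

    static void *waiter(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&m);
        while (!flag)
            pthread_cond_wait(&c, &m);   /* Signal-Wait: block until signalled */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void)
    {
        struct timespec t0, t1;
        pthread_t t;

        /* Null Fork: create, schedule, and finish a thread that does nothing. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&t, NULL, null_proc, NULL);
        pthread_join(t, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("null fork+join: %ld ns\n", elapsed_ns(t0, t1));

        /* Signal-Wait: signal a waiting thread and wait for it to finish. */
        pthread_create(&t, NULL, waiter, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_mutex_lock(&m);
        flag = 1;
        pthread_cond_signal(&c);
        pthread_mutex_unlock(&m);
        pthread_join(t, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("signal-wait: %ld ns\n", elapsed_ns(t0, t1));
        return 0;
    }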

Confusion :
The increased communication between user and kernel via upcalls mainly leads to a decrease in performance in exchange for flexibility. So what is the scope for applying these techniques?
How are scheduler activations debugged?

1. Summary

The paper describes a new kernel mechanism called scheduler activation that exposes the functionality of kernel-level threads at the user-level by communicating kernel events to the user level thread system.

2. Problem

User-level threads are inherently more efficient than kernel-level threads as they avoid the extra kernel trap and copy operations. Furthermore, user-level threads are also more flexible as they can be written to match the needs of the specific application (such as a custom scheduling policy). However, user-level threads suffer from poor integration with system services. For example, when a user-level thread makes a blocking I/O request, no other threads can execute on the virtual processor and thus all threads are blocked. Likewise, in a multiprogramming environment, the operating system is forced to choose kernel-level threads (and thus indirectly the corresponding user-level threads) to halt without information about the user-level thread state. Thus, kernel threads are the wrong abstraction for supporting user-level thread management.

3. Contributions

The main contribution of this paper is to introduce a construct similar to a kernel thread referred to as the scheduler activation. The main features of this are as below.

- Has two execution stacks, one mapped into the kernel and one mapped into the application address space. A user-level thread uses the user-level stack when it starts running, and when it calls into the kernel it uses the kernel-level stack.

- Upcalls are used to notify the user-level thread management system of events (personally, I think this is not an elegant design). The different kinds of events that may be notified are: (1) addition of a new processor to execute on, (2) preemption of a processor, (3) blocking of a thread (e.g., on I/O), and (4) unblocking of a thread.

- Upon receiving these upcalls, the user-level thread management system may choose to alter its scheduling policy. For example, a blocked thread may now be run or if a processor is pre-empted, the thread running on that processor may be assigned to a different scheduler activation.

Note that through the scheduler activation, the user-level thread management system has its own virtual multiprocessor with each activation stack corresponding to a processor. Thus, the user-level thread management system has control over which threads to run. However, the kernel still retains control over allocation of processors to each process.

4. Evaluation

It is observed that scheduler activation maintains the efficient performance of user-level threads. However, upcalls are noted to be slow in comparison. This is attributed to implementation details rather than a flaw inherent in scheduler activations. Finally, overall application performance is shown to be slightly better than user-level threads and much better as compared to kernel-level threads.


5. Confusion

How is it useful for the scheduler activation to store the processor context of the current user-level thread?

1.Summary
The paper discusses the problems of user-level and kernel-level thread systems in managing the parallelism of applications on both uniprogrammed and multiprogrammed systems. The authors propose a novel abstraction, 'scheduler activations', that the kernel can provide to user-level thread packages and that has the functionality of kernel threads while maintaining the performance and flexibility of user-level threads. They back their design claims with performance results for threads and applications, which are on the same order as user-level threads while eliminating the overhead of kernel intervention.

2.Problem
User-level threads are fast and have a flexible concurrency model, but suffer during I/O, preemption, and page faults because the kernel is unaware of them. Kernel-level threads are quite heavyweight for performing parallel tasks, as they have to cross protection boundaries many times, and, being generic across many concurrency models, they cannot match specific application needs. User-level threads also have poor system integration because they are not notified of kernel events. The authors address these problems by getting the best of both worlds: implementing a new kernel interface that retains kernel functionality but can perform like user-level threads.

3.Contributions
User-level thread systems are given the abstraction of a virtual multiprocessor by the kernel, which is responsible for allocating processors to address spaces. The main contribution is the concept of scheduler activations and the upcall interface, exposing much of the low-level detail to user-level threads. The user-level thread scheduler can now make decisions about scheduling threads on the allocated processors, instead of the kernel doing coarse-grained thread multiplexing without notifying the user level. This in itself is a major advantage, as the user can exploit more parallelism and schedule the threads directly. Scheduler activations hold thread contexts using separate user and kernel stacks. Because of the upcall notification from the kernel on thread preemption or I/O blocking, the user-level thread scheduler can schedule other ready threads, thus not sacrificing performance. The user-level thread system can also notify the kernel of its demand for more processors and thus affect kernel allocation decisions. One more important contribution is the way critical sections are handled, by keeping a copy of the code and allowing a preempted thread to continue execution until it releases the lock, thus avoiding deadlock scenarios.

4.Evaluation
The design has been evaluated by modifying the Topaz OS on the DEC SRC Firefly multiprocessor. FastThreads (a user-level thread package) and the original Topaz kernel threads are compared to FastThreads with scheduler activations (FTSC) in all experiments. On the two thread operations, FTSC performed almost the same as FastThreads and far better than kernel threads. For an N-body simulation with negligible I/O, FTSC performed better than both FastThreads and kernel threads. With I/O, FTSC performed better than FastThreads due to the better handling of blocking. Overall, a good performance gain was achieved, with almost a 3x speedup.

5.Confusions
This seems similar to the exokernel philosophy of exposing low-level decisions to applications. But applications could always over-request resources and try to take over; will the system still maintain fairness across all user programs? How do per-processor ready lists improve cache locality?

Summary:
The paper discusses the advantages and disadvantages of existing methods for managing parallelism. It describes the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads and the same performance and flexibility as user-level threads.

Problem:
Threads were implemented either at user level or at kernel level, but neither approach was fully satisfactory. User-level threads were flexible and performed well, but ensuring their logical correctness is difficult; they also perform poorly in the case of page faults or I/O because there was no mechanism for the kernel to inform the user-level scheduler. Kernel-level threads had performance problems and were not as flexible as user-level threads.

Contributions:
The paper introduces a kernel mechanism, the scheduler activation, which notifies the user-level thread scheduler of every kernel event, allowing the application to have complete knowledge of its scheduling state. The user-level thread scheduler in turn notifies the kernel about operations that affect processor allocation decisions. The abstraction of a virtual multiprocessor is provided to each process, and the process can schedule its threads on these processors without kernel intervention. An upcall is made by the kernel whenever it changes the number of virtual processors or a thread is preempted or blocked on I/O, and the user-level thread system can use this information to schedule its threads. If the kernel preempts a thread that is in a critical section, the user-level thread scheduler checks whether any locks are held by the thread, and a mechanism ensures it completes the critical section.

Evaluation:
To implement the idea, the authors modified the Topaz OS and the user-level FastThreads package. They compared the performance of Null Fork and Signal-Wait and observed a degradation of 3-5 microseconds. The performance of the proposed implementation is comparable to FastThreads for workloads that need little kernel intervention. In the current implementation, upcall performance is poor compared to Topaz kernel threads, but the authors claim that with a better implementation it could outperform the Topaz implementation.

Confusion:
The debugging section of the paper is not that clear.

Summary:

In this paper, the authors discuss the overheads involved in managing parallelism when threads are built on top of traditional processes, implemented at kernel level, or implemented as user-level threads over kernel threads. The authors describe a new approach in the form of a kernel interface combined with a user-level thread package. They also discuss its implementation and compare its performance with the Topaz system and FastThreads.

Problem:

The authors try to solve the problems posed by user-level threads and kernel threads. Though user-level threads are flexible, their performance suffers in the presence of page faults, I/O, and multiprogramming. Though kernel threads avoid these problems, they are more rigid and incur extra cost for thread operations, and they are generalized for all applications. The paper tries to come up with an approach in which the kernel can access user-level scheduling information and notify the user-level thread manager about kernel events so as to manage parallelism better.

Contribution:

The contribution is the idea of combining scheduling information at the user level with kernel operations. In this approach, the authors describe the mechanism of scheduler activations, used to direct control from the kernel to the thread scheduler in the address space on kernel events. In the proposed system, the kernel allocates virtual processors to address spaces, retaining full control over its allocation policy. The address space's user-level thread system has control over which threads run on the processors allocated to it. The kernel notifies the thread system of kernel events, and the thread system notifies the kernel when it needs more processors. The scheduler activation serves as an execution context for user-level threads. To vector control to the thread system, the kernel creates a new scheduler activation and upcalls into the thread system's scheduler, which runs in this context and starts a ready thread. By penalizing address spaces that hold more processors than they need, any attempt by address spaces to game the allocator is discouraged. The performance overhead of preempting a thread in a critical section is avoided by checking whether a preempted thread is in a critical section and executing it in the context of the new scheduler activation until it leaves the critical section, so that the locks it holds are released.

Evaluation:

The implementation was tested with the Null Fork and Signal-Wait benchmarks. In cases where there is little kernel intervention, the performance is similar to FastThreads, with a little extra cost incurred for incrementing and decrementing the count of busy threads and deciding whether to notify the kernel. The performance is better than Topaz threads and the original FastThreads (which degrades quickly because a kernel-level thread blocks when its corresponding user-level thread blocks) when there is kernel involvement (e.g. for I/O). Testing the performance on the N-body application also gave favorable results for the modified FastThreads.

Confused about:

I am not sure what the authors mean by the system integration problems exhibited by user-level threads.

Summary:
This paper posits 'scheduler activations' as a new mechanism for supporting user-level thread management effectively.

Motivation/Problem:
The authors feel that existing thread management systems at the kernel level are too heavyweight, while those at user level, though more efficient, suffer from poor system integration; there is no effective communication between the kernel and these thread management systems regarding events that impact (or should impact) resource allocation. They state that this is because kernel-level threads are not the correct abstraction for supporting user-level parallelism, and hence come up with their idea of scheduler activations.

Contributions:
The important ideas presented in this paper are:
- Scheduler activations are upcall mechanisms that the kernel can fire when events that can impact the thread's scheduling policies (such as a thread blocking/unblocking, a processor being taken away from the process/address space etc.) occur.
- There are also mechanisms provided for the user level scheduler to inform the kernel in cases of any resource related requests.
- The kernel allocates 'virtual processors' across address spaces and from then on scheduling of individual threads in the address spaces is handled by the user level thread schedulers.
Such an interface allows full-fledged (and correct) scheduling mechanisms for threads to be implemented at user level. The authors also talk about other performance enhancements that they designed, such as having multiple copies of critical-section code, which allows the thread scheduler to temporarily run a preempted critical section to completion and then resume its scheduling work (so as to prevent other threads from waiting on a lock held by the preempted thread), and reusing scheduler activation data structures.

Evaluation:
The performance of a single 'null' thread with scheduler activations is shown to be on the same order as in existing user-level thread management systems. The measurements also show that upcalls are considerably slower than one would expect, but the authors say this is more a result of their implementation constraints (they reused existing kernel thread code rather than developing it from scratch) than any inherent issue with the upcall design. With respect to general application performance, the scheduler activations version of thread management outperforms both kernel-level threads and user-level thread scheduling systems (the latter, while performing relatively closely in non-I/O-intensive cases, degrades much more quickly as I/O blocking increases, since in the version of the system they were using, once a user thread blocked, the corresponding kernel thread also blocked, rendering the physical processor unusable by the address space until unblocking).

Confusions:
Why do the authors say that passing physical processor related information in scheduler activations to the user level schedulers during debugging is inappropriate?
The authors talk about managing critical sections that are part of the thread management code by generating two copies using assembler labels, but what about critical sections that are part of the (application) thread code itself? Are they also handled without the presence of a flag? Are the programmers who use the system expected to use the same assembler labels around their critical sections?

1. Summary
This paper argues that user-level threads have much better performance than kernel threads and that user-level management of parallelism is essential for good performance, but that a lack of kernel support for system integration makes it difficult for user-level threads to achieve full functionality. The paper presents scheduler activations, a new kernel interface and user-level thread package that address these multiprogramming issues.

2. Problem


  1. The performance bottleneck of parallel computing lies in the cost of parallelism management.

  2. Using threads supported at user level is flexible, and the cost is no more than a procedure call; but in the presence of operations such as I/O, multiprogramming, and page faults, this model exhibits poor performance or incorrect behavior.

  3. Using threads supported by the kernel seems natural and avoids the system integration problems, but the performance is much worse than using user-level threads, and it is less flexible as well.

3. Contributions
This paper presents its own idea that combines the advantage of user-level threads and kernel threads. It preserves the functionality of kernel threads and also the good performance and flexibility of user-level threads.

  1. In most cases a thread operation does not involve a page fault or I/O, so it does not need kernel intervention and retains good performance at user level.

  2. When the kernel must be involved, the threading model in this paper makes use of a scheduler activation and guarantees that no processor is idle in the presence of ready threads. The scheduler activation takes the processor away from the thread that trapped into the kernel to block, and runs other threads on it.

4. Evaluation
This paper compares the performance of the thread systems across different numbers of processors and amounts of available memory. The new FastThreads shows the largest speedup, and its execution time is the shortest compared to Topaz threads or the original FastThreads.


5. Confusion


  1. What threading model are we currently using in modern operating systems like Linux, Windows, and Mac OS? Are any of the ideas from this paper adopted?

  2. A thread's register state is said to be saved by low-level kernel routines when the thread blocks in the kernel. Does this affect performance?

1. Summary
In this paper, the authors explore existing issues in user-level parallelism, argue that the user-level thread and kernel thread models are not satisfactory, and design and implement a new kernel interface called scheduler activations that boosts thread-level parallelism by providing a virtual multiprocessor to application threads.

2. Problem
User-level threads managed by a user-level thread package provide flexibility and better performance when no kernel intervention is involved. However, in scenarios with heavy kernel dependence, such as multiprogrammed scheduling and I/O-intensive workloads, the performance degrades significantly.
The traditional kernel thread interface, on the other hand, provides full functionality to the thread, but has poor performance due to its heavyweight implementation and protection mechanisms. The challenge is to combine the flexibility and performance of user-level threads with the full functionality of kernel threads.

3. Contributions
Before proposing their design, the authors describe in detail where the poor performance in existing mechanisms comes from. In summary, they find that the performance of user-level threads is inherently better than kernel threads due to cheaper user-level operations and lightweight, customizable feature support. They find that the traditional kernel interface fails to support user-level threads well, as there is no communication or cooperation between user-level threads and kernel threads. Therefore they design a new kernel interface that is suitable for user-level threads running on top of it.

For each application address space, a virtual multiprocessor is provided with a certain number of physical processors allocated by the kernel, and the user-level thread manager can schedule its threads without kernel intervention. The kernel notifies the user-level thread manager of events such as blocking, preemption, and changes in the number of physical processors, which helps the user-level thread manager schedule threads better using kernel information. The user-level system, in return, notifies the kernel when requesting or relinquishing physical processors, or of thread operations that affect processor allocation, which helps the kernel allocate processor resources better.

The mechanism for communication between the kernel and the user-level thread system is the scheduler activation. Each scheduler activation has two execution stacks, one in user space and one in kernel space, and corresponds to a physical processor. The kernel is responsible for creating a scheduler activation and upcalling into the application address space, and the user-level thread scheduler selects a thread to run in that activation. Notifications of events such as blocking and preemption are delivered to the address space in the same way. The user-level thread system, in return, notifies the kernel of events affecting processor allocation; it only sends information about the operations that affect allocation decisions, such as requesting a new processor or reporting an idle one. This keeps the number of running scheduler activations equal to the number of allocated processors. Critical sections are dealt with by a recovery mechanism: the preempted thread may continue execution until it exits the critical section and then returns control to the upcall from the kernel.
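To make the description concrete, a rough C sketch of the per-activation state implied by the above (two stacks plus a saved thread context) could look like this; the field names are invented and this is not the paper's actual data structure.

    /* Rough sketch of per-activation state; field names are invented. */
    struct machine_state {
        unsigned long regs[32];     /* saved general-purpose registers   */
        unsigned long pc, sp;       /* program counter and stack pointer */
    };

    struct scheduler_activation {
        int                  id;            /* identifier used in upcalls                  */
        int                  processor;     /* physical processor currently backing it     */
        void                *kernel_stack;  /* stack used when its thread is in the kernel */
        void                *user_stack;    /* stack on which upcalls run in user space    */
        struct machine_state saved_thread;  /* user-level thread context saved by the
                                               kernel when it blocks or is preempted       */
    };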

4. Evaluation
The evaluation is done on the DEC SRC Firefly running the Topaz operating system and the FastThreads user-level thread package. The cost of user-level operations, the cost of communication between the kernel and the thread system, and overall performance are measured for traditional Topaz threads, the original FastThreads, and the modified FastThreads. The thread performance (user-level operations) of scheduler activations is of the same order of magnitude as the original FastThreads. The upcall performance is less impressive than expected; it is about five times slower than Topaz kernel thread operations, which the authors attribute to implementation details. Overall performance is tested on an N-body problem in which the available memory can be adjusted. When the problem is compute-bound with sufficient memory and uniprogramming, the new FastThreads is slightly better than the original FastThreads, though the explanation is less convincing to me. When the problem is I/O-bound, the modified FastThreads is better than the other two, and in a multiprogrammed environment the modified FastThreads also performs best. I think the authors should have included more workloads when evaluating overall application performance.

5. Confusion
1. Why isn't there a scheduler activation transfer operation that directly transfers an activation from one thread to another? Destroying the old activation and creating a new one seems time consuming.
2. The paper mentions that the traditional kernel thread is the wrong abstraction. Why was such a wrong abstraction designed and implemented? What is the history behind it? Should the inventors of the traditional kernel thread interface be held responsible for their mistake?

Summary
User-level threads have better performance and are more flexible, while kernel threads are less restricted when accessing kernel services. The paper describes an approach to achieving high-performance parallel computing by combining the benefits of both kernel-level and user-level threads behind a single kernel interface. It also outlines the design and implementation details of a new hybrid thread interface exposed to programmers, without changing the original interface.
Problem
Parallel programmers had to decide whether to employ user-level threads or kernel threads. User-level threads have better performance because they can be managed without kernel intervention, and they are more flexible as they can be customized to the needs of the user without kernel modification. But they perform poorly during I/O, page faults, and multiprogramming, as these cause the entire process (and hence sibling threads) to block. On the other hand, kernel-level threads exhibit poor performance, due to the extra kernel trap and validation, and lack flexibility, as they use a generic scheduling algorithm. Hence, the authors wanted to create an interface that effectively combines the functionality of kernel threads with the performance and flexibility of user-level threads.
Contribution
A major contribution of this paper is the concept of scheduler activations. These activations allow effective communication between kernel and user level thread schedulers. This allows threads to operate and switch at the user-level, and only invoke the kernel when needed. As a result the kernel has no knowledge of policies or data structures used at the user-level, and hence changes can be made to fine tune performance to suit application needs. The paper also presents a good solution to the scenario when a user-level thread might be preempted while executing in a critical section, in which case it creates a duplicate code section of a critical section, and continues the thread at the corresponding place in the copy of the critical section.
Evaluation
The authors evaluated the performance by measuring the time taken for user-level thread operations in the new system (fork, block, and yield); the cost of communication between the kernel and user level, specifically the upcalls; and finally the overall effect on application performance. When the application makes minimal use of kernel services, both user-level thread systems (FastThreads on scheduler activations and FastThreads on Topaz threads) have similar performance and are better than kernel threads. As the number of processors is increased, the performance of both user-level thread systems increases, whereas the performance of Topaz threads initially increases and then flattens out due to lock overhead. Even when the application involved I/O, the results demonstrated that performance is close to a pure user-level thread system.
Confusion
Given that this implementation requires changes to both the kernel and user-space code, and that the performance is almost the same as that of user-level threads, how justifiable is this new interface?

1. Summary

Instead of providing threads at kernel level, the authors present a method of using scheduler activations to provide the kernel abstractions necessary to implement an effective and correct user-mode thread library. Using this, they manage to achieve performance and flexibility similar to an existing user-mode thread library, while offering the same functionality as kernel threads.

2. Problem

User-mode thread libraries offer better performance and flexibility, but often have correctness issues when dealing with kernel resources like I/O. Furthermore, since they are built on top of kernel threads, which do not inform the application about scheduling events, suboptimal performance can result.


3. Contributions

They provide to the process the abstraction of a virtual multiprocessor, the scheduling of which it can control. The number of processors on this virtual multiprocessor in essence represents the number of concurrently running threads the address space may have. The kernel is responsible for deciding this number, but the process can provide requests for more or fewer processors. Via this abstraction, the usermode scheduler can make scheduling decisions based on information similar to what the kernel mode scheduler would have.

To achieve this, there is the mechanism of the scheduler activation. Instead of directly running a thread, every time the kernel decides to give a process an opportunity to run on one of its virtual processors, it creates a scheduler activation and upcalls into the process; at this point, the process decides which of its user-mode threads to run, and that thread then runs on the activation until it is stopped. This framework is also used to notify the program about scheduling events: on an event, an activation is created and given to the process, which then uses this activation to continue execution of a thread of its choice.

In particular, the kernel is not aware of critical sections or any synchronization primitives. Instead the scheduler in usermode is aware of when its threads are in critical sections, and if a thread in a critical section gets preempted or otherwise evicted from its activation, the scheduler switches to the thread in the critical section until the end of the critical section.

4. Evaluation

In the case of simple threading operations, like thread creation and waiting, they show a latency roughly 10 times better than kernel-mode threads, and entirely comparable to user-mode threads, with slowdowns relative to the pure user-mode implementation on the order of a few microseconds. These tight margins are to be expected since their implementation is based on the user-mode thread library that they are testing against, so the two share much of the code.

More interesting perhaps is the fact that upcall performance is slow. While not as common an operation, the upcall still represents an important part of the implementation that will likely affect real workloads, since it adds overhead to every preemption and to the blocking/unblocking of a thread.

Then they tested N-Body to see performance in a full application. Of note is that their implementation and the original usermode threading library scale nearly perfectly while the scaling of kernel threads is very much imperfect and seems to plateau after 4 processors. On the other hand, the N-body test is in many respects a best case test for usermode threads of any sort, since almost all the work is computational, there is little concern with I/O or any other kernel level resource.

5. Confusion

There is a bit where a scheduler activation is created to notify the scheduler that another activation has been stopped, so that the scheduler can remove the relevant thread from that activation. As written, it appears to be the responsibility of the user program to inform the kernel that the activation can be reused; what happens if the application fails to respond to the kernel?

Summary:
This paper introduces a trade-off for efficient parallelism: it combines the good performance and flexibility of user-level threads with the proper handling of I/O and page faults of kernel-level threads. The main result is the scheduler activation, a kernel mechanism that enables communication between user-level threads and the kernel. Evaluation results for a prototype system are presented.

2. Problem
Previously there were two mainstream approaches to parallelism.
One is to use only user-level threads (user code and linked libraries). The user-level scheduler views each underlying process as a virtual processor. However, when threads call kernel functions, the processors do not behave as planned -> poor performance in this case.
The other is user-level threads built on kernel-level threads. However, kernel-level threads have large overhead (an order of magnitude slower) and are not flexible, because the kernel knows nothing about the user-level threads it is responsible for.

Contribution:
1. The paper analyzes the overhead of kernel threads, which stems from the cost of accessing thread management operations and the cost of generality. It then argues that pure user-level threads are better than threads built on kernel threads, and further that a kernel-supported user-level thread system is better still. Events like kernel threads blocking/resuming and the user-level thread state should be known to both the kernel and the user-level thread system.

2. The authors design a new kernel interface for user-level threads. The kernel only allocates and controls the number of processors for each address space. The user-level thread system in each address space controls thread scheduling. The kernel tells the user-level threads about events, and the user-level threads tell the kernel about allocation-related operations. The key to this communication is the scheduler activation, which contains two execution stacks: one for the kernel and one for the user address space.

Evaluation:
Based on the idea above, the authors implement a prototype system on the DEC SRC Firefly. The baselines are Topaz (kernel threads) and FastThreads (user-level threads running on top of kernel threads).
The result is that this paper's system and FastThreads both outperform Topaz by an order of magnitude.
This paper's system also outperforms FastThreads slightly (but consistently) on application workloads.

Confusion:
Is the overhead of creating and context switching scheduler activations large?

Summary: This paper introduces the design of a user-level threading system with the support of kernel interfaces. This threading system is fast and flexible.

Problem: There were two kinds of threading systems before this paper was written: kernel threads and user-level threads. Kernel threads require the user to invoke a system call for every threading operation. The drawback of this approach is performance: the authors show that the overhead of crossing into the kernel is an order of magnitude greater than the cost of the actual threading operation. The other approach is to use a user-level thread library. This is much faster, but it cannot handle I/O operations and interrupts correctly because the kernel does not have the necessary information about the threads.

Contribution:
They combined a user-level thread library with kernel support: keep thread support routines inside each process, but let the kernel provide certain interfaces to each user process so that the kernel does the right thing when I/O happens. More specifically, the user-level thread system sees one or more virtual processors provided by the kernel. The user-level thread library is responsible for scheduling threads. The kernel makes upcalls to a process when the number of virtual processors changes, when the current thread is preempted, or when a thread is blocked or unblocked due to I/O. With this information, the thread library can schedule threads correctly. The thread system also makes calls to inform the kernel when it needs more virtual processors or when a virtual processor is idle.
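The events described above correspond to the upcalls and downcalls listed in the paper; written as C prototypes (the exact signatures and types here are assumptions for illustration), the interface looks roughly like this:

    /* Saved machine state passed with some upcalls (assumed type). */
    typedef struct cpu_context cpu_context_t;

    /* Upcalls: kernel -> user-level thread system, delivered on a fresh activation. */
    void upcall_add_this_processor(int processor);
    void upcall_processor_has_been_preempted(int preempted_activation,
                                             const cpu_context_t *state);
    void upcall_activation_has_blocked(int blocked_activation);
    void upcall_activation_has_unblocked(int unblocked_activation,
                                         const cpu_context_t *state);

    /* Downcalls: user-level thread system -> kernel. */
    void add_more_processors(int additional_processors_needed);
    void this_processor_is_idle(void);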

Evaluation:
The authors evaluated the performance of their system. They demonstrated that the performance is close to a pure user-level thread system.

Confusions:
1. It appears to me that depending on the process, not every thread in the system is equal. For example, process A owns 100 active threads, and process B has only 1 thread. If processes are scheduled round-robin, then each thread in A will get 1/200 of the total CPU time, but the thread in B will get 1/2 of the CPU time. Is it true?

2. How does Linux handle threads?

3. How do user-level thread systems handle locks shared by two processes?

1. Summary
The authors describe the design and implementation details of a user-level thread package and a replacement kernel interface. They claim that it provides all of the functionality of standard kernel-level threads without the performance and flexibility drawbacks the latter cause.
2. Problem
According to the authors, threads managed by the kernel are inherently slow by the nature of the way they work. Additionally, the existing problems with integrating user-level threads with kernel functionality are not unsolvable, but rather stem from a lack of kernel support.
3. Contributions
The authors propose and implement a system with clearly defined boundaries between the tasks assigned to the kernel and to user-level thread libraries. The kernel takes responsibility for notifying user-level programs when events like I/O requests or processor availability/reclamation occur. The user-level program is then responsible for actually scheduling threads on its available processors.
Interactions between the kernel and programs take place through scheduler activations. The aforementioned notifications from the kernel to user-level programs arrive in scheduler activations, which vector control from the kernel into the user-level thread system. From there the thread scheduler can begin running application threads.
Additionally, the user-level program can notify the kernel that a processor has gone idle or that it would benefit from more processing resources.
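To make the control flow concrete, here is a minimal sketch (all names invented, not the paper's code) of what the user-level side of an upcall might do: re-queue whatever context the kernel handed back, then make an ordinary user-level scheduling decision for the processor that carried the upcall.

    #include <stdio.h>

    typedef enum { UP_ADD_PROCESSOR, UP_PREEMPTED, UP_BLOCKED, UP_UNBLOCKED } upcall_t;

    /* Hypothetical helpers from the user-level thread library. */
    static void ready_enqueue(int thread_id)  { (void)thread_id; }
    static int  ready_dequeue(void)           { return 1; }
    static void run_thread(int thread_id, int cpu)
    {
        printf("run thread %d on processor %d\n", thread_id, cpu);
    }

    void upcall_handler(upcall_t event, int thread_id, int processor)
    {
        switch (event) {
        case UP_PREEMPTED:      /* the preempted thread is runnable again      */
        case UP_UNBLOCKED:      /* its I/O finished, so it is runnable again   */
            ready_enqueue(thread_id);
            break;
        case UP_ADD_PROCESSOR:  /* nothing to re-queue; just use the processor */
        case UP_BLOCKED:        /* the blocked thread stays in the kernel      */
            break;
        }
        run_thread(ready_dequeue(), processor);  /* user-level scheduling choice */
    }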
4. Evaluation
The authors tested their implementation by modifying the Topaz OS and the user-level FastThreads library. Compared to the unmodified FastThreads library, performance was similar when the workload required minimal kernel intervention. Under memory pressure and paging I/O, though, they observed a considerable speedup for FastThreads with scheduler activations.
Compared to stock kernel threads the performance gain was much greater, especially for situations involving 3+ processors.
5. Confusion
I could use more explanation of the mechanism used to safely manage preemption in critical sections.

Summary:
The authors describe the design, implementation and evaluation of a novel approach to implementing a user-level threads library and the kernel interface for it. They also give an overview of the advantages and issues with the then-existing approaches to thread-based parallelism, namely kernel threads, user-level libraries, and user-level libraries implemented over kernel-level threads, to motivate the work.

Problem:
The existing methods for thread-level parallelism were either prone to errors or too heavyweight to support fine-grained parallelism. For example, user-level thread libraries without kernel support either performed poorly or incorrectly, due to interrupts, multiprogramming or I/O. Kernel-level thread management, and user-level thread management implemented over kernel threads, had heavy overheads due to protection-level crossings and copying of data between user and kernel levels, which made them unfit or inefficient for fine-grained tasks.

Contributions:
The paper presents a good example of a systems research project where the designers took the existing approaches, identified their drawbacks and advantages, and designed a system that included the best of the existing designs. The prime contribution of the paper is the concept of scheduler activations. One of the biggest shortcomings of kernel-level thread management was that it was oblivious to user-level thread state. By using scheduler activations to notify user space of a kernel event like a thread blocking or being preempted, the kernel lets user-level thread management choose the following action, providing flexibility. A new scheduler activation is created for each notification to the user level, which is delivered using an upcall. The scheduler activation has two execution stacks, kernel and user level, the kernel stack being used when the thread makes a call into the kernel. It also has a control block to save the thread context upon preemption or a block. The most creative solution in this paper is for handling preemption while a thread is in a critical section: the preempted thread is resumed in a copy of the critical section, and the copy yields control back to the upcall handler at the end of the section instead of continuing with the thread.
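The detection half of that trick can be sketched as a simple address-range check, performed by the upcall handler on the saved program counter of the preempted thread (everything here, names and table included, is an assumption for illustration; the paper does this at the level of the thread library's own low-level code):

    #include <stdint.h>

    /* One registered low-level critical section and its generated copy. */
    typedef struct critical_section {
        uintptr_t begin;        /* first instruction of the original section  */
        uintptr_t end;          /* one past its last instruction              */
        uintptr_t copy_begin;   /* first instruction of the duplicate section */
    } critical_section_t;

    static critical_section_t sections[16];
    static int num_sections;

    /* If the saved pc lies inside a critical section, resume the thread at the
       same offset in the copy, which yields control when the section ends;
       otherwise resume it exactly where it was. */
    uintptr_t resume_pc_after_preemption(uintptr_t pc)
    {
        for (int i = 0; i < num_sections; i++) {
            const critical_section_t *cs = &sections[i];
            if (pc >= cs->begin && pc < cs->end)
                return cs->copy_begin + (pc - cs->begin);
        }
        return pc;
    }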

Evaluation:
The authors present evaluations of Null Fork and Signal-Wait, which test the overheads of the proposed approach against the existing implementations. Here the scheduler activation approach has slightly higher latency, due to the checks for critical sections and the policy of notifying the kernel about the number of processors needed. Performance was also evaluated for different numbers of processors, varying memory availability, and a multiprogrammed environment; the scheduler activation method achieves superior performance in all cases, achieving, for instance, a speedup of 2.45 in a multiprogrammed environment.

Confusions:
Can we discuss how an upcall to notify the program of a page fault can itself fault on the same location? Also, how is cache locality exploited with scheduler activations?

Summary:
This paper argues that the performance of kernel threads is worse than that of user-level threads and managing parallelism at the user level is essential to high-performance parallel computing. They describe the design, implementation and performance of a new kernel support for user-level threads.

Problem:
Threads can be supported either at user level or in the kernel, but neither approach has been fully satisfactory. User-level threads are more flexible and in theory have high performance, but if built on top of traditional processes they can exhibit poor performance. Kernel threads avoid the system integration problems but are too heavyweight.

Contribution:
(1) The paper analyzed the advantages of user-level threads over kernel threads. Kernel threads have the inherent cost of accessing thread management in the kernel and the cost of generality. The paper also argues that the difficulty of system integration is due to the lack of kernel support: kernel threads block, resume and are preempted without notification to the user level, and they are scheduled obliviously with respect to the user-level thread state.
(2) The authors design a new kernel interface for user-level threads. The OS kernel provides each address space with its own virtual multiprocessor. The kernel allocates and controls the number of processors; each address space's user-level thread system has control over thread scheduling. The kernel vectors events to the appropriate thread scheduler, and the user-level thread system notifies the kernel about thread operations that affect processor allocation. The kernel processor allocator and the user-level thread system communicate via scheduler activations, each of which contains two execution stacks, for the kernel and the application address space respectively. Preemption in critical sections is handled by recovery.
(3) The authors implement a prototype system based on Topaz and FastThreads. Some optimization techniques are used, such as keeping a copy of every low-level critical section and caching, reusing, and returning scheduler activations to the kernel in bulk.

Evaluation:
The paper evaluates the system by measuring the cost of user-level thread operations, the cost of communication between the kernel and the user level, and the effect on application performance. The thread operation latencies for the prototype system, original FastThreads, Topaz kernel threads and Ultrix processes show that the prototype incurs only a very small (3-5 usec) performance degradation relative to original FastThreads. For the same parallel workload, the implementation shows similar scalability to the original FastThreads, while the original FastThreads degrades more quickly when the amount of available memory decreases.

Confusion:
On page 64, the paper mentions that an upcall to notify the program of a page fault may in turn page fault on the same location. Is there an example?

1. Summary
The paper explains the workings of scheduler activations, a hybrid between user-level thread scheduling and the system integration of kernel-level threads, thus obtaining the best of both worlds. This design reduces kernel involvement and context switching while also scheduling efficiently, since the user-level process has better knowledge of the working state of its threads (critical sections, locks).

2. Problem
a) User-level thread libraries built to run on unmodified kernels have the limitation that the thread package assumes the process runs on a dedicated virtual processor. They therefore exhibit poor performance in the presence of I/O, multiprogramming, page faults, etc., because the kernel simply blocks or deschedules the virtual processor the thread is running on.
b) Kernel threads, on the other hand, are too heavyweight to use. They are inherently slow, since thread operations require kernel crossings and context switches are frequent in a multi-threaded environment. In addition, kernel thread scheduling can only be implemented in a generic manner, making it inflexible with respect to specific applications.

Thus the authors would like to combine both kinds of threads by modifying the kernel and building a user-level thread library that runs in tandem with it.

3. Contributions
a) The kernel assigns a set of processors to the user-level process, which runs its own implementation of threads. When the kernel decides to add or remove processors from the process, it does this through a scheduler activation and notifies the process's thread scheduler via an upcall, allowing the process to change its scheduling if needed.
b) To deliver an upcall, the user-level process can decide to preempt a particular thread on a particular processor based on priority. Communication also goes from the process to the kernel when the process runs out of work or a thread blocks on a page fault, eventually allowing a processor to be relinquished to another process.
c) For critical sections, the authors use the idea of recovery instead of prevention. With prevention, the thread in the critical section and its corresponding pages would have to be pinned in memory, delaying a context switch entirely. With recovery, the kernel shifts this problem to the user-level process, which is responsible for descheduling the thread as soon as it completes its critical section.
d) As an optimisation, a copy of each critical section is made at compile time with a yield appended after it; this copy is executed if a thread is preempted inside a critical section, instead of tracking critical sections with a flag.

4. Evaluation
The authors implement scheduler activations by modifying the Topaz kernel. Forking a new thread is found to be far faster than with kernel-level threads and close to pure user-level threads. The upcall performance is worse than the kernel-level thread implementation because the authors implemented their scheme without making many modifications to the existing OS; they argue that, given the chance to modify the OS further, upcall performance would be comparable. Multithreaded applications are found to have higher speedups than with user-level threads and faster execution times irrespective of the amount of available memory.

5. Confusion
The debugging portion is confusing. What do the authors mean by assigning a logical processor?

Summary
This paper first discusses the two approaches to implementing parallelism: kernel threads and user threads. User-level threads have better performance than kernel threads, but they still face multiple overheads when implemented over the traditional kernel interface. The authors implement a new communication interface between the kernel and the user level which increases the performance of user-level threads.

Problem
User-level threads, even though they are flexible and have good performance, are still implemented over an unmodified underlying kernel, causing performance degradation. Each process is seen as a virtual processor, and these are multiplexed by the kernel onto the hardware processors. The virtual processor executes threads directly without kernel intervention. However, in situations like page faults, multiprogramming and I/O, this performance advantage ceases to exist.
In the case of kernel threads, system integration issues do not arise, as threads are directly scheduled by the kernel onto the actual hardware. But this mechanism is very heavyweight, leading to degraded performance. User threads built on top of kernel threads also face these issues.

Contribution
The main idea is the sharing of information between the kernel and the user-level thread system. Each process is provided with a virtual multiprocessor, and the application is responsible for assigning threads to it. Responsibilities are shared between the kernel and the address space's thread system, and the information they provide to each other is used for efficient allocation and scheduling. The kernel is tasked with allocating processors to applications. The kernel informs the user-level thread scheduler of kernel events, so the application has complete information about its scheduling state. The thread system informs the kernel of the user-level thread operations that matter to it, for example when it needs fewer processors, which the kernel takes into account when making processor allocation decisions. The communication between the kernel and the user-level thread system is handled via scheduler activations, which are created by the kernel and assigned to processors. An activation also provides space in the kernel for saving the processor context when a thread is stopped by the kernel. The design also incorporates a deadlock-free recovery technique to handle threads preempted while executing a critical section.
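The user-to-kernel half of this exchange is small: the thread system only needs to speak up on the two transitions that can change a processor allocation decision. A minimal sketch of that logic (helper names and counters here are assumptions) could look like this:

    /* Assumed downcall stubs; in the real system these are kernel calls. */
    static void add_more_processors(int n)    { (void)n; }
    static void this_processor_is_idle(void)  { }

    static int runnable_threads;   /* maintained by the user-level scheduler */
    static int processors_held;    /* processors currently allocated to us   */

    /* Transition: more runnable threads than processors -> ask for more. */
    void on_thread_becomes_runnable(void)
    {
        runnable_threads++;
        if (runnable_threads > processors_held)
            add_more_processors(runnable_threads - processors_held);
    }

    /* Transition: more processors than runnable threads -> offer one back. */
    void on_run_queue_empty(void)
    {
        if (processors_held > runnable_threads)
            this_processor_is_idle();   /* the kernel may reallocate it */
    }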

Evaluation
The design was built over Topaz kernel threads using a modified FastThreads, and its performance was compared with native FastThreads. For workloads involving little kernel intervention, modified and native FastThreads performance was comparable, and both were clearly better than Topaz kernel threads. However, when the workload needed kernel intervention, or in a multiprogrammed environment, the modified FastThreads performed better than the native one.

Confusions
I am not very clear on the recovery technique adopted to handle preemption of threads executing in their critical sections.
Also, the thread scheduling policy mentions the use of hysteresis, by making idle processors spin for a short period before notifying the kernel. I was not able to completely understand that.

Summary
This paper introduces the concept of scheduler activations to build an interface between the kernel and user-level threads for concurrent processing. Combining the functionality of kernel threads with the performance and flexibility of user-level threads is desirable for applications, as it reduces kernel intervention and the overhead of context switches without requiring modifications to the application itself.

Problem
The main problem posed in the paper is that there is no easy way to manage and define user-level threads (with their own concurrency model) suited to an application without kernel intervention. User-level threads without doubt perform well and are more flexible than kernel threads, but their performance is limited by OS activity: the kernel can schedule or preempt the underlying execution without notifying the user level. Kernel threads perform poorly and offer no per-application flexibility.
Using kernel threads directly for user-level work also has poor performance: maintaining the state of each user-level thread in the kernel increases overhead and is not flexible. User-level threads built on top of kernel threads, although a reasonable solution at the time, suffer from various drawbacks: kernel threads are blocked, resumed and preempted without notification to the user level, and the control and scheduling information is split between the kernel and each application's address space. This paper tries to address these issues by introducing scheduler activations.

Contribution
In order to address the above problems, scheduler activations serve as a mechanism that facilitates the exchange of information between the user level and the kernel. They notify the user-level thread system of kernel events, and they provide space in the kernel for saving the processor context of the current user-level thread when it is interrupted.

Processors are allocated to processes (address spaces) by the kernel. A user-level thread scheduler controls which threads run on a process's allocated processors. The user level notifies the kernel of changes in its demand for processors. The kernel in turn notifies (using upcalls) the user-level schedulers of system events that affect the job. These upcalls occur when a processor is added, when a processor has been preempted, and when a scheduler activation is blocked or unblocked. Application programs can remain unmodified and still benefit from this model.

The authors also address several concerns raised by the model: which thread is preempted in order to notify the user level when a processor is added or removed, what happens when the thread scheduler is itself preempted, how per-thread priorities are handled while ensuring no processor remains idle when threads are waiting, and how to deal with preemption of a thread executing in a critical section.

Evaluation
Apart from measuring application performance, evaluations are done to estimate the cost of user-level operations and of communication between the kernel and the user level. The results are compared across kernel threads, user-level threads built on kernel threads, and user-level threads on scheduler activations. Kernel threads are obviously the slowest. A slight performance degradation was observed for the new model over plain user-level threads (the authors claim this is due to the extra bookkeeping to track the number of available processors, etc.). Upcall performance was worst in the new model, and the reason is not entirely clear to the authors. For application-level parallelism, the highest speedup was observed with the modified model, with slight variations. In the presence of I/O, the original user-level threads perform worst, because a virtual processor is lost every time a user-level thread blocks in the kernel.

Confusions
I see that every relevant kernel event is notified to the user level and a subset of user-level events is notified to the kernel. But how does this model improve performance over purely kernel-level threads? Isn't the overhead of interrupts and notification significant? The authors do talk about optimizing away some of this communication, but I don't see the overall benefit. Also, I am not convinced by the approach taken to ensure security. It might be a really stupid question, but what exactly is done in present-day systems to handle such problems in scheduling and controlling user-level threads?

1. Summary
This paper outlines a hybrid approach to obtaining efficient parallelism, combining kernel-level support with user-level threads. The main concepts of this design are presented, and several optimizations are mentioned.

2. Problem
Before this work, there were two different ways to obtain parallelism via threads. User-level threads were originally implemented exclusively in user code via linked libraries. The user-level scheduler viewed each process as a virtual processor, but this processor could be multiplexed across physical processors by the kernel and not behave exactly as planned. This led to poor performance, so user-level threads were then implemented on top of kernel-level threads. However, kernel threads are slower by an order of magnitude due to overhead, and lack flexibility because the kernel has no knowledge of user-level information.

3. Contributions
The largest contribution of the paper is the improved communication (and well-defined separation) between user-level threads and the kernel.
This is accomplished through a kernel mechanism called scheduler activations, which resemble the data structures previously used for kernel-level threads. Each activation consists of a kernel-mapped execution stack and an application-mapped execution stack. Starting a program leads the kernel to create a scheduler activation, which upcalls into the application to start user-level scheduling. Any time the kernel needs to communicate with the user level, it creates a new activation and upcalls into the address space. These activations are so important because the kernel guarantees that the number of running activations equals the number of processors assigned to the application. This leads to improved efficiency, where previously a blocking operation could waste processor time.
A second major contribution of the paper is the handling of critical sections. If the kernel preempts a thread, the user-level thread manager checks whether the preempted thread holds any locks. If so, a user-level context switch back to the thread lets it finish the critical section, after which control returns to the thread manager.

4. Evaluation
The authors provide a number of results which show that their hybrid approach is efficient. Null Fork and Signal-Wait (which require no kernel involvement) take only 3-5 microseconds longer than the original FastThreads, due to minor additional overhead. They also find that their zero-overhead lock checking saves about 10 microseconds per operation. A number of other experiments show that as the number of processors increases, the new implementation scales just as well as the original FastThreads. They also show that the original FastThreads slows down significantly when the amount of available memory decreases; this is due to the issue mentioned earlier, in which a thread blocking for I/O leaves the underlying physical processor temporarily useless. The results demonstrate that the authors have succeeded in obtaining the performance of user-level threads while retaining the necessary kernel involvement.

5. Confusion
In the results, the authors mention that their upcall performance is particularly bad due to implementation factors. How does their performance compare favorably to the original FastThreads even with this latency?

1. Summary
This paper introduces an approach that addresses the dilemma of managing user-level versus kernel-level threads, achieving both performance and flexibility.

2. Problems
Parallelism management is critical for high performance. However, the traditional approach of shared-memory processes was designed for uniprocessors and is inefficient for general-purpose parallel programming. Threads are the solution. They can be supported at either user or kernel level, but neither approach is fully satisfying. User-level threads are flexible, but when built on top of traditional processes they can exhibit poor performance; kernel threads avoid the system integration problems, but their heavyweight nature impairs performance too. To address this dilemma, the authors introduce a kernel interface and a user-level thread package that together combine performance and flexibility.

Problems in the design include: 1. A user-level thread could be executing in a critical section at the instant it is blocked or preempted; this is fixed with a solution based on recovery. 2. Applications may not be honest in reporting their parallelism to the operating system, leading to unfair use of resources; this could bring poor performance and deadlock, acting against the purpose of the design.

3. Contribution
The contributions include: a new kernel interface and user-level thread system that together combine the functionality of kernel threads with the performance and flexibility of user-level threads; the scheduler activation data structure, which has two execution stacks, one mapped into the kernel and the other mapped into the application address space; the mechanism for using this data structure to connect the kernel interface and the user-level thread system; an implementation by modifying Topaz; processor allocation and thread scheduling policies and related performance enhancements; and an analysis of performance identifying the critical costs.

4. Evaluation
There are evaluations of thread, upcall and application performance. The results show that the cost of user-level thread operations remains essentially the same; user-level threads have a significant latency advantage over kernel threads; upcall performance, if tuned, would be commensurate with Topaz kernel thread performance; and the application performance results are limited by the small number of processors on the Firefly.

5. Confusions
How does one decide between kernel and user-level threads? Is it determined by the program being written or by the system itself?
What is an upcall?
Why test only Null Fork and Signal-Wait?

Paper: Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism

Summary
The advantages and drawbacks of user-level threads and kernel-level threads are discussed. Motivated by these, the authors describe the design, implementation and performance of scheduler activations, an efficient thread mechanism consisting of a kernel interface and a user-level thread package that together combine the functionality of kernel threads with the performance and flexibility of user-level threads.

Problem
Although user-level threads provide excellent performance due to the absence of kernel intervention, are flexible because they can be customized to the needs of a language or user without kernel modification, and have fast context switching, they perform poorly during page faults, I/O or multiprogramming, as these can cause the entire process to block. Kernel threads, on the other hand, avoid the system integration problems exhibited by user-level threads because the kernel directly schedules application threads on physical processors, but they are too heavyweight and perform poorly due to the extra kernel trap and validation. They are also inflexible, as they use a generic scheduling algorithm. Hence, the authors propose a hybrid approach combining the functionality of kernel threads with the performance and flexibility of user-level threads.

Contributions
The concept of scheduler activations improves on thread management systems that deal inefficiently with parallel programs. A scheduler activation is the execution context for vectoring control from the kernel to the address space on a kernel event. This hybrid thread mechanism provides the flexibility to build any concurrency model on top of scheduler activations and lets the application make its own thread scheduling decisions. This separation of policy and mechanism gives the flexibility that kernel threads, with their generic scheduling algorithm, lack. The paper also presents a solution for the case where a user-level thread is preempted while executing in a critical section: a duplicate of the critical section's code is created, and the thread is resumed at the corresponding place in the copy, which relinquishes control once the section completes.
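A toy example of that duplication (invented code, not from the paper) makes the idea clearer: the copy performs exactly the same work as the original low-level critical section, but hands the processor back to the thread system as soon as the section completes.

    typedef struct node { struct node *next; int value; } node_t;

    /* Hypothetical: returns control to the upcall handler / thread system. */
    static void yield_to_thread_system(void) { }

    /* Original: used in the common, non-preempted case. */
    void ready_list_push(node_t **head, node_t *n)
    {
        n->next = *head;    /* -- begin low-level critical section -- */
        *head   = n;        /* -- end critical section --             */
    }

    /* Duplicate: a thread preempted inside the section is resumed at the
       matching point here, finishes the same work, then yields immediately. */
    void ready_list_push_copy(node_t **head, node_t *n)
    {
        n->next = *head;
        *head   = n;
        yield_to_thread_system();
    }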

Evaluations
The authors implemented a prototype of their design on the DEC SRC Firefly multiprocessor workstation and evaluated it on three main points: 1) the cost of user-level thread operations, 2) the cost of communication between the kernel and the user-level system, and 3) the overall effect on application performance. The results show that the cost of user-level thread operations in the new system is essentially the same as in FastThreads before the changes, whereas upcall performance is worse than Topaz threads by a factor of five, due to the overhead added by the scheduler activation machinery. The results also show that without much I/O the new system is far better than the other two approaches, and with I/O it is still better and could be improved further.

Confusion
Can you please explain the debugging part of the paper? Also, won't scheduler activations be more costly in the case of heavy page faulting or I/O?

Summary:

The paper provides a new design for parallelism using user-level threading which, together with a kernel interface, improves performance compared to kernel threads while providing all the functionality of kernel threads. The use of a user-level threading system also gives applications flexibility over thread scheduling and other aspects. A mechanism called a 'scheduler activation' is used to communicate kernel events to the user-level thread system, among other purposes.

Problem:

The mechanisms for parallelism available at the time used either kernel-level threading or user-level threading. The problem with the former is performance, since every time a different thread is to be scheduled an expensive trap into the kernel and a context switch take place. The problem with the latter is that it does not provide all the functionality that kernel-level threading can; for events like page faults, one would like to deschedule only the faulting user thread. The paper solves this problem, providing both the performance and the functionality required.

Contributions:

The paper introduces the concept of a user-level thread system with a way for the kernel to intervene when required. The operating system kernel provides each user-level thread system (i.e., each address space) with its own virtual multiprocessor, whose size can be increased or decreased during the execution of the threads. The kernel provides an interface for the thread system to request an increase in the number of processors or to inform the kernel of idle processors.

The paper introduces the notion of a 'scheduler activation', which performs the following roles:
- notifies the user-level thread system of kernel events
- serves as an execution context for running user-level threads
- provides space for saving the user-level thread context in the kernel

Each address space gets a number of virtual processors, and the user-level thread system schedules a user-level thread, via a scheduler activation, on any of the assigned processors. For events like page faults or I/O, the user-level thread system is informed via an upcall, which is delivered through a scheduler activation on one of the processors held by that address space. If no such processor is available, one of the user-level threads is preempted and its processor is used. The user-level thread package decides the priority and scheduling of the user-level threads.

A final issue addressed is the poor performance from spinning, or the potential deadlock, that occurs when a user-level thread is preempted inside a critical section. The user-level thread scheduler solves this by allowing the preempted thread to temporarily continue execution until it is out of the critical section.

Evaluations:

The paper makes a successful attempt at moving threading almost entirely out of the kernel, thereby improving performance. Solutions are provided for all the problems one might face while doing so. Still, one questionable point is whether the performance benefit is worth modifying the operating system kernel to support this mechanism.

Confusions:

Didn't quite understand the debugging considerations section. I would like to understand how exactly debugging would be done.

Summary:
The paper presents scheduler activations, an interface between the kernel and a user-level thread package which combines the functionality of kernel threads with the performance and flexibility of user threads. The kernel is responsible for processor allocation and for notifying the user level of events that affect it. The user-level system is responsible for thread scheduling and for notifying the kernel of events that affect its processor allocation.

Problem:
User-level threads are fast and flexible, but their performance degrades when a blocking call or I/O is performed. Kernel threads are slow and expensive but less restricted. The paper argues that kernel threads are the wrong abstraction on which to support user-level management of parallelism: managing threads at user level is required for high performance, but kernel threads do not support it well.

Contributions:
- An interface for communication between the kernel processor allocator and the user-level thread system.
- Scheduler activations: serve as the kernel execution context for user-level threads, notify the user-level thread system of kernel events, and provide space in the kernel to store processor context.
- A virtual multiprocessor abstraction for user-level threads.
- Processor reallocation handled using preemption and upcalls.
- Critical sections continue to execute even if a context switch is required; a copy of each low-level critical section is maintained for recovery.

Evaluation:
Scheduler activations were implemented by modifying the Topaz kernel thread management system and the FastThreads user-level thread management. It was observed that the cost of user-level thread operations is almost unchanged, with a small increase on occasions when the kernel needs to be notified. Application performance with the modified user-level thread management is also similar, both when kernel involvement is minimal and when every operation involves the kernel.

Confusion:
What is the overhead of creating and context switching scheduler activations? Would it not affect performance when large number of page faults, I/O occur?

Summary:
This paper describes the implementation of a new user level thread package and its kernel interface, and motivates the need for it by explaining the existing limitations of relying solely on either user level or kernel level threads. Using upcalls and a new abstraction called scheduler activations, applications are able to handle their own scheduling policies while giving the kernel enough information to maintain performance by making appropriate processor allocations.

Problem:
The authors motivate the problem by pointing out the trade offs in using either purely user level or kernel level threads for parallel programs. User level threads while fast and flexible are limited in their correctness and degrade in performance when it comes to kernel events such as I/O and page-faults. Kernel level threads while correct are overly generalized, and inefficient because of trapping costs and extra protection checks. Implementing user level threads on kernel threads has its own correctness and performance problems because there is an information gap which affects the scheduling of the kernel level threads in the case of kernel events like I/O blocking. The performance hit from communicating this information between the two levels would overcome the advantage of using kernel threads in the first place.

Contributions:
By having a set of virtual processors for each address space, the scheduling policies for all threads of that address space are now governed by the application's own thread scheduler. So for most cases the performance should be equivalent to normal user-level threads. Scheduler activations and upcalls allow the kernel to relay information to the user-level scheduler in an efficient way in the case of kernel events like I/O blocking, or when the application needs to be notified of processor deallocation. Since the kernel and the user-level runtime system interact purely in terms of scheduler activations and upcalls, the user program can employ any concurrency model on top of the given mechanism. The kernel, on the other hand, is given enough information about user threads that are prone to cause kernel events so that it can smartly allocate processors among the applications. Preemption in critical sections is also handled, by context switching back to the thread to let it safely exit the critical section.

Evaluation:
The authors implemented their design by modifying the kernel thread management routines of the Topaz OS to implement scheduler activations, and the FastThreads user-level package to process upcalls and interact with the kernel routines. The performance of purely user-level thread operations for FastThreads on scheduler activations degraded by about 3 us, but still maintained an order-of-magnitude advantage over Topaz kernel threads. Upcalls added to the cost of the scheduler activations implementation, but the authors reasoned that this could be tuned and reduced. Application performance is where scheduler activations outstrip Topaz kernel threads and even perform better than pure user-level FastThreads.

Confusions:
Can you please give a picture of how exactly user-level threads are supported by kernel threads in the conventional implementations whose problems the authors point out?

1. Summary
This paper describes a new kernel abstraction and user level threading package for getting efficient performance of parallel programs in a multiprocessor environment. Their approach picks the best out of kernel threads and user level thread support and combines it to form a solution.

2. Problem
The authors argue that the existing threading support at both user and kernel levels is inadequate to obtain the desired performance. User space thread library gives compelling performance and flexibility but suffers severe degradation in the event of interruptions like page faults or I/O. Kernel level support has poor performance as it incurs the cost of switching from user mode to kernel mode for making any decision.

3. Contributions
The proposed solution in the paper is a new abstraction from the kernel and a corresponding new user-level threading library. Each application address space is provided with a virtual multiprocessor, and the address space has complete knowledge of these virtual processors. Explicit notifications are sent from the kernel to the running address space to inform it of any events, and vice versa. The responsibilities are divided between the address space (i.e., the process) and the kernel. The kernel takes care of allocating virtual processors to each process. Thereafter, each process takes control of its own thread scheduling just as it would have in the absence of kernel threads. The kernel implements scheduler activations, which are used to direct control from the kernel to the user-level process. This happens on any event in the system that affects the address space, like a page fault. The kernel uses an upcall to communicate with the address space, and the address space informs the kernel of any user-space events that might affect its processor allocation.

4. Evaluation
This paper presents evaluation of the proposed solution by comparing the performance speedup of parallel programs running with traditional techniques and scheduler activations. They measure and ensure that they do not introduce significant delays as compared to kernel threads or user level thread libraries.

5. Confusion
It is not clear how exactly the upcall is handled by the process. Also, the debugging support that they added seems confusing.

1. Summary
The current paper argues that user-level thread management is more efficient than kernel threads for parallel programming. To address the limitations of kernel support for such user-level libraries and to provide better system integration, it introduces a new kernel interface based on scheduler activations.
2. Problem
The paper addresses two primary problems. The first is the inefficiency of kernel threads for parallel thread scheduling and management: kernel threads remove the integration problems of user-level threads but are heavyweight and inflexible. The second is the lack of integration and communication between user-level threads and the kernel thread interface. With factors like multiprogramming, I/O and page faults, the existing kernel interface can lead to poor scheduling decisions, poor performance, or even incorrect behavior.
3. Contributions
The paper introduces a new kernel interface and user-level thread system that combine the functionality of kernel threads with the performance and flexibility of the user-level threads.
The new kernel interface is implemented via scheduler activation objects. These data structures provide an execution context for a user-level thread and act as a communication mechanism between the user-level thread system and the kernel. Each processor allocated to a user program is associated with a scheduler activation object. The kernel uses upcalls to notify the user level of events it may need to act on, passing the relevant kernel context as arguments to the upcall. The kernel interacts with the user level only through activation objects, giving the user level the flexibility to implement a concurrency model of its choice. The user level can also notify the kernel when it either requires additional processors or can relinquish idle processors.
4. Evaluation
The design is implemented by modifying the Topaz operating system and the user-level thread package FastThreads. In addition to implementing policies for processor allocation and thread scheduling, optimizations for critical sections and for caching scheduler activation objects are used. Results show that user-level thread operations are slightly slower than in FastThreads due to additional state management. The application performance comparison shows that the proposed scheme provides better speedups than FastThreads and kernel threads, even on a multiprogrammed system.
5. Confusion
I don’t understand the significance of creating and destroying a scheduler activation object on every preemption or upcall. Shouldn't the upcall mechanism alone be sufficient to communicate with the user-level thread scheduler?

Summary

This paper addresses the issue of whether user-level or kernel-level threads are more effective for high-performance computing. User threads offer flexibility and avoid kernel traps, but kernel threads offer better possibilities for I/O handling and multiprocessing. The authors argue that user level threads should be built on a kernel interface called scheduler activations: these allow user-level applications to handle most of the scheduling, but kernel upcalls can be used in certain cases to signal events like I/O completion. Scheduler activations provide all the performance of user level threads, along with the advanced control that is only possible in the kernel, and almost no additional overhead.

Problem

Threads can be supported either in user space or at the kernel level, but both approaches have shortcomings. User level threads are fast due to avoiding kernel traps, and they can be explicitly tailored to the needs of any application, but get sidelined by I/O and multi-processor systems. Kernel threads on the other hand have a higher degree of control, but suffer performance problems from kernel traps and generality. One approach has been to build user threads on top of kernel threads, but this does not alleviate any of the problems.

Contributions

The first contribution is providing each user application with a virtual multiprocessor, allowing it complete control over thread scheduling. The other major contribution is scheduler activations, which are able to step in during kernel events (e.g., I/O completion) and modify user-level data structures, schedule threads and allocate new virtual processors.

Evaluation

The authors evaluate their system on the DEC SRC Firefly using three configurations: 1) native Topaz kernel threads, 2) FastThreads, user-level threads running on top of kernel threads, and 3) the virtual multiprocessor with scheduler-activated threads. Both FastThreads and the new threading model proposed in this paper outperform the native Topaz threads by an order of magnitude. The new threading model is mostly on par with FastThreads, but still manages to outperform it consistently by a narrower margin.

Confusions

Didn't really understand the method used when a user-level thread is blocked or preempted in a critical section. How does the kernel (or even the user-level process, for that matter) know that the thread is in the middle of a critical section?

1. Summary
The paper describes a system for improving kernel support of user-level thread management. The authors note that user-level managed threads have performance benefits but suffer from issues when interacting with multiple processors or I/O devices, while kernel threads are slower but do not suffer the same issues. These problems force developers to choose between speed and I/O/multiprocessor support. The paper proposes a system that aims to provide the same speed as user-level thread management when no kernel interaction is needed, while offering that interaction at the same cost as kernel threads, giving programmers the best of both kernel and user-level threads.

2. Problem
At the time, implementing a threaded application meant choosing either support for multiprocessors and I/O at the cost of performance, or sacrificing the former for the latter. Kernel threads offer processor reallocation, but suffer high latency when threading operations are performed because of the overhead of trapping into the kernel. User-level threads avoid these overheads, but in order to make use of multiple processors they still rely on kernel threads, running user-level threads on top of them. In this case the kernel lacks important information held by the user-level thread manager that is vital to the scheduling of these threads. It is also insufficient to have the thread manager communicate every detail directly to the kernel, because the kernel's decisions rely so heavily on up-to-date thread state that this communication would effectively remove the performance benefits of user-level threads.

3. Contributions
Primarily, the paper advocates a system that cleanly separates the duties of user-level and kernel-level thread management. The authors' system gives the kernel the job of allocating processors to user-level address spaces and of notifying the user-level thread managers when certain events occur, such as a processor being reclaimed, another becoming available, or an I/O request completing. The user-level portion has the job of actually scheduling threads onto the processors the kernel has granted it. It is also capable of responding to kernel events in what the authors call a scheduler activation, the recurring mechanism in their system: all interactions between the kernel and the user-level thread manager happen through scheduler activations. In short, the kernel creates a scheduler activation when it upcalls to the user-level thread manager, notifying it of any event (processor allocation changes, I/O completion) that has occurred and then allowing it to run on the processor to which the activation was assigned in order to deal with the event. Additionally, the user-level manager can notify the kernel when it would benefit from more processors or when one of its current processors has become idle.
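The kernel-side sequence just described can be pictured with a short sketch (all names and structures here are invented for illustration, not the paper's implementation): on an event, allocate a fresh activation, bind it to one of the address space's processors, and start it in the upcall handler.

    #include <stdio.h>
    #include <stdlib.h>

    typedef enum { EV_PROCESSOR_ADDED, EV_PREEMPTED, EV_BLOCKED, EV_UNBLOCKED } event_t;

    typedef struct activation {
        int     id;
        int     processor;
        event_t event;
    } activation_t;

    static int next_activation_id;

    static activation_t *create_activation(int processor, event_t ev)
    {
        activation_t *a = malloc(sizeof *a);
        a->id = next_activation_id++;
        a->processor = processor;
        a->event = ev;
        return a;
    }

    /* Stand-in for transferring control into the address space's upcall entry. */
    static void upcall_into_address_space(const activation_t *a)
    {
        printf("upcall: activation %d delivers event %d on processor %d\n",
               a->id, (int)a->event, a->processor);
    }

    /* Called by the kernel when an event concerning this address space occurs. */
    void kernel_notify(int processor, event_t ev)
    {
        activation_t *a = create_activation(processor, ev);
        upcall_into_address_space(a);
    }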

4. Evaluation
The paper presents numbers comparing FastThreads and Topaz kernel threads to the authors' hybrid approach. They note that their approach is comparable to FastThreads in most ways and faster when available memory is low.

5. Confusion
I’m unsure how they can prevent malicious applications from requesting more processors than they need or than is "fair". The policy they describe sounds somewhat similar to the idle memory tax, but in this case the kernel has no direct way to detect whether a processor is idle. They instead say that, over time, an address space should be penalized more for holding more processors. Is this not unfair to long-running, highly parallel programs? When an address space has a processor removed, what can the thread manager do to tell the kernel that it was actually using it? By requesting another processor it could be penalized, and by releasing one it punishes itself.

Summary:
The paper discusses a new kernel interface and user-level thread package that aim to provide the best of kernel threads and user-level threads. The paper argues that kernel threads are the wrong abstraction for supporting user-level management of threads and proposes a mechanism that provides each application, in particular its user-level thread scheduler, with the abstraction of a virtual multiprocessor. In the new mechanism, the kernel allocates processors to address spaces, but thread scheduling is done within each address space. To utilize the processors effectively, communication is needed between the kernel and the user-level mechanisms; scheduler activations facilitate this two-way communication. A scheduler activation vectors control from the kernel to the address space's thread scheduler on a kernel event; the thread scheduler can use the activation to modify user-level thread data structures, to execute user-level threads, and to make requests of the kernel.

Problem:
User-level threads and kernel threads are the two approaches to parallel programming. User-level threads have excellent performance, since they need no kernel intervention, and are also flexible, as they can be customized to fit the needs of a language or user without modifying the kernel. However, they can show poor performance or incorrect behavior in a multiprogrammed environment. Kernel threads avoid many of the problems user-level threads have, because the kernel directly schedules each application's threads onto physical processors, but this comes at the cost of performance. Thus, a parallel programmer has to choose either user threads or kernel threads for concurrency and make a trade-off between performance and functionality. This paper tries to resolve this predicament with a mechanism that offers the best of both worlds.

Contributions:
The approach employed by this paper is based on providing each address space with a virtual multiprocessor, in which the user-level thread scheduler is aware of how many processors it has been allotted and knows which of its threads are running on those processors. Responsibilities are split between the kernel and each application address space: the kernel allocates processors to the address spaces, while each address space is accountable for its own thread scheduling decisions. The kernel notifies the address space's thread scheduler of events affecting it, such as adding or preempting a processor, via a scheduler activation. The user level notifies the kernel only of the subset of user-level thread operations that might affect processor allocation decisions. As a result, performance is not compromised, and most thread operations avoid the overhead of communicating with the kernel. The scheduler activation serves as an execution context for running user-level threads and provides data structures very much akin to those of kernel threads. It also serves as the means of vectoring information between the kernel and the address space's thread scheduler. The main advantage of this hybrid approach is that the user-level thread system manages its virtual multiprocessor transparently to the parallel programmer.

Evaluation:
The authors evaluate the performance of FastThreads on scheduler activations against FastThreads on Topaz threads (the user-level thread approach) and Topaz threads (the kernel thread approach). They observe that, with minimal use of kernel services, FastThreads on scheduler activations runs as fast as FastThreads on Topaz threads and much faster than Topaz threads. They also observe that FastThreads on scheduler activations performs better than both original FastThreads and Topaz threads when the application requires significant kernel involvement. The authors also evaluate upcall performance in their approach and posit that, if tuned, it would be commensurate with Topaz kernel thread performance.

Confusions:
My confusion is regarding the support for critical sections provided by the scheduler activation approach. It relies on a flag mechanism to indicate whether a thread is in a critical section. Is this flag shared between threads? In that case, shouldn't it also be made thread safe, which is the very problem we are trying to tackle?

1. Summary
In the paper, "Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism", the authors argue that kernel threads perform worse than user-level threads, and that the difficulty of integrating user-level threads into applications is due only to the lack of kernel support for them. They therefore propose and implement a new kernel interface and user-level thread package that together provide parallelism without compromising performance or flexibility.

2. Problem
Since user-level thread operations execute as ordinary procedure calls, they are extremely fast and flexible. Kernel threads, on the other hand, require system calls, which reduce performance due to traps into the kernel and context switches. For the kernel to schedule user threads well, it would need access to user-level scheduling information and thread state; similarly, the user-level thread scheduler must be aware of I/O and page-fault handling to schedule application threads sensibly. Because of this information gap between the kernel and the application address space, user-level threads perform badly in practice and are not widely used in spite of their flexibility.

3. Contributions
- separation of mechanism and policy in thread management: while the kernel is responsible for the mechanism of allocating processors to user-level address spaces (space sharing), the user-level thread scheduler implements the scheduling policy, which is completely transparent to the kernel
- communication between the kernel's processor allocator and the user-level thread scheduler by means of scheduler activations
- scheduler activations serve as the execution context for running user-level threads, notify the user-level thread scheduler of kernel events, and provide space in the kernel to save the context of the current user thread
- the address space notifies the kernel whenever it transitions to a state where it has more runnable threads than processors, or more processors than runnable threads (see the sketch after this list); processes using extra processors are penalized so that they are encouraged to give up processors when another process needs them
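As a rough illustration of that last point, here is a minimal C sketch of where the two downcalls would be triggered; the helper names are assumptions, not the paper's interface:

    /* The address space tells the kernel only when a thread operation crosses
       the boundary between "more ready threads than processors" and the
       reverse; everything else stays entirely at user level. */
    extern int  num_ready_threads(void);
    extern int  num_processors(void);
    extern void kernel_add_more_processors(int extra);   /* downcall */
    extern void kernel_processor_is_idle(void);          /* downcall */

    void after_thread_becomes_ready(void)
    {
        if (num_ready_threads() == num_processors() + 1)  /* just crossed over */
            kernel_add_more_processors(1);
    }

    void after_thread_blocks_or_exits(void)
    {
        if (num_ready_threads() == num_processors() - 1)  /* a processor frees up */
            kernel_processor_is_idle();
    }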

4. Evaluation
The authors evaluate the performance of FastThreads on Topaz threads versus FastThreads on scheduler activations for the Null Fork and Signal-Wait operations, and observe a degradation of only 3 to 5 microseconds in their approach. Similarly, to approximate the overhead scheduler activations add to making and completing an I/O request or a page fault, they measure the time for two user-level threads to signal and wait through the kernel. This time is worse than that of normal Topaz threads by a factor of five, which they attribute to their implementation being written in Modula-2+.

5. Confusion
- I did not understand the prevention approach for dealing with the situation where a user-level thread executing in a critical section is preempted or blocked.
- I am not very clear on the performance enhancement, implemented with copied assembly code, that determines whether a preempted or blocked thread was in the middle of executing a critical section.

Summary
The paper discusses the limitations of the two popular approaches used to exploit concurrency in programs, kernel threads and user-level threads. The authors contend that user-level threads are inherently better-suited to facilitate high-performance parallel computing. Towards this end, they propose a novel scheme involving a new kernel interface called scheduler activations, to improve the performance of user-level threads.

Problem
Kernel threads are better than user-level threads at handling I/O and blocking system calls, since the kernel can schedule another thread on the idle processor. However, kernel threads are slow due to overheads such as trapping into the kernel on every thread switch. Also, the kernel thread scheduler lacks knowledge of the application's behavior, and as such tends to perform worse than a user-level thread scheduler. On the other hand, user-level threads guarantee good performance only in a uniprogrammed environment and in the absence of I/O. Thus, both available techniques for exploiting concurrency are limited in their benefits.

Contributions
The major contribution of this paper seems to be the judicious separation of responsibilities between the kernel and the user-level runtime, allowing each to leverage the knowledge available to it. The kernel handles processor allocation among address spaces, while the user-level runtime takes charge of scheduling threads onto the processors it has been given. This falls in line with the exokernel philosophy, as a very low-level abstraction is offered to applications. Scheduler activations in conjunction with upcalls are a neat way of exposing kernel-level information to the user-level runtime and letting the user-level thread scheduler make policy decisions. This also allows them to solve the problem of other threads being starved when one thread blocks on I/O or a system call. The approach handles preemption of a thread holding a lock by checking whether the preempted thread was executing in a critical section, letting it run until it exits the section, and then returning to the original upcall.
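A rough sketch of that recovery path, with invented helper names (the paper describes the mechanism, but this is not its code): on the preemption upcall, the user-level scheduler checks whether the preempted thread was inside a critical section and, if so, resumes it via a user-level context switch just long enough for it to exit the section before rescheduling.

    typedef struct thread thread_t;
    extern int  was_in_critical_section(const thread_t *t);
    extern void continue_until_section_exit(thread_t *t);  /* returns to caller */
    extern void ready_enqueue(thread_t *t);

    void handle_preempted_thread(thread_t *t)
    {
        if (was_in_critical_section(t))
            continue_until_section_exit(t);   /* let it release the lock first */
        ready_enqueue(t);                     /* now safe to reschedule normally */
    }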

Evaluation
The authors demonstrate the strength of their proposed design by suitably modifying the Topaz OS as well as the FastThreads user-level thread package. For workloads involving minimal kernel intervention, the proposed implementation performs as well as unmodified FastThreads, in terms of both absolute performance and scalability with an increasing number of processors. However, under memory pressure and thus increasing paging-related I/O, FastThreads with scheduler activations shows a 1.5x speedup compared to native FastThreads.

Confusions
I am confused about a very basic issue. How does the user-level thread scheduler preempt threads, or switch between threads? Or does it only modify the runnable queue suitably? If it only modifies the runnable queue, why doesn’t the default user-level thread scheme work when a thread makes blocking I/O calls? Surely, there are other threads from the same application on the runnable queue which could be scheduled in place of it, although under competition for processors with threads from other applications?

Summary
This paper describes a mechanism to replace the kernel thread abstraction for supporting parallelism at the user level. The authors implement a system that provides the performance of user-level threads while maintaining the functionality of kernel threads. Each process is given the abstraction of a virtual multiprocessor, and the kernel decides how many processors to allocate to each process. Using this abstraction, threads can be scheduled by user-level programs, processors can be allocated to and taken from processes, and the kernel can communicate with the thread scheduler through scheduler activations, alerting the user-level scheduler to changes in the kernel.
Problem
Threads can be implemented in two ways, at either user level or kernel level, and each implementation has advantages and disadvantages. User-level threads require no intervention by the kernel and are flexible; they execute in the context of the user process, which the kernel schedules like any ordinary process. However, user-level threads can exhibit poor performance when the abstraction of a dedicated virtual processor breaks down, which happens when I/O or page faults occur. Kernel threads, meanwhile, are often inefficient because they are heavyweight and every operation involves kernel intervention.
Contributions
To achieve the performance of user-level threads without their disadvantages, the abstraction of a virtual multiprocessor is provided to each process. The kernel then notifies the user-level scheduler of changes in the kernel that affect the process, such as preempting a processor. The mechanism used to deliver these notifications is called a scheduler activation. Scheduler activations do several things: they act as an execution context for user-level threads, they notify the user-level scheduler of kernel events (like a processor being preempted), and they provide space for saving the processor's context when a thread is stopped. What distinguishes scheduler activations from kernel threads is that they let the kernel alert the user-level scheduler when an event has happened, so the scheduler can respond accordingly. The user-level program can likewise alert the kernel about user-level events, such as requesting another processor or reporting that a processor is idle. With this new threading implementation, special care is also taken with critical sections to avoid the latency of waiting on a lock held by a preempted thread: when a thread in a critical section is preempted, it is temporarily continued, and control returns to the upcall once the critical section is finished.
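As a hedged illustration of what an activation carries (the paper describes two execution stacks plus a control block but gives no struct definition; all field names below are invented):

    struct scheduler_activation {
        void *kernel_stack;   /* used while the activation executes in the kernel */
        void *user_stack;     /* upcalls and user-level threads run on this stack */
        struct {
            unsigned long pc;
            unsigned long sp;
            unsigned long regs[32];   /* size is machine-dependent; illustrative */
        } saved_context;      /* the kernel saves the current user thread's
                                 processor context here when it is stopped */
    };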
Evaluation
Overall, the performance of this implementation was very good compared to existing implementations. With scheduler activations, the Null Fork and Signal-Wait tests were still an order of magnitude faster than kernel threads, and only slightly slower than user-level threads because of the additional check for execution in a critical section. The speedup of the N-body problem increases linearly with the number of processors. However, the upcall performance of the implementation was rather poor, a factor of five worse than kernel threads; the authors attribute this to implementation issues that could likely be eliminated in a from-scratch implementation.
Confusions
I don't understand how the system handles making a procedure call within a critical section. Does the optimization of creating a copy of the critical-section code, ending with a jump instruction back into the scheduler, only work if there are no procedure calls? The paper mentions that this case could be supported using a flag, but couldn't the flag be used in the general case as well? If procedure calls inside critical sections are common, is this optimization actually significantly better?

1. Summary
This work tries to provide the right kernel-level abstraction for supporting user-level management of parallelism. Using the mechanism of scheduler activations, the kernel provides the illusion of a virtual multiprocessor to the user-level thread system. The kernel vectors control and information to user space whenever it takes an action affecting the address space's threads, and the user-level thread system uses this control to manage parallelism effectively. Evaluations suggest that scheduler activations provide the performance and flexibility of user-level threads without their usual functionality problems.

2. Problem
There are two ways in which parallelism can be managed: user-level threads and kernel threads. User-level threads are very fast and flexible, but they are not well integrated into the system and can misbehave in a multiprogrammed environment, since kernel decisions are not communicated to the user level. Kernel threads are well integrated into the system, but they are neither fast nor flexible. We seek a mechanism that provides the performance and flexibility of user-level threads with the functionality of kernel threads.

3. Contributions
The virtual multiprocessor abstraction means that the user-level scheduler knows which processors are assigned to it and can independently schedule jobs on them. Whenever the kernel makes a decision affecting this virtual multiprocessor, it communicates it to the user-level scheduler using a scheduler activation. The scheduler activation serves as a vessel for the execution context of a user-level thread. Unlike kernel threads, scheduler activations are never blocked or resumed by the kernel without notifying the user. The kernel honors the invariant that the number of running scheduler activations always equals the number of processors assigned to the address space. The user-level thread system also communicates with the kernel, but only for the subset of events that may affect processor allocation. Each new communication from kernel to user results in a new scheduler activation being created; old, discarded ones are returned to the kernel and recycled. To deal with preemption of a user-level thread holding a lock, the kernel passes control to the user-level scheduler, which can then run that thread until it exits the critical section. A big advantage of scheduler activations is that the vehicle of concurrency in user space can be anything (threads, tasks, etc.); it does not affect the kernel's implementation of activations.
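The activation-per-upcall model and the recycling of discarded activations might look roughly like the following C sketch; every name here is an assumption for illustration, not kernel code from the paper:

    struct activation;
    struct address_space {
        struct activation *free_list;   /* discarded activations kept for reuse  */
        int running;                    /* invariant: equals processors assigned */
    };
    extern struct activation *pop_free(struct address_space *as);
    extern struct activation *create_activation(struct address_space *as);

    struct activation *activation_for_upcall(struct address_space *as)
    {
        struct activation *a =
            as->free_list ? pop_free(as) : create_activation(as);
        as->running++;                  /* one more activation now running */
        return a;
    }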

4. Evaluation
They modified the Topaz OS's kernel thread system and FastThreads, a user-level thread package. Measurements of thread operations (Null Fork, Signal-Wait) indicate that the performance of scheduler activations is close to that of user-level threads. With little kernel involvement, scheduler activations run about as fast as the original user-level threads and much faster than kernel threads; with significant I/O, they outperform both, and the authors argue a cleaner implementation could do even better.

5. Confusions
1. The debugging support section is not clear. What is different about thread-system debugging vs application debugging?
