Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism
Thomas Anderson, Brian Bershad, Edward Lazowska, and Henry Levy. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Trans. on Computer Systems 10(1), February 1992, pp. 53-79.
Reviews due Thursday 3/2
Comments
Summary:
The paper begins by comparing kernel-level and user-level thread schemes and the tradeoffs associated with each. It then introduces a new technique for the design and implementation of a thread interface that includes a user-level thread package and a new kernel interface. This new design supports the flexibility and performance of user-level threads while providing the functionality of kernel threads.
Problem:
User-level threads are flexible and provide good performance. However, whenever there are page faults or I/O, the performance of user-level threads takes a hit. Kernel threads, on the other hand, can efficiently provide support for multiple threads per address space and can overcome the integration problems that user-level threads suffer from, but they are an order of magnitude slower due to heavy kernel involvement. The problem the authors are trying to address is how to provide the functionality of kernel-level threads with the performance and flexibility of user-level threads.
Contribution:
The authors propose a design for user-level threads where each process is provided with its own virtual multiprocessor, an abstraction of the physical processors provided by the operating system kernel. Each application controls which of its threads are running on which of the processors allocated to it, while the kernel has complete control over the allocation of processors among address spaces. The proposed mechanism, scheduler activations, vectors control from the kernel to the address space's thread scheduler on a kernel event by performing an upcall into the application's address space. The thread scheduler can use the activation to modify user-level thread data structures and to learn, for example, that one of its threads has blocked in the kernel.
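To make the upcall idea concrete, here is a minimal C sketch of what the kernel-to-user notification interface (the events in Table II of the paper) might look like; the type and function names are illustrative assumptions, not the paper's or any real OS's actual API.

```c
typedef int sa_id_t;                     /* scheduler activation identifier */

typedef struct machine_state {           /* saved user-level register state */
    void *pc;                            /* program counter (simplified)    */
    void *sp;                            /* stack pointer (simplified)      */
} machine_state_t;

/* Upcall entry points implemented by the user-level thread scheduler.
 * Each upcall runs in the context of a fresh scheduler activation. */
void sa_processor_added(int processor);
void sa_processor_preempted(sa_id_t preempted, machine_state_t state);
void sa_activation_blocked(sa_id_t blocked);
void sa_activation_unblocked(sa_id_t unblocked, machine_state_t state);
```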
Evaluation:
The authors evaluate their design by modifying the FastThreads user-level thread library and Topaz kernel thread management routines.
In their evaluation they consider the cost of user-level thread operations such as fork and signal-wait. It is observed that upcall performance in their implementation is much slower than Topaz's kernel threads, which could be due to the extra state maintained by scheduler activations. They also evaluate their system with different numbers of processors and with varying percentages of available memory, and find that their implementation was faster than the alternatives when the available memory percentage was lower.
Confusion
1. The handling of preemption when running in a critical section was not clear, particularly the binary translation method.
2. Are scheduler activations supported by any current OS?
Posted by: Lokananda Dhage Munisamappa | March 2, 2017 08:33 AM
1. Summary
The paper presents an effective way of doing processor scheduling in a hybrid manner: a user-level library handles thread usage, scheduling, and events, while the kernel handles allocation and informs the user-space library of events relevant to the current application. The work presents a simple mechanism, scheduler activations, to enable this approach. The paper motivates the need for such a hybrid system by evaluating both pure user-level and kernel threads for advantages and disadvantages. The authors also present a clean design for the mechanism, independent of the implementation presented.
2. Problem
Scheduler activations address the following two problems for pure user-level thread packages: they allow a process to run multiple threads in parallel on a multiprocessor (which a pure user-level thread package cannot do without kernel support), and they allow a process to hide I/O latency by running other threads when one thread blocks for I/O (with current packages built on kernel threads, all user-level threads multiplexed on a kernel thread are blocked if one of them blocks for I/O).
3. Contributions
The major contribution of this work is showing that a collaborative mechanism, scheduler activations, in which the user level handles some traditionally kernel responsibilities with kernel support, can work and lead to significant gains in performance. It also paved the way for later work on application-specific operating systems such as exokernels, where user space is more involved in traditionally kernel-only functions such as memory management, I/O handling, and scheduling.
4. Evaluation
The authors provide a detailed analysis of their system using micro-benchmarks to measure overheads and a single N-body simulation to judge application performance. Even though the evaluation seems thorough, a bigger mix of applications might have helped their case and given readers a better judgment of the true costs of the mechanism.
5. Confusion
The evaluation of scheduler activations mentions performance five times worse than kernel threads in certain conditions (such as upcalls for events like page faults). If this is the case, was it due to the infrastructure they used, or is it an inherent issue with moving thread management to procedures rather than system calls?
Posted by: Akhil Guliani | March 2, 2017 08:19 AM
1. Summary
This paper describes the idea of scheduler activations, which allow the kernel and user-level threads to exchange the information needed for scheduling. Scheduler activations allow a thread package to achieve performance similar to a user-level thread system, combined with the processor information available to kernel threads.
2. Problem
Typically, an application can use a user-level thread package, threads provided by the kernel, or both. A user-level thread package provides high performance because it runs within the same process as the program, and therefore it requires no traps into the kernel. However, user-level threads lack information about which processor in a multiprocessor is running the program, and they block on a uniprocessor when performing IO. Kernel threads have this information and can make scheduling decisions that take process load and IO into account. However, kernel threads provide lower performance than user-level threads for the reasons previously mentioned. User threads need to have more information about kernel scheduling decisions.
3. Contributions
The authors add scheduler activations to FastThreads, a user-level thread package, running on the Topaz operating system on DEC SRC Firefly multiprocessor workstations. Scheduler activations give the user application the illusion of a virtual multiprocessor. The kernel performs an upcall, sending a scheduler activation to the user thread system, to indicate that a virtual processor has been added or preempted, or that a scheduler activation has blocked or unblocked. Similarly, the user-level threads can communicate to the kernel to ask for more processors or fewer processors.
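For the other direction of communication, here is a hedged sketch of the user-to-kernel calls the review mentions (asking for more processors, giving idle ones back, roughly Table III of the paper); the function names are hypothetical, not a real OS API.

```c
/* Hypothetical user-to-kernel calls (roughly Table III of the paper). */

/* Ask the kernel for additional processors when this address space has
 * more runnable threads than processors. */
void sa_add_more_processors(int how_many);

/* Tell the kernel this processor is idle; the kernel may reallocate it to
 * another address space and return one later via an upcall. */
void sa_this_processor_is_idle(void);
```

These calls act as hints: the kernel keeps final control over how many processors each address space actually receives.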
4. Evaluation
The paper includes several microbenchmarks that show that thread operations run slightly slower when scheduler activations are added to FastThreads, but still much faster than on Topaz kernel threads alone. A macrobenchmark running the N-body problem shows that scheduler activations provide some speedup and slightly lower execution time compared to the original FastThreads package, even as the number of processors increases. An issue in the evaluation was that the speedup of FastThreads with scheduler activations seems to approach the speedup of the original FastThreads at 6 processors, and it is unclear if the two curves would cross if the number of processors is even higher. Another issue is that upcall performance runs extremely slowly, in some cases five times slower than kernel thread operations.
5. Confusion
It was unclear why the system needs to make a copy of every critical section at compile time. What happens when a thread is preempted during a critical section if the critical section is copied?
Posted by: Varun Naik | March 2, 2017 08:08 AM
Summary
This paper provides the design, implementation, and evaluation of a user-level thread system and the required kernel support to achieve effective user-level management of concurrency on a multiprocessor system. The system offers the best of both worlds: the functionality of kernel threads with the performance and flexibility of user-level threads.
Problem
The paper tries to resolve the dilemma of whether threads should be supported by the operating system kernel or by user-level library code in the application's address space, as both approaches have their pros and cons. User-level threads are effective for managing parallelism in user-level applications due to their higher performance and flexibility. However, without kernel support it is very difficult to integrate them well with the rest of the system (for example, the whole process is blocked if one of the threads is doing I/O). User-level threads built on top of kernel-level threads suffer from similar limitations. On the other hand, kernel threads have their own limitations: they are slow and inefficient compared to user-level threads because the kernel must manage and schedule threads as well as processes, and it requires a full thread control block (TCB) for each thread. As a result there is significant overhead and increased kernel complexity.
Contributions
The approach suggested by the authors is that the kernel provides each process with its own virtual multiprocessor, and a user-level thread manager is responsible for interacting with the kernel and making user-thread scheduling decisions. The kernel is responsible for allocating processors to address spaces and for notifying the user-level thread system of a process about processors allocated to or deallocated from its address space.
The mechanism introduced to achieve this communication is called scheduler activations. Upcalls made by the kernel through a scheduler activation allow it to communicate key events to the user-level thread system. When a user-level thread blocks in the kernel or is preempted, no expensive data copying from the kernel is required to restore the user thread state, since this data is stored in a thread control block at the user level. Preemption of a scheduler activation requires creating a new scheduler activation to inform the corresponding user-level thread system about the preemption. Processes use notifications to request more processors from the kernel; misuse of this feature is discouraged by a multi-level feedback policy that penalizes processes that use a larger number of processors.
The critical section problem is addressed by adopting a solution based on recovery: the preempted thread runs a modified copy of the critical section that makes it yield to the scheduler upon completing the section. Another optimization mentioned in the paper is reusing scheduler activations to avoid the overhead of creating new ones.
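As a rough illustration of the reuse optimization mentioned above, here is a generic free-list cache in C; it only conveys the idea of recycling discarded activation structures and is not the paper's actual mechanism (all names are made up).

```c
#include <stdlib.h>

struct activation {
    int id;                      /* identifier handed out by the kernel */
    struct activation *next;     /* free-list link                      */
};

static struct activation *free_list = NULL;   /* real code would need a lock */

struct activation *activation_alloc(void) {
    if (free_list != NULL) {                  /* reuse a discarded one */
        struct activation *a = free_list;
        free_list = a->next;
        return a;
    }
    return malloc(sizeof(struct activation)); /* otherwise allocate fresh */
}

void activation_discard(struct activation *a) {
    a->next = free_list;                      /* cache instead of freeing */
    free_list = a;
}
```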
Evaluations
The suggested design is implemented by modifying the FastThreads user-level thread library and the Topaz kernel thread management routines. The implementation is compared, on the Topaz OS running on the Firefly machine, against kernel-level Topaz threads and the original FastThreads. The authors evaluated the performance of the Null Fork and Signal-Wait operations, which show around 10% performance degradation for Null Fork and 15% for Signal-Wait relative to the original FastThreads. The upcall overhead was indirectly measured through a Signal-Wait test with kernel-level synchronization, and the implementation performs significantly worse in this case. For the N-body simulation with negligible I/O, FastThreads on scheduler activations outperformed both the original FastThreads and kernel threads. This evaluation shows that user-level threads with explicit kernel support work best in cases such as the N-body problem, where the workload is parallel and inherently scalable.
Confusions
Are there any real-world implementations of this system?
The handling of page faults by the user-level thread manager is not very clear.
Posted by: Om Jadhav | March 2, 2017 07:56 AM
1. summary
The paper presents a kernel interface and a user-level thread package that make it possible to achieve the performance of user-level threads while involving the kernel only when its functionality is truly needed.
2. Problem
User-level threads have good performance but cannot handle I/O, page faults, and other events that need the kernel. Kernel threads, however, have poor performance. To bridge the gap, a mechanism is needed that coordinates with the kernel on traps to support user-level scheduling policies. The problem is that the information required for this is distributed between the kernel and each application's address space.
3. Contributions
a). Provides a good design for the N:M threading model.
b). A scheduler activation mechanism that provides the application with a virtual multiprocessor, notifies the application of kernel events that could affect its address space, and provides space in the kernel to save context when a running user-level thread is stopped by the kernel. The user-level thread system can then communicate with the kernel about processor allocation and make user-level scheduling decisions.
c). Uses several upcalls for kernel-user communication: when the number of processors changes (a processor is added or preempted) and when a scheduler activation blocks or unblocks.
d). Supports user-level priority scheduling when processors are preempted.
e). Deals with the critical section problem through recovery.
4. Evaluation
The paper measured the performance of user-level thread operations like fork and signal-wait in their system, which is comparable to what pure user-level threads can offer. They also evaluated the performance of upcalls, and the result is about 5x slower than kernel threads. Although the authors explain this as an implementation optimization problem, I believe it needs further exploration. Finally, application performance is evaluated with a single algorithmic problem that can be either CPU- or memory-bound. But only one application was included in the evaluation, and I wonder how the system behaves under more general workloads.
5. Confusion
About critical section detection: why would using a flag be expensive?
Why, in Figure 2, does it outperform user-level threads?
Posted by: Jing Liu | March 2, 2017 04:59 AM
Summary:
Implementing threads at the kernel level or the user level has its own advantages and disadvantages. This paper takes the best of both worlds and presents a new design and implementation of a kernel interface and user-level threads that is as feature-rich as kernel-level threads and provides performance and flexibility similar to user-level threads. It achieves this by providing a virtual multiprocessor to each process and using scheduler activations for communication with the kernel to efficiently manage resources, such as asking for additional processors or reporting idle ones.
Problem:
Kernel-level threading mechanisms lack the performance and flexibility of user-level threads. They have significant overheads, such as authenticating the caller on every thread operation, and are rigid in the policies they provide for application threads. User-level threads, on the other hand, are inefficient in the presence of heavy I/O or page faults and, even worse, sometimes exhibit erroneous behavior in such cases. These problems motivated the authors to design a thread package that is correct, flexible, and has the performance of user-level threads.
Contributions:
1. Discusses the merits and demerits of user-level thread packages and kernel-level thread support.
2. Introduces a hybrid approach – thread support stays at the user level, while the kernel provides interfaces to coordinate with the user process so that it has sufficient information to provide an environment for correct thread execution.
3. Each process has its own virtual multiprocessor. The user-level thread package determines the policy for running threads on these processors and can apply its own custom optimization techniques. The kernel stays in charge of allocating processors to address spaces.
4. Scheduler activations facilitate communication between the kernel and user-level threads and provide an execution context, storing user thread state across switches. The kernel notifies processes of thread blocked/unblocked events.
5. Separation of policy and mechanism – the kernel allocates processors but remains oblivious to the scheduling policy or the concurrency model built on top of the user-level threads.
6. Uses a recovery model – a user thread continues to execute its critical section temporarily even after it has been notified by an upcall about a preemption or block.
7. Discarded scheduler activations are cached and reused instead of creating new activations, thus reducing creation overhead.
Evaluation:
The design is evaluated on the 1) Null Fork and 2) Signal-Wait benchmarks, and performance is compared with Topaz threads and FastThreads. The hybrid approach performs significantly better than kernel threads and performs similarly to FastThreads (with a little degradation). The authors attribute this minor degradation to overheads such as the critical section check. Upcalls are notably slower; however, the authors attribute this to the implementation rather than the design. For application performance, FastThreads with scheduler activations outperforms both its FastThreads counterpart and kernel threads.
Confusion:
Is there a timeout for critical sections? Can a fraudulent thread pretend to always be in a critical section and keep running even if it receives a preemption/block notice?
Posted by: Rahul Singh | March 2, 2017 04:15 AM
1. Summary
Parallelism in programs is achieved via threads. Applications spawn threads to run multiple tasks concurrently. Threads are implemented either in the kernel or in a user-level library, and each implementation has its advantages. The paper proposes a new hybrid implementation of threads.
2. Problem
The paper claims that kernel threads bear a high overhead for running tasks concurrently. As kernel threads are minimal processes, spawning a thread incurs system call overhead. Also, applications cannot modify the scheduling policies of their own threads, since kernel threads are managed by the OS scheduler. As a result, some applications might favor a user-level threading library. But user-level libraries have their own issues, such as correctness guarantees and unfair resource allocation among threads. The paper claims that threads need support from the kernel, but that the kernel threads interface is not the best solution.
3. Contributions
The paper proposes a new threading model based on scheduler activations. This threading model is an extension of a user-level library with some support from the kernel. The kernel passes events to the user library via an interface called activations. The library governs the scheduling policy for the threads. The kernel allocates a certain number of processors to the process as availability permits; this number is controlled by the kernel and can be modified at any point in time. Depending on the number of allotted processors and the information from the kernel (blocked/ready/running threads), the library decides the next thread to run. Since more than one event (upcall) can occur, events can be batched together and passed on to the threading library, as sketched below. This threading model thus enjoys the performance and flexibility of user-level libraries along with the protection and resource fairness of the kernel.
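A small sketch of what the batched-event handoff described above could look like on the user side, assuming a hypothetical event encoding and handler names (the paper's real interface differs in detail):

```c
/* Hypothetical encoding of kernel events delivered with an upcall. */
enum sa_event_kind {
    SA_PROCESSOR_ADDED,
    SA_PROCESSOR_PREEMPTED,
    SA_ACTIVATION_BLOCKED,
    SA_ACTIVATION_UNBLOCKED
};

struct sa_event {
    enum sa_event_kind kind;
    int processor;      /* valid for processor events  */
    int activation;     /* valid for activation events */
};

/* Handlers implemented elsewhere in the user-level thread library. */
extern void handle_processor_added(int processor);
extern void handle_processor_preempted(int activation);
extern void handle_activation_blocked(int activation);
extern void handle_activation_unblocked(int activation);

/* Entry point run by the kernel upcall; several events may arrive at once. */
void upcall_entry(const struct sa_event *events, int n)
{
    for (int i = 0; i < n; i++) {
        switch (events[i].kind) {
        case SA_PROCESSOR_ADDED:      handle_processor_added(events[i].processor);       break;
        case SA_PROCESSOR_PREEMPTED:  handle_processor_preempted(events[i].activation);  break;
        case SA_ACTIVATION_BLOCKED:   handle_activation_blocked(events[i].activation);   break;
        case SA_ACTIVATION_UNBLOCKED: handle_activation_unblocked(events[i].activation); break;
        }
    }
}
```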
4. Evaluation
The new threading model was implemented in the Topaz OS. The Topaz kernel exposes events via the activation interface, and the evaluation uses the FastThreads package to implement user-level thread scheduling policies. The modified FastThreads with scheduler activations clearly beats FastThreads on Topaz kernel threads and comes close to the original FastThreads in thread operation performance. The paper also reports the speedup gained by the evaluation application through the use of activations. The paper states that their upcall implementation was slower than Topaz kernel thread operations; the authors argue that a tuned, from-scratch implementation would improve this performance.
5. Confusion
In spite of the high overhead in the upcall implementation, how did the authors get better application performance?
Posted by: Rohit Damkondwar | March 2, 2017 04:10 AM
Summary:
While user-level threads can offer high performance and flexibility, they exhibit poor performance or incorrect behavior during multiprogramming, I/O, and page faults, which are better handled by kernel threads. This paper presents a model for thread management that takes the best of both worlds by providing a kernel interface and a user-level thread package that together achieve performance, flexibility, and functionality.
Problem:
Parallel programmers are often faced with a dilemma between choosing user-level threads and kernel threads. In a uniprogrammed system without I/O, user-level threads can offer very high performance by avoiding the overhead of kernel intervention, and they provide the flexibility of different parallel programming models. Unfortunately, user-level threads lack kernel support, due to which they can perform extremely poorly in a multiprocessor, multiprogrammed system. Kernel threads, although they provide better functionality, are too heavyweight.
Contribution:
> The main idea of this paper is to provide a kernel interface and a user-level thread package which together provide all the benefits of both kernel and user threads.
> In this model, the kernel provides a virtual multiprocessor to each user program and has complete control over processor allocation.
> The thread management system decides which thread is scheduled on which of its allotted processors.
> A scheduler activation is the execution context on top of which a user thread runs. It is also used to notify the user level about processor reallocation.
> The user-level thread system notifies the kernel about operations affecting processor allocation, requests for additional processors, etc.
> The user-level thread system has the flexibility of using any parallel programming model.
> Gaming the system is prevented by using multilevel feedback scheduling.
> Employs correct recovery in case a processor is preempted during a critical section, by allowing the thread to continue temporarily via a user-level context switch.
Evaluation:
This design is implemented on the DEC SRC Firefly multiprocessor workstation. FastThreads is modified to implement the user-level thread system, and the kernel threads of Topaz are modified to implement scheduler activations. The performance of the same parallel application running on Topaz kernel threads, FastThreads user threads, and the modified design was compared, with the latter achieving significant speedup.
Confusion:
If the processor is preempted during execution of a critical section, how will the user-level system continue to run the thread if no other processor is allocated to its address space?
Posted by: Pallavi Maheshwara Kakunje | March 2, 2017 04:07 AM
Summary
This paper talks about providing efficient user level thread scheduling by combining the functionality of kernel threads with the performance and flexibility of user-level threads. Scheduler activations are used to perform this.
Problem
Parallelism can be provided either by using user-level threads or kernel-level threads. User-level threads give excellent performance when there is no multiprogramming, I/O, or page faults, but can perform badly in the presence of these factors. User-level threads are also generally more flexible than kernel-level threads. On the other hand, kernel-level threads are easier to use and avoid system integration problems, but they are similar to traditional UNIX processes and are too heavyweight for use in many parallel programs. Thus there is always a choice that the programmer has to make between user-level threads and kernel-level threads. This paper tries to resolve this by combining the functionality of kernel-level threads with the flexibility and performance of user-level threads.
Contribution
In this design, each application is provided with a virtual multiprocessor that is completely controlled by the application, while the kernel controls the allocation of physical processors. Scheduler activations are used by the kernel to notify the application about events affecting its scheduling state, such as allocating processors or taking back previously allocated processors. Based on the events received from the kernel, the application can make its scheduling decisions, modify its data structures, and make requests to the kernel. The paper also addresses the issue that arises if the kernel preempts one or more of an application's processors while it is executing in a critical section. There are two ways to tackle this problem. Prevention avoids the problem by giving the application control of the processor and not allowing the kernel to preempt it; this has the serious drawback that the application can hold the processor for as long as it wants. The second solution is recovery, which allows the user thread to continue through a user-level context switch until it gets out of its critical section.
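A minimal sketch of the recovery idea, assuming hypothetical helpers (in_critical_section, switch_to, enqueue_ready) provided by the user-level thread package; in the actual paper the preempted thread is resumed in a compiler-generated copy of the critical section that yields when it exits, which this sketch only hints at in comments.

```c
#include <stdbool.h>

typedef struct uthread uthread_t;

/* Assumed helpers from the user-level thread package. */
extern bool in_critical_section(const uthread_t *t);  /* e.g. inspect its saved PC */
extern void switch_to(uthread_t *t);   /* user-level context switch; returns when t yields */
extern void enqueue_ready(uthread_t *t);

/* Called from the upcall that reports a preempted user thread. */
void on_thread_preempted(uthread_t *preempted)
{
    if (in_critical_section(preempted)) {
        /* Let the thread run on (in the paper, in a copy of the section that
         * yields at the end) until it releases the lock, then come back here. */
        switch_to(preempted);
    }
    enqueue_ready(preempted);   /* now safe to schedule it normally */
}
```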
Evaluation
The authors evaluated their design using Topaz kernel threads, the original FastThreads, and the modified FastThreads. Thread performance was similar to the original FastThreads and much faster than Topaz threads when kernel intervention was minimal. In cases where the kernel was involved, the system performed better than both the original FastThreads and Topaz threads. Similarly, the system performed better in cases where application-induced kernel events were present.
Confusion
Could you please explain the handling of preemption when a thread is in a critical section in more detail?
Posted by: Gaurav Mishra | March 2, 2017 03:52 AM
Summary
The paper describes the implementation of a new concurrency model the authors have come up with. In this model, the major share of thread synchronization and creation is done at the user level, with the kernel interfering only minimally for events like thread preemption, I/O, or page faults.
Problem
User-level threads are good for performance but lack the functionality of kernel threads. For example, if one of the threads in a process is blocked for I/O, the entire process is blocked by the kernel, making it impossible to schedule any of the other threads in the process.
On the other hand, kernel threads are slow, owing mainly to the number of application-kernel crossings.
The authors clearly state that their main goal is to make thread operations perform as well as ordinary procedure calls while providing all the functionality of pure kernel threads.
Doing this is challenging because the necessary control and scheduling information is distributed between the kernel and each application's address space. For example, events like I/O completion are known to the kernel, while information like the amount of parallelism a process needs is known only to the application.
Contributions
The authors created a mechanism called scheduler activations, which serves as a container for user-level threads and is also used by the kernel to upcall into a specific application about a kernel event or to save a thread's state when it is descheduled.
The authors also came up with policies for:
Processor Allocation: The main idea is that processors are space-shared while respecting priorities, and time-sharing is used only when the number of processors is not an integer multiple of the number of address spaces (processes).
Thread Scheduling Policy: A major advantage of using user-level threads is that each application can use its own scheduling policy. The default policy is to use a per-processor ready list that is processed in LIFO order to improve cache locality (see the sketch below).
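A minimal sketch of such a per-processor LIFO ready list (a simple stack per processor); locking and work stealing between processors are omitted, and all names are illustrative rather than taken from FastThreads.

```c
#include <stddef.h>

#define MAX_PROCS 8

typedef struct uthread {
    struct uthread *next;       /* ready-list link */
} uthread_t;

/* One LIFO ready list (a stack) per processor allocated to this address space. */
static uthread_t *ready[MAX_PROCS];

void ready_push(int proc, uthread_t *t)   /* most recently run goes on top */
{
    t->next = ready[proc];
    ready[proc] = t;
}

uthread_t *ready_pop(int proc)            /* LIFO order keeps caches warm */
{
    uthread_t *t = ready[proc];
    if (t != NULL)
        ready[proc] = t->next;
    return t;   /* NULL means this processor should look elsewhere or idle */
}
```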
Evaluation
The authors did a good job of evaluating the new system under various conditions. They used the Topaz operating system with the FastThreads user-level thread library as the base system.
Latencies for the Null Fork and Signal-Wait operations are compared with pure kernel threads and with FastThreads on Topaz threads; the new system's latencies are close to the original FastThreads latencies.
I really like the way upcall performance is measured. The authors forced the signal-wait operation to go through the kernel (instead of the usual pure user-level path). Since we already know that thread performance is close to pure user-level performance, anything extra can be attributed to the upcalls. It is found that this is much worse (a factor of 5) compared to pure kernel threads, due to the extra state maintained by scheduler activations.
Questions
1. What are the memory implications of keeping a whole extra code copy for every critical section? Is it an issue for large applications?
2. Is Figure 3 contradicting the argument in Section 5.2, which said that upcall performance is much worse than pure kernel threads?
3. Can you please discuss the implementation of user level threads on top of kernel threads briefly?
Posted by: Mayur Cherukuri | March 2, 2017 03:41 AM
1. Summary
Threads implement one of the most important pieces of an operating system – concurrency. Neither the threads supported by the kernel nor the threads provided by a user-level library are fully satisfactory implementations. The paper proposes the design and implementation of a new kernel interface and user-level thread package that promise both the functionality of kernel threads and the high performance of user-level threads.
2. Problem
The two implementations of threads have their own sets of problems. While kernel threads provide full functionality, they lack performance and flexibility because of a rigid scheduler and information-exchange overhead. Though user-level threads achieve better performance and provide a higher degree of flexibility, they lack functionality in the case of traps and system calls. So a user wanting to exploit parallelism is faced with the dilemma of choosing between the performance of user threads and the power of kernel threads.
3. Contributions
The authors consider an approach where each application is provided with a virtual multiprocessor, which is an abstraction of a dedicated physical machine. The paper takes a symbiotic approach in which the kernel and the application cooperate to achieve a better implementation of threads. The kernel notifies the application when the number of allocated processors changes or when a user-level thread blocks or wakes up.
The paper proposes the idea of scheduler activations. On a kernel event, a scheduler activation passes control from the kernel to the address space's thread scheduler, so that the thread system can use this activation to modify user-level thread data structures, execute user-level threads, and make requests of the kernel. This application-customizable scheduling results in a higher degree of flexibility without compromising kernel-level functionality.
The kernel is notified only of the user-level events that might affect processor allocation, which are just a small subset of thread activity: namely, whenever the address space transitions to a state where it has more runnable threads than processors, or vice versa. This helps guarantee that no processor idles in the presence of runnable threads. Also, in the face of preemption during critical section execution, the system follows a recovery-based solution that is free from deadlock.
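A sketch of the transition-based notifications described above, using hypothetical counters and kernel-call names; the real FastThreads bookkeeping is more involved.

```c
/* Hypothetical kernel calls (see the earlier interface sketch). */
extern void sa_add_more_processors(int how_many);
extern void sa_this_processor_is_idle(void);

static int nr_processors;   /* processors currently allocated to this address space */
static int nr_runnable;     /* runnable (ready + running) user-level threads         */

void on_thread_created(void)
{
    nr_runnable++;
    if (nr_runnable > nr_processors)              /* transition: we could use more CPUs */
        sa_add_more_processors(nr_runnable - nr_processors);
}

void on_thread_finished(void)
{
    nr_runnable--;
    if (nr_runnable < nr_processors)              /* transition: give a processor back */
        sa_this_processor_is_idle();
}
```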
The paper also presents an implementation built by modifying Topaz and the FastThreads user-level thread package. The paper details the processor allocation policy, the thread scheduling policy, and the performance enhancements designed into this implementation.
4. Evaluation
The authors start with a simple evaluation of thread operation latencies for Null Fork and Signal-Wait. After presenting their implementation, they evaluate the latencies for the same operations again, comparing FastThreads without kernel involvement against FastThreads on scheduler activations.
The authors use the N-body problem to measure the application performance of their implementation. They compare the speedup of Topaz threads, the original FastThreads, and the new FastThreads on the N-body application, and they also analyze execution time as a function of the amount of free memory available to the application.
5. Confusion
The handling of the blocking or pre-emption of a thread executing in the critical section is not clear. Can you please explain this?
Posted by: Sharath Hiremath | March 2, 2017 03:25 AM
Summary
This paper introduces scheduler activations, a new technique that provides the functionality of kernel threads along with the performance and flexibility of user-level threads, by combining a kernel interface with a user-level thread package.
Problem
The authors claim that neither of the two approaches to supporting threads, at user level or in the kernel, is fully satisfactory. User-level threads are flexible (they can be customized based on application needs) and fast (thread operations are not costly). The problem is that user-level threads handle page faults and I/O poorly: one thread doing I/O can pull the entire process off the processor. This can be solved by using kernel threads, which avoid the integration problems of application threads by scheduling each thread onto a physical processor, but kernel threads are an order of magnitude slower than user-level threads. Thus neither solution provides both flexibility and performance.
Contribution
The authors put forth the idea of having a user-level thread management library with a kernel interface in which each process is provided with a virtual multiprocessor. Each application has control over which of its threads run on which of its virtual processors, while the kernel has control over processor allocation among processes.
The kernel provides processes with information about the events that affect the allocation of processors to the process (like blocking on I/O or waking up from a blocked state). The application's thread scheduler can then decide which of its other threads should run instead, which helps avoid losing the processor entirely. This is achieved by a mechanism called scheduler activations: rather than interpreting the events on its own, the kernel delegates this responsibility to the application's thread scheduler by vectoring the events to it.
The user-level thread system notifies the kernel only when necessary (for the subset of user-level thread operations that might affect processor allocation decisions), thus reducing interaction with the kernel. The kernel uses the information provided by processes to make its allocation decisions.
The core distinction between kernel threads and scheduler activations is that, in the latter case, once a user-level thread blocks, it is not resumed by the kernel; the decision is left to user space. The kernel creates a new scheduler activation to notify the user-level thread system that the thread has blocked. The user-level thread system can then free the old activation associated with the blocked thread and decide which of its other threads to run next.
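A rough sketch of how the user-level scheduler might react to the blocked/unblocked notifications described above; the helper functions and the exact upcall arguments are assumptions for illustration, not the paper's interface.

```c
typedef int sa_id_t;
typedef struct uthread uthread_t;

/* Assumed bookkeeping helpers in the user-level thread system. */
extern void recycle_activation(sa_id_t act);
extern uthread_t *pick_ready_thread(void);
extern void enqueue_ready(uthread_t *t);
extern void run(uthread_t *t);              /* switches to t; does not return */

/* The kernel ran this upcall on a fresh activation after the old one
 * blocked in the kernel (e.g. on I/O or a page fault). */
void on_activation_blocked(sa_id_t blocked_act)
{
    (void)blocked_act;            /* its thread stays parked in the kernel */
    run(pick_ready_thread());     /* keep this processor busy meanwhile    */
}

/* The blocked thread can continue; its identity/state arrives with the upcall. */
void on_activation_unblocked(sa_id_t old_act, uthread_t *resumed)
{
    recycle_activation(old_act);  /* the activation that had blocked is free now */
    enqueue_ready(resumed);       /* let the user-level scheduler decide when it runs */
    run(pick_ready_thread());
}
```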
Evaluation
For the purpose of evaluation, the authors compare the original Topaz threads, FastThreads, and the new scheduler-activation-based FastThreads. The cost of user-level thread operations is very close to that of the original FastThreads (a few microseconds away) and is one to two orders of magnitude better than original Topaz threads. The upcall performance is a factor of five worse than original Topaz threads; the authors attribute this to the quick implementation in Modula-2+ (though they claim that recoding could yield a reduction of about a factor of four). The authors used the N-body problem to further test the performance of the new FastThreads implementation. The speedup of the new FastThreads is almost linear and a little better than the original FastThreads. When available memory is reduced, the new FastThreads performs better than the other two (staying almost constant down to about 50%).
Confusion
The critical section part was a bit obscure, specifically the recovery part where a thread running in a critical section (when preempted) does two user-level context switches.
Posted by: Pradeep Kashyap Ramaswamy | March 2, 2017 03:14 AM
1.Summary
Scheduler activations are kernel abstractions of processors on top of which user-level threads are implemented, without compromising the performance and flexibility of user threads.
2. Problem
User-level threads are lightweight and highly flexible. But since user-level threads are scheduled onto kernel threads, which are in turn scheduled by the kernel onto actual processors obliviously to the state of the user-level threads, activities that need kernel intervention, like page faults and I/O, lead to incorrect behavior or poor performance of the user-level threads. Although kernel threads do not suffer from this problem, their performance is in general much worse than that of user-level threads.
3. Contributions
The crux of the paper is the concept of scheduler activations, on top of which the user-level thread management system is implemented.
Each address space is assigned a set of scheduler activations (abstractions of processors, i.e., virtual processors). The number of activations assigned to each address space can be varied by the kernel.
Every event in the kernel (like preempting a processor due to a page fault) is notified to the user-level thread management system by making an upcall through a scheduler activation. User-level thread management can then decide which of its user threads should be scheduled on the available activations.
User-level thread management communicates its processor requirements to the kernel: it requests more scheduler activations if necessary or relinquishes idle processors. This helps the kernel make processor allocation decisions.
To deal with the case where a context switch happens while another user thread is spin-waiting, and to avoid deadlock, the thread that holds the lock in the critical section is allowed to continue, via a user-level context switch, until it is out of the critical section, after which control switches back to the context of the thread scheduled on the new activation.
4. Evaluation
The idea of scheduler activations was implemented by making changes to the Topaz operating system kernel on the DEC SRC Firefly and by altering the FastThreads system to include the user-side changes. It is shown that, as the number of processors increases, the speedup observed with the new FastThreads is better than with the original FastThreads, which in turn is better than with Topaz threads.
5. Confusion
Whenever the kernel has to notify user-level thread management of an event, we are told that it preempts another processor from the same address space and uses it to deliver the notification. Why couldn't it use the processor that has just been preempted for this purpose instead of preempting another one?
Posted by: Sowrabha Horatti Gopal | March 2, 2017 03:01 AM
1. summary
This paper discusses a means of moving thread scheduling to the user level to facilitate better performance and flexibility.
2. Problem
Traditional thread support through the kernel lacks much of the performance and flexibility that can be achieved by user-level thread support. Having all thread switching and control flow happen through calls into the kernel introduces significant overhead from kernel checks on every call. The kernel also generally lacks information about the available threads that would let it schedule them in a sensible order, such as around critical sections.
3. Contributions
The paper introduces scheduler activations, which allow for user-level thread scheduling. Each program is allocated some number of virtual processors on which it can schedule its own threads. The program can switch between its own threads on however many virtual processors have been assigned to it without ever requiring a trap to the kernel. The program need only notify the kernel when requesting that additional or fewer processors be allocated to it. The kernel can then use this as a suggestion rather than a requirement when deciding how to allocate processor resources between programs.
4. Evaluation
They first show that their implementation of scheduler activations introduces only a small amount of overhead over an existing user-level scheduler, but both handily beat kernel-based methods. They then use a multithreaded program to analyze the performance of the different methods. They show that using scheduler activations can yield anything from small improvements at low core counts to significant speedups at higher core counts. They also show it performs better in situations where memory is more limited or where the number of threads being scheduled exceeds the number of physical processors. I feel the evaluation could have used additional benchmarks, as the speedups seen might not be consistent for all multithreaded programs as locking behavior and inter-thread dependencies change.
5. Confusion
I’m not sure I really followed the exact semantics of a kernel vs user thread.
Posted by: Taylor Johnston | March 2, 2017 02:49 AM
Summary
The paper presented a kernel interface and a user-level thread package that together combine the functionality of kernel threads with the performance and flexibility of user-level threads.
Problem
Traditional Unix-like processes were designed for multiprogramming in a uniprocessor environment; they are simply too inefficient for general-purpose parallel programming and handle only coarse-grained parallelism well. Taking advantage of thread parallelism is crucial to high-performance computing.
However, neither contemporary kernel support nor user-level library code is satisfactory. As mentioned in the paper, kernel threads, just like traditional UNIX processes, are too heavyweight for use in many parallel programs. The performance of kernel threads is, according to the authors, inherently worse than that of user-level threads. On the other hand, it is difficult to integrate user-level threads with other system services due to the lack of kernel support for them.
Thus parallel programmers usually had to choose between user-level threads for performance (provided the application is uniprogrammed and does no I/O) and kernel threads for fewer restrictions but worse performance. The paper proposes the new scheduler activations interface to address this dilemma.
Contribution
The interface proposed has two major advantages. The first is that it minimizes switching between user space and the kernel, reducing the cost introduced by trapping.
Secondly, with user-thread information, the proposed scheduler is able to determine the relative priority of threads within the same application. With kernel threads, even though the system might have a way to assign different priorities to different processes, it is hard to differentiate priorities among threads of the same application.
The interface was also tuned to achieve other goals:
- Achieve the performance of the best existing user-level thread management systems when thread operations do not need kernel intervention
- When kernel intervention is a must, make sure that no processor idles in the presence of ready threads, that no high-priority thread waits while a low-priority thread runs, and that when a thread traps to the kernel and blocks, its processor can be used to run a different thread
- Simplify application-specific customization of the user-level part of the system
Evaluation
The paper evaluated thread performance, upcall performance, and application performance. By comparing FastThreads on Topaz kernel threads with FastThreads on scheduler activations, the paper claims that the proposed system preserves the order-of-magnitude advantage that user-level threads offer over kernel threads.
When it comes to upcall performance, the signal-wait time is 2.4 milliseconds, a factor of five worse than Topaz threads. The paper attributes this worse performance to implementation issues, as the authors did not see anything inherent in scheduler activations that would be responsible for the difference. However, it would be more convincing if the authors had done further implementation work or experiments to support this point.
Confusion
1. Can you explain the concepts of kernel threads and user threads in detail? What exactly are the differences between the two?
2. It looks like the proposed interface introduces various communication mechanisms between the kernel and user space. For example, when one thread gets blocked, the kernel takes away the processor and needs to return a processor to the process later, so several communications are introduced. Will this communication overhead neutralize the time saved by the implementation?
Posted by: Yunhe Liu | March 2, 2017 01:55 AM
1. Summary
This paper introduces scheduler activations, an efficient way to implement user-level threads that provides the same functionality as kernel threads.
2. Problem
Traditional user-level threads can have poor performance. The underlying OS kernel still views all the user-level threads as one process and controls the scheduling of each user process. One thread in process A blocking (e.g., on I/O) will lead the kernel to schedule another process B; in effect, all other runnable threads in process A are blocked too. This can leave a CPU idle in the presence of runnable threads (e.g., under an I/O-intensive workload).
Traditional kernel-level threads can also have poor performance. The kernel controls the scheduling among threads. First, every switch between threads has the overhead of trapping into and returning from the kernel. Second, the kernel doesn't have enough information to determine which thread should be scheduled; this information should be provided by user programs.
The authors claim that kernel-level threads are the wrong approach (we cannot get rid of the overhead); we should use user-level threads, but with better kernel support, and that support is scheduler activations.
3. Contributions
The contribution of this paper is to show us what a good interface between the kernel and user programs should look like in terms of thread scheduling. It is critical to notice that scheduling policy should not be made in the kernel, because different user programs may favor different scheduling policies among their own threads. This leaves the question of what the kernel should provide: no more than the basic protection and sharing of hardware (CPU, memory, I/O devices) among user programs. In terms of scheduling (sharing the CPU), (1) the kernel should leave scheduling policy to the user program (by upcalling into the user-level thread scheduler); (2) when necessary, the kernel should preempt CPUs from one user program and give them to another to guarantee sharing. What the kernel should notify the user-level scheduler about is illustrated in Table II of the paper: the kernel tells the scheduler that a certain processor has been added to or removed from the user process, and that a certain thread of the process has blocked or unblocked. With upcalls into the user-level scheduler, better scheduling decisions can be made for the threads in each user program. However, when the kernel needs to reallocate processors from one user program to another, it does not know how many processors each user program currently needs, so the user-level scheduler should promptly inform the kernel that it has an idle processor or needs more processors; these are the system calls from the user-level scheduler to the kernel in Table III of the paper. Moving certain policies (e.g., scheduling, page replacement) out of the kernel was later emphasized in Exokernel (1995).
In addition, the paper also discusses how to handle deadlock in critical sections (by allowing the thread to finish executing the critical section) and an implementation trick of making a copy of every critical section's code at compile time.
4. Evaluation
The authors implemented scheduler activations in the FastThreads package running on the Firefly. 1. The user-level thread operations (null fork and signal-wait) show that scheduler activations are much faster than kernel threads, and only a little slower than user threads. 2. Upcall performance for signal-wait with scheduler activations is a factor of five worse than kernel threads; the authors thought the reason might be that the implementation was not optimized. 3. For the N-body problem (with enough memory, not I/O bound), the speedup for scheduler activations is larger than that of user and kernel threads as the number of processors increases. 4. For N-body without enough memory (I/O bound), the execution time with scheduler activations is smaller than with user and kernel threads. 5. For two copies of the N-body program running together on 6 processors, scheduler activations have the best speedup. I think it would be better to verify that scheduler activations still preserve fair sharing among threads of different user programs; an experiment on fairness (many threads running on 1 or 2 processors) would help.
5. Confusion
Does any current OS support scheduler activations? It seems that both FreeBSD and Linux supported scheduler activations (Kernel Scheduled Entities and Linux Activations, respectively). Are scheduler activations popular or not? What are their drawbacks?
Posted by: Cheng Su | March 2, 2017 12:53 AM
1. Summary
The paper talks about user-level threads and kernel interface support that together provide good performance and allow flexibility in user-level management of parallelism.
2. Problem
Threads can be implemented at user-level or in the kernel. At the user-level, threads can be flexibly managed and are more lightweight in comparison to ones implemented in kernel. However, they can also exhibit poor performance due to real OS activities such as multiprogramming, I/O and page faults. There is thus a need to manage threads at user level with support from the kernel.
3. Contributions
The novel mechanism proposed in the paper is called scheduler activations. On a kernel event, a scheduler activation directs control from the kernel to the thread scheduler, which then uses the activation to modify user-level thread data structures and make requests to the kernel. This allows user-level thread management with minimal changes. Moreover, scheduler activations can support any user-level concurrency model because the kernel does not have any knowledge of user-level data structures. Additionally, each application has a virtual multiprocessor, an abstraction of a dedicated physical machine. Each application knows the number of processors allocated to it and has complete control over which threads run on them. This knowledge of the number of processors available to the application allows effective user-level thread management.
4. Evaluation
The design is implemented by modifying Topaz, the native OS of the DEC SRC workstation, and the FastThreads package. The Null Fork and Signal-Wait benchmarks are used to test the approach; the scheduler activation mechanism shows performance close to the original FastThreads. Next, upcall performance is studied, which shows noticeable overhead due to the communication between user space and the kernel. Finally, to consider application performance, the N-body problem is studied, which exhibits roughly the same performance as on the original FastThreads.
5. Confusion
Could you explain Section 3.3 on critical sections thoroughly? I am not sure I got everything right.
Posted by: Dastagiri Reddy Malikireddy | March 2, 2017 12:17 AM
1. Summary
Scheduler activations introduces a more cooperative model of scheduling than kernel threads or user threads could offer due to the additional exchange of information.
2. Problem
User threads and kernel threads trade off flexibility and performance. User threads offer good performance if there is no I/O involved. Kernel threads offer more functionality but, due to a lack of application knowledge, may deschedule a thread at a bad time and cause blocking. Can we offer a better-performing model than this divide?
3. Contribution
Anderson et al. introduce scheduler activations. The key to this approach is a model with both a user-level scheduler and a kernel-level scheduler. This two-level approach allows applications to perform more optimal scheduling while the kernel-level scheduler provides fairness. Communication across the kernel boundary is minimized to the set of events/actions that can influence allocation decisions. Importantly, the user-level thread system must serialize its notifications to the kernel. They use some neat tricks like duplication of critical sections to minimize particularly slow paths.
4. Evaluation
The overall evaluation is a little lacking in depth. While it does demonstrate results, they only cover limited scenarios. One notable thing is that they do back-of-the-envelope calculations to explain away the poor upcall performance. I don't have a full grasp of how often system calls, and consequently kernel intervention, occur. How does this system perform on a more loaded machine with numerous threads? It would be nice to see the performance of I/O workloads and of hybrids of I/O and CPU workloads.
5. Confusion
With the end of significant generational improvements in processors, should we expect to return to this model soon? It seems we go in waves, pushing ideas and then pulling them back.
Posted by: Dennis Zhou | March 2, 2017 12:11 AM
1) Summary
This paper proposes a new user-level thread scheduling mechanism that combines the functionality of kernel threads with the performance and flexibility of user-level threads. The authors implement it as a user-level thread package and extend the kernel to provide support for scheduler activations and processor allocation among address spaces.
2) Problem
Programmers face the dilemma of using user-level threads or kernel-level threads. User-level threads have good performance when there is no OS interaction and are flexible, but they can perform poorly when faced with OS activities such as page faults and I/O, or even behave incorrectly. Kernel threads do not have these OS integration problems because the OS directly schedules threads onto processors, but they deliver poor performance because they are too heavyweight. Besides, kernel threads are not flexible, since changes to the parallel programming model often require modifying the OS. User-level threads implemented on top of kernel threads have the same performance and flexibility problems as kernel threads.
3) Contributions
The authors demonstrate that kernel threads are inherently more expensive than user threads because of the extra kernel trap and verification for each thread operation, and because of the cost of generality. They also argue that user-level threads' poor performance during intensive OS activity is caused by the lack of kernel support for notifying the address space of scheduling-related OS events. Based on these insights, they propose to do user-level thread management with knowledge of scheduling-related kernel events. To achieve this, the authors use an OS mechanism called a scheduler activation, which serves as the execution environment for user-level threads and is also used (via upcalls) to inform the address space of kernel events like I/O and page faults and of thread wakeups. The OS keeps a one-to-one mapping between scheduler activations and actual processors, which helps the address space keep track of the number of processors allocated and the running threads, so it can make good local scheduling decisions. The OS is responsible for processor allocation to address spaces, and it gets hints from an address space through kernel calls indicating that it needs more or fewer processors than currently allocated.
The authors also describe their method of dealing with critical sections in user threads to improve performance and avoid deadlock. The big idea is to let a thread that is inside a critical section continue execution and preempt it only after it leaves the critical section and releases the lock. This is achieved with compile-time copies of the critical section code (a form of binary translation) instead of expensive critical-section flags.
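A hand-written illustration of the idea (the paper generates these copies automatically and resumes a preempted thread at the corresponding point inside the copy, which this short sketch cannot show): the copy of the critical section differs only in that it yields the processor back to the user-level scheduler once the lock is released. All names here are hypothetical.

```c
/* Assumed lock primitives and scheduler hook. */
extern void lock(int *l);
extern void unlock(int *l);
extern void yield_to_scheduler(void);

static int counter_lock;
static int shared_counter;

/* Normal version of the critical section. */
void update_counter(void)
{
    lock(&counter_lock);
    shared_counter++;
    unlock(&counter_lock);
}

/* Copy used only when the thread is resumed after being preempted inside
 * the section: identical, except it gives the processor up on exit. */
void update_counter_recovery(void)
{
    lock(&counter_lock);
    shared_counter++;
    unlock(&counter_lock);
    yield_to_scheduler();
}
```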
4) Evaluation
The authors implement their user-level thread package based on FastThreads and modify the Topaz OS to provide kernel support. They first show that the proposed thread package provides performance similar to the original FastThreads in terms of thread operations, and much better than kernel threads. However, the upcall performance is poor, and they argue this is an implementation problem that can be improved. They then use the N-body benchmark to show that the new FastThreads performs best both when there is little OS activity (similar to the original FastThreads) and when there is heavy OS interaction (I/O caused by limited memory).
5) Confusions
I don't understand the binary translation method they use to deal with critical sections. Why do they need to copy the critical-section code?
Posted by: Yanqi Zhang | March 2, 2017 12:09 AM
1. Summary
This paper examines a new way of interfacing between the kernel and user-level threads. Its technique achieves user-level thread performance while still providing the functionality of kernel-level threads.
2. Problem
The problem with using only user-level threads is that they cannot schedule around real-world events such as I/O or page faults. Kernel-level threads, however, suffer an order-of-magnitude performance degradation compared to user-level threads, which leads to kernel threads going unused.
3. Contributions
The paper's solution to these problems is to use modified user-level threads with a kernel interface that allows handling of I/O and processor reallocation. The kernel provides the user-level thread system with a virtual multiprocessor. The kernel can change the number of processors the virtual multiprocessor has, while the user-level thread system retains control over which threads run on those processors. It is up to the user-level thread system to tell the kernel when it needs more or fewer processors. Scheduler activations are the mechanism by which the kernel communicates with the user level, providing a way to schedule new threads when one blocks. The paper also describes how critical sections are handled, using a recovery method that lets a preempted thread continue through its critical section.
4. Evaluation
The paper evaluates the modified FastThreads against an unmodified version. At first the paper presents compelling data showing that their implementation suffers very little slowdown compared to the unmodified version. However, when looking at the overhead of communication with the kernel, the paper's upcalls performed around a factor of five worse than Topaz kernel thread operations. The paper tries to explain this away as an unoptimized implementation, but if that were the case it might have been better to spend more time optimizing the code.
5. Confusion
I did not really understand the reason for, or the implementation of, their critical-section optimization.
Posted by: Brian Guttag | March 1, 2017 11:05 PM
1) Summary
The increasing number of processors in systems allows for increased parallelism in programs. However, taking advantage of the increased potential is difficult because kernel-thread-based threading libraries have high overhead that reduces performance improvement while user-mode threading libraries suffer from poor system integration. The authors introduce a new abstraction for program execution and scheduling which allows user-mode threading with good system integration.
2) Problem
Threading libraries at the time took one of two approaches. Kernel-thread-based approaches run program threads via kernel threads using the kernel scheduling mechanisms. However, context switches and scheduling require switching to kernel mode, which is expensive. Furthermore, the kernel does not have enough information to schedule threads well.
On the other hand, user-mode threading does all scheduling in a user-mode library. This allows the scheduler to be aware of the program behavior. Moreover, it eliminates context switching and other kernel overheads. However, user-mode threads have poor integration with the system, leading to poor performance for I/O dependent threads or threads that trap into the kernel often; such behavior blocks any thread from running in that process.
More generally, the abstraction of a process was designed to represent a single program with a single thread of execution. Unix-like operating systems have had to retrofit this notion with concurrency support. The authors argue that building a new abstraction with concurrency in mind is a better path.
3) Contributions
The authors propose two new abstractions. First, the OS allocates each executing program some number of virtual processors. This represents the number of concurrently running contexts a program can have. Using virtual processor allocation introduces a notion of multiprocessing into program execution. In contrast, a traditional Unix process assumed one thread of execution.
Second, the fundamental unit of scheduling is a scheduler activation. An activation serves as a context in which a thread can run and can be scheduled on a particular virtual processor. The user-mode library decides which threads execute in a given activation. The kernel notifies the scheduler of important events, such as preemption or I/O by creating a new activation and running the scheduler in it. The scheduler can then hand the activation off to a thread it wishes to run.
More abstractly, though, one of the contributions of this paper is to help us rethink what the fundamental unit of execution should be in a concurrent environment and what the operating system's role is. The authors decouple scheduling from time-sharing. While the kernel multiplexes virtual processors onto physical processors, the user-mode library chooses what to run. Moreover, virtual processors and scheduler activations directly expose to user-mode the notion of multiprocessors and multiple execution contexts in an elegant way.
4) Evaluation
The authors address a problem that still exists today, proposing elegant and straightforward abstractions. They replace the traditional notion of a process with something that is aware of and designed for concurrency. Their evaluation section illustrates the potential performance benefit of their abstractions. In particular, they show that the N-body problem can benefit a great deal.
However, I felt that the evaluation section was a bit lacking. The authors never explain why N-body is representative of most concurrent workloads, and it is the only application they evaluate in most of their performance section. It seems that this workload may be biased in favor of the implementation. For example, a database application would be highly I/O-bound, yet kernel/user-mode switching in the proposed design performs 5x worse than in traditional systems.
5) Confusions
The authors mention that they need to detect cases where a thread is in a critical section when it is preempted or blocked. To this end, they propose a rather complicated mechanism. An alternative would be for the thread to just set a flag somewhere when it enters and exits the critical section, but the authors claim that this is expensive.
I don't understand why, though. The flag doesn't even need to be atomic. It just needs to be good enough to avoid poor scheduling most of the time, right?
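For comparison, here is a minimal sketch of the flag scheme being asked about, with the cost the authors presumably object to called out in comments. The pthread mutex stands in for the package's user-level lock, and the per-thread flag uses a GCC/Clang extension; both are my assumptions, not the paper's code.

```c
#include <stdbool.h>
#include <pthread.h>

/* Per-thread "in critical section" marker the scheduler could consult
 * on preemption (__thread is a GCC/Clang extension; C11 has _Thread_local). */
static __thread bool in_critical_section;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void enter_cs(void)
{
    pthread_mutex_lock(&lock);
    in_critical_section = true;     /* extra store on every acquire */
}

void leave_cs(void)
{
    in_critical_section = false;    /* extra store on every release */
    /* A real implementation would also have to check here whether a
     * preemption was deferred while the flag was set, and yield if so.
     * That work is paid on every lock operation even though preemption
     * during a critical section is rare, which seems to be the
     * common-case cost the authors wanted to shift onto the rare
     * preemption path instead. */
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    enter_cs();
    /* ... critical-section work ... */
    leave_cs();
    return 0;
}
```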
Posted by: Mark Mansi | March 1, 2017 09:34 PM
1. summary
This paper presents a new approach to parallelism, called scheduler activations. It resolves the dilemma of having to choose between user-level threads and kernel threads.
2. Problem
There are two approaches to parallelism: kernel threads and user-level threads. However, neither is fully satisfactory. Kernel threads are too heavyweight; their performance is an order of magnitude worse than that of user-level threads. With user-level threads, each process is treated as a virtual processor, but due to OS activities the virtual processors are not necessarily equivalent to physical ones. This mismatch can lead to poor performance or incorrect behavior.
3. Contributions
This paper pushes back against kernel threads by showing that they are inherently more expensive than user-level threads, and argues that managing parallelism at the user level is essential to high-performance parallel computing. The paper therefore proposes building a user-level thread package on a new kernel interface that combines the functionality of kernel threads with the performance and flexibility of user-level threads. The difficulty of implementing the idea is that the necessary control and scheduling information is distributed between the kernel and each application's address space. The paper solves this by providing each application with a virtual multiprocessor and by passing notifications between the kernel and the user-level thread scheduler.
4. Evaluation
This paper evaluates three aspects of the new system: the cost of user-level thread operations such as fork, block, and yield; the cost of communication between the kernel and the user level; and the overall effect on application performance. FastThreads on scheduler activations performs nearly the same as FastThreads on Topaz threads for user-level thread operations, and both have a one- to two-order-of-magnitude advantage over kernel threads. The communication time is higher than that of Topaz kernel thread operations, but the authors attribute this to implementation overhead rather than to scheduler activations themselves. For overall performance, the new system shows gains in speedup and performs better when available memory is limited.
Posted by: Huayu Zhang | March 1, 2017 09:27 PM
1. Summary
This paper proposes a new kernel interface with a user-level thread package that provides the same functionality as kernel threads while providing the performance and flexibility of user threads.
2. Problem
Parallel programs that want to execute tasks concurrently need threads. Threads can be supported by the OS kernel, but this comes with the overhead of crossing protection boundaries and kernel traps. Programs also have to use a general-purpose thread system in the kernel, which may not closely match the needs of the application, causing additional overhead. User-level thread libraries provide better performance and flexibility, but they integrate poorly with kernel threads: kernel threads can block or be preempted without notifying the user-level thread system, and they are scheduled without any knowledge of the parallelism available at user level.
3. Contributions
Scheduler activations provide each application with a virtual multiprocessor. The application knows the number of processors available to it and has complete control over scheduling threads on these virtual processors. Performance improvement over kernel threads is achieved by trapping to the kernel only for the small subset of thread operations that can affect the kernel's processor allocation decisions. The kernel can change the number of processors available to an address space during execution. The kernel vectors events such as the blocking or wakeup of a user-level thread to the thread system instead of handling them itself; this communication is achieved through upcalls into user space. The program can also notify the kernel when it needs more processors to run threads or when it wants to yield processors assigned to it (a sketch of this follows below). Poor performance and deadlocks that could result from preempting a thread inside a critical section are avoided by identifying such threads immediately after preemption and letting them run until they exit the critical section.
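A rough sketch of when the user-level scheduler might issue those two hints; the kernel-call names here are placeholders of my own, not the actual Topaz interface.

```c
#include <stdio.h>

/* Placeholder downcalls; in the real system these would be kernel calls
 * into the processor allocator ("add more processors" / "this processor
 * is idle"). Here they just log what would be requested. */
static void hint_add_more_processors(int how_many)
{
    printf("ask kernel for %d more processors\n", how_many);
}

static void hint_processor_is_idle(void)
{
    printf("tell kernel this processor is idle\n");
}

/* Called by the user-level scheduler after its own bookkeeping changes. */
void rebalance(int runnable_threads, int allocated_processors)
{
    if (runnable_threads > allocated_processors)
        hint_add_more_processors(runnable_threads - allocated_processors);
    else if (runnable_threads < allocated_processors)
        hint_processor_is_idle();   /* more processors than work: offer one back */
}

int main(void)
{
    rebalance(6, 4);   /* backlog of work: request 2 more  */
    rebalance(2, 4);   /* idle capacity: offer one back    */
    return 0;
}
```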
4. Evaluation
Scheduler activations were implemented by modifying existing thread systems: Topaz kernel threads and FastThreads, a user-level thread system. The cost of user-level thread operations and of the thread system's communication with the kernel were measured. User-level thread operations with scheduler activations cost about the same as with the original FastThreads package built on kernel threads. Communication with the kernel has a very high overhead, which the authors attribute to the implementation used for the experiments rather than to anything inherent in scheduler activations. Performance is close to FastThreads for compute-bound applications and much better for I/O-bound applications. The evaluation could have used applications in which the ratio of thread operations requiring kernel communication to those done purely at user level was varied, to demonstrate the break-even point for performance.
5. Confusion
The paper says that critical sections can be identified by checking the preempted thread's program counter. What if the preemption occurred during a procedure call after entering a critical section, which can change the PC to something outside the critical section?
Posted by: Suhas Pai | March 1, 2017 09:11 PM
1. Summary
The authors present new abstractions to move CPU scheduling related to threads out of the kernel, and allow user-level processes to make those scheduling decisions. These abstractions include virtual processors and scheduler activations. The former can be bound to processes, and don’t have to be given up even when a thread is blocked. The latter allows the kernel to notify user-level thread schedulers when a thread is blocked, and is useful in requesting changes in processor allocations.
2. Problem
Threads have become an important primitive for high-performance parallel computing. However, the existing OS abstraction of a process, which supports only sequential execution, does not fit well with the parallel execution model needed by threads. Threads can be supported through kernel threads, but these are too slow. They can also be supported by user-level libraries, which are fast but do not integrate well with the system when kernel trap events occur; e.g., when a thread blocks for I/O, the kernel does not know that there are other runnable threads and ends up blocking the entire process. The main problem is that the kernel does not have enough information to make scheduling decisions for the various threads in a process. Hence the authors argue that new abstractions are needed to push scheduling into userspace, allowing the user-level process to make its own scheduling decisions.
3. Contributions
1. Virtual multiprocessors & Scheduler Activations - The processor abstraction can be bound to user-space processes. The kernel notifies the user-level thread system about blocked threads through scheduler activations and upcalls, which allows the program to run a different thread without giving up the processor.
2. Virtual processor allocations - Certain thread operations can result in userspace processes requesting changes in processor allocations.
3. Critical section handling - In earlier systems, deadlock was possible when a thread holding a lock was blocked. The authors handle critical sections via a recovery mechanism: the preempted thread is allowed to run a little while longer through a user-level context switch, and is preempted again once it exits the critical section.
4. Evaluation
Their design is implemented in Topaz, the OS that runs on DEC SRC Firefly workstations, and in the FastThreads user-level thread package. The authors use a few microbenchmarks to measure thread performance, and find that their modified FastThreads matches the performance of unmodified FastThreads. They also use microbenchmarks to measure the overhead of upcalls, and find it to be five times worse than the overhead of kernel thread operations. They claim this is because Modula-2+ was used instead of assembly language, which they argue would have reduced the overhead by about 4x. They use the N log N solution of the N-body problem to measure application performance. They find that scheduler activations achieve performance as good as unmodified FastThreads, or even better in certain scenarios involving a lot of kernel events or multiprogramming.
5. Confusion
1. With kernel threads, the program must cross an extra protection boundary on every thread operation, even when the processor is being switched between threads in the same address space. What does this mean?
2. The authors argue that identifying whether a thread is in a critical section can be done through setting and unsetting of flags, but imposes too much overhead. Why is the overhead high for this?
Posted by: Karan Bavishi | March 1, 2017 09:04 PM
1. Summary
This paper introduces scheduler activations, a mechanism for managing threads at user level that resolves the dilemma between the low overhead of user-level thread libraries and the correct behavior provided by kernel-level threads.
2. Problems
There are two kinds of threads, user-level and kernel-level, each with its own pros and cons. User-level threads have low overhead because they are managed by a user-level library and do not need kernel interaction, but the kernel does not know that multiple user-level threads exist, so it might make wrong scheduling decisions. Kernel-level threads are scheduled much like normal processes, and the kernel knows more about them than about user-level threads, which can help it make better scheduling decisions. The problem with kernel threads is their large overhead. This paper proposes a method that combines the functionality of kernel threads with the performance of user-level threads.
3. Contributions
This paper designs a new interface between kernel and user to provide better thread support. The kernel provides a number of virtual processors to the user-level thread system, and the user library assigns its threads to them. The user can ask the kernel for more or fewer processors, and the kernel can change the number of processors assigned to the user. In most cases threads are managed by the user-level library, but the kernel notifies the library when a thread blocks in the kernel (perhaps because of I/O) or resumes. A scheduler activation works like a context through which user and kernel communicate. The paper also discusses how to decide the proper number of processors for each user and how to deal with critical sections. With these mechanisms, the paper guarantees that no processor idles while there is work to do.
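A tiny sketch of that "no processor idles while there is work" rule as it might look inside the user-level library; this is illustrative only (real thread contexts, locking, and the actual kernel interface are omitted, and all names are mine).

```c
#include <stdio.h>

#define MAX_READY 16

/* Minimal ready queue; a real thread system would store full thread
 * contexts rather than bare IDs. */
static int ready[MAX_READY];
static int ready_count;

static void push_ready(int tid) { if (ready_count < MAX_READY) ready[ready_count++] = tid; }
static int  pop_ready(void)     { return ready_count ? ready[--ready_count] : -1; }

/* Invoked (conceptually, via an upcall) when the thread on this processor
 * blocks in the kernel: run another ready thread if one exists, otherwise
 * hand the processor back so it is never held idle while other address
 * spaces could use it. */
void on_thread_blocked(int blocked_tid)
{
    int next = pop_ready();
    if (next >= 0)
        printf("thread %d blocked; switch this processor to thread %d\n", blocked_tid, next);
    else
        printf("thread %d blocked and no ready work; yield processor to kernel\n", blocked_tid);
}

int main(void)
{
    push_ready(7);
    on_thread_blocked(3);   /* runs thread 7                      */
    on_thread_blocked(7);   /* nothing ready: give processor back */
    return 0;
}
```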
4. Evaluation
In the evaluation section, the paper first studies thread performance when there is little user-kernel communication; the result shows that scheduler activations are much faster than kernel threads and comparable to the original user-level package. The second study measures upcall performance to see whether communication between user and kernel is a bottleneck; here the current implementation is noticeably slower than Topaz kernel operations, which the authors attribute to a lack of tuning rather than to the design. The third study uses a real application, and the result shows that scheduler activations scale to multiple processors and perform well even when there are many blocking I/O operations or a lot of locking.
5. Confusion
How does this model handle the priority of different threads? Can the user library notify the kernel about its threads' priorities so that the kernel reallocates resources? Or is priority fully controlled by the user library, using the resources allocated by the kernel?
Posted by: Tianrun Li | March 1, 2017 08:55 PM
1. Summary
This paper argues that user-level management of parallelism should not be done on top of kernel-level threads: the abstraction provided by kernel threads is too heavyweight and inadequate for running user-level threads on top of it. It then describes a new abstraction the authors have built, called scheduler activations, which provides the kernel interface for user-level threads. They have also built a user-level thread package that uses scheduler activations to implement parallelism. Scheduler activations provide the functionality of kernel threads while preserving the performance and flexibility of user-level threads.
2. Problem
To handle parallelism at the application level, user-level thread libraries are usually used. These threads do not require kernel intervention, so their performance is excellent. Typically, these user-level threads are built on top of kernel threads, which has consequences. Since there is one kernel thread per processor, if a user thread blocks, the kernel thread blocks as well, and as a result the processor sits idle and is wasted. Moreover, if more than one kernel thread per processor is used, the kernel has to schedule among these threads, which is very costly and outweighs any benefit provided by the user-level threads.
3. Contribution
This paper presents a new kernel interface and a user-level thread package that combine the benefits of kernel and user-level threads. It provides each application with the abstraction of a virtual multiprocessor; the application then manages which of its threads run on each of those processors. The kernel and the user program communicate only to notify each other of changes in their scheduling state. All of this is achieved through the scheduler activation abstraction. A scheduler activation is created for each virtual processor the kernel provides to the application. The kernel also creates activations to perform upcalls into the application to communicate events, for example when a scheduler activation blocks, when a processor is preempted, or when a new processor is added. The application in turn notifies the kernel when it has more runnable threads than processors, so that the kernel can allocate more processors to it depending on availability. On top of these processors, the application can do application-specific scheduling to improve performance. The authors added scheduler activations to the Topaz kernel; the user-level thread package was implemented by extending the FastThreads library.
To handle preemption in critical sections, the authors implement a copying technique: every critical section is copied, and if a preemption occurs while the current thread is inside a critical section, its PC is transferred to the copy, which has signalling code at the end of the critical section. As a result, threads are never left suspended partway through a critical section; the preemption takes effect once the section completes.
4. Evaluation
To quantify the benefits to applications, the authors measure performance for the same application using the original Topaz threads, the original FastThreads, and the new FastThreads built on scheduler activations. The application is a solution to the N-body problem. In the first scenario, the application makes very little use of the kernel, and performance is as good as the original FastThreads, actually slightly better. For the original Topaz threads, the speedup starts to diminish as more processors are added, because of heavy kernel thread switching. In the second scenario, the application performs I/O, and the execution time with scheduler activations is still better than with the original FastThreads.
5. Confusion
Why did the authors need to implement the complex copying scheme for critical sections? The naive scheme with flags shouldn't have that much overhead for the common case.
Posted by: Hasnain Ali Pirzada | March 1, 2017 08:55 PM
1. Summary
Scheduler Activations are a way to manage thread operations at the user level while effectively decoupling them as much as possible from kernel-level thread decisions.
2. Problem
User-level thread management systems exist, but they often perform poorly because they have no communication with, or influence over, the thread scheduling decisions made by the kernel. User-mode threads have several advantages over kernel threads, including lightweight thread operations and customizability. How can kernel-level thread management choices be exposed to the user in a useful way?
3. Contributions
The authors contribute the design of Scheduler Activations, a way to enable user-level management of threads that requires minimal changes to the kernel and user thread systems. When implemented correctly, it is also transparent to user programs.
The design itself identifies the components of kernel threads that can be migrated to the user space for effective thread management: essentially, awareness of the number of available physical processors.
4. Evaluation
The authors implement a prototype system by tweaking Topaz, the OS of the DEC SRC Firefly, on the kernel side and the FastThreads thread system on the user side to make use of scheduler activations.
They test the overheads of thread operations with null fork and signal-wait microbenchmarks. There is almost no contact with the kernel in these benchmarks, so scheduler activations performs almost identically to FastThreads.
They then test the overhead of upcalls using scheduler activations and find their system performs significantly worse than the conventional thread system’s syscalls. They attribute this to a lack of optimization of their system. I appreciated the fact that they reported this negative result.
They then test their system on a parallel application benchmark. They perform close to user-mode threads for an app with little kernel involvement, and better than the user and conventional thread systems when there is a lot of kernel involvement. This is because their user mode thread operations are more efficient than conventional threads, and their interaction with the kernel is more efficient than user-mode threads.
5. Confusion
In the discussion of thread priorities, they say that the user might ask the kernel to pre-empt the lowest priority thread running in that address space. How is this possible if the kernel isn’t kept updated about the priority status of the user threads? It seems like this would require the kernel to call into the user space before pre-empting one of its threads, which would be a circular situation.
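For what it's worth, one way this could work without the kernel tracking priorities, and without a call back into user space at preemption time, is for the user-level scheduler, which does know the priorities, to pick the victim itself and simply name that processor to the kernel. A hedged sketch follows; the downcall is a placeholder of my own.

```c
#include <stdio.h>

#define NPROC 4

/* Priority of the thread currently running on each of our processors;
 * lower number = lower priority. Illustrative state only. */
static int running_priority[NPROC] = { 5, 1, 9, 3 };

/* Placeholder for a kernel call naming a specific processor to preempt. */
static void kernel_preempt_processor(int proc)
{
    printf("ask kernel to preempt processor %d\n", proc);
}

/* When a high-priority thread becomes runnable and no processor is free,
 * the user-level scheduler (which alone knows thread priorities) chooses
 * the lowest-priority victim and names that processor to the kernel, so
 * the kernel never needs to know about priorities at all. */
void preempt_lowest_priority(void)
{
    int victim = 0;
    for (int p = 1; p < NPROC; p++)
        if (running_priority[p] < running_priority[victim])
            victim = p;
    kernel_preempt_processor(victim);
}

int main(void)
{
    preempt_lowest_priority();   /* picks processor 1 (priority 1) */
    return 0;
}
```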
Posted by: Mitchell Manar | March 1, 2017 08:53 PM
Summary:
The paper presents the design and implementation of a new kernel interface and a user level thread package that combines the system integration support of kernel threads with the performance and flexibility of user-level threads.
Problem:
User-level threads built on top of traditional processes exhibit poor performance or even incorrect behavior during I/O, page faults, and multiprogramming. Kernel threads avoid these system integration problems, but their performance is poor due to the heavy cost of thread management operations that go through the kernel.
Contributions:
In the proposed design, the operating system kernel provides each user-level thread system with its own virtual multiprocessor instead of a single virtual processor. The kernel controls the allocation of processors to address spaces. Each address space's thread system controls the scheduling policy for its threads and is aware of the number of processors allocated to it. The user-level thread system notifies the kernel when the application needs more or fewer processors. In turn, the kernel notifies the user-level thread system whenever it changes the number of processors assigned to it and whenever a user-level thread blocks or wakes up in the kernel. This communication between the kernel and the user-level thread system is structured in terms of scheduler activations. Scheduler activations also provide the context for user-level thread execution, and space in the kernel for saving the processor context of the current thread when it is blocked or preempted. To avoid poor performance or deadlock when a thread is blocked or preempted inside a critical section, the thread is temporarily continued via a user-level context switch.
Evaluation:
The implementation was done by modifying Topaz, the native OS of the DEC SRC Firefly multiprocessor workstation, and FastThreads, a user-level thread package. The authors compare their implementation against a system using kernel threads (Topaz threads) and one using user-level threads (original FastThreads). The proposed design performs as well as user-level threads, with minimal overhead of 3-5 µs, when there are no kernel events. Upcall performance in the new system is poor, which the authors attribute to the implementation rather than the design. Application performance does not really improve over FastThreads. In a multiprogrammed environment, the new system provides speedup within 5% of that obtained when the application ran uniprogrammed on three processors. So the benefit of the new system is simply that it performs as well as a user-level thread system and does not degrade during kernel activity.
Confusion:
How does I/O affect correctness in user-level threads?
Posted by: Neha Mittal | March 1, 2017 07:08 PM