Process and Synchronization

Part 1 - Process

Mode, Space, and Context

Mode: user & kernel mode

Space

kernel space: system wide & shared by all processes

kernel segment (kernel code, data structure, kernel address map)

user space: per process & private

u area, text, data, stack, shared memory region

The kernel accesses kernel space and the u area

A user process can access only its own user space, excluding the u area

Execution context

process context: kernel executes on behalf of a process, e.g., system call

system context: kernel executes system-wide task, e.g., disk or network interrupt handling

Mode and context are orthogonal

Process Abstraction

Process Context

User address space: text segment, data seg, stack seg, shared memory regions

Control information: u area, proc structure, kernel stack, address translation maps (= page & segment tables)

Credentials: UID, GID

Environment Variables: saved at the bottom of user stack

Hardware context: saved in the PCB, which is, in turn, kept in the u area

User Credentials

euid and egid affect file creation and access

the owner and group of a new file are set from the creator's euid and egid

to decide access to a file, the process's euid and egid are compared with those of the file

The real uid and gid affect permission to send signals

the sender's euid or real uid must match the real uid of the receiver

The u Area and the proc Structure

The process table, an array of proc structures, mainly maintains the relationships between processes

The u area holds the status of the process

A context switch consists of:

saving the current hardware state into the u area of the process being scheduled out

restoring the hardware state from the u area of the process being scheduled in

my figure-1

Executing in Kernel Mode

Events that cause the system to enter kernel mode

The System Call Interface

A user function calls a system call, which is actually the C wrapper function of the syscall. This wrapper:

pushes the system call number & args onto the user stack

invokes the trap instruction, which

changes the execution mode to kernel mode

transfers control to the system call handler defined in the dispatch table

Note: the dispatch table has handlers for syscalls, interrupts, exceptions, etc.

The system call handler:

copies syscall number and args from user stack to the u area

saves the hardware context of the process on kernel stack

system_call_dispatch[system call number] is executed

stores the return value or error number in the predefined registers

restores hardware context from the kernel stack

returns to user mode

The wrapper function reads registers and returns to the user function
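A minimal sketch of the wrapper idea, assuming Linux and its syscall(2) escape hatch (rather than a hand-written trap): getpid() in libc is just such a wrapper around a trap with the right syscall number.

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    /* raw trap with an explicit syscall number, bypassing the libc wrapper */
    long pid = syscall(SYS_getpid);
    printf("getpid() = %ld, syscall(SYS_getpid) = %ld\n", (long)getpid(), pid);
    return 0;
}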

Interrupt Handling

Interrupt handler runs in system context and is not allowed to block

However, the time spent running the handler is charged to the process currently running

The ipl (interrupt priority level) of the system determines which interrupts can be taken at the moment

interrupts with priority higher than the ipl are serviced immediately

interrupts with priority lower than the ipl are recorded in a special register and serviced once the ipl drops far enough

Synchronization

Kernel data structures are shared by all processes -> synchronization among processes is necessary

Techniques

non-preemptive kernel: a process running in the kernel is not preempted unless it relinquishes the CPU voluntarily

locking: when a kernel data structure must be kept consistent across blocking operations, a lock is placed on it

ipl: the ipl is raised before executing a critical section to prevent interrupt handlers from corrupting the data structure

The above techniques do not work for multiprocessor systems; see Chapter 7.

Process Scheduling

Prioritized preemptive round robin scheduling: Multi-Level Feedback Queue

Priority of process is determined by:

nice value

CPU usage: a running process's priority drops while a waiting process's priority rises

Processes blocked on a resource get a kernel priority that depends on the reason for the sleep

kernel priorities are higher than that of any user process

Signals

Signals are generated in many ways:

A process sends a signal to another via the kill syscall

A device sends a signal to the processes connected to it

The kernel sends a signal to a process to notify it of a hardware exception, a seg fault, etc.

Signals are handled just before the process is scheduled to run

That a signal has been generated for a process is marked in the proc structure of that process

A sleeping process waiting for an event that is expected to occur soon is not woken up by a signal

A sleeping process waiting for an event that may take indefinitely long is woken up by a signal

-> the system call that put the process to sleep is interrupted -> EINTR is set
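A small C sketch of this EINTR behavior (assuming POSIX signals): a read(2) on an empty pipe is an indefinite wait, so a signal interrupts it, provided the handler is installed without SA_RESTART.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; /* its existence alone interrupts the call */ }

int main(void) {
    struct sigaction sa;
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                   /* no SA_RESTART: let read() fail */
    sigaction(SIGALRM, &sa, NULL);

    int fds[2];
    pipe(fds);                         /* nobody ever writes to this pipe */
    alarm(1);                          /* signal arrives while we sleep */

    char buf[1];
    if (read(fds[0], buf, 1) == -1 && errno == EINTR)
        printf("read interrupted: EINTR\n");
    return 0;
}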

New Processes and Programs

fork and exec

Merging fork and exec into a single system call is not a good idea: the child often needs to adjust its environment (file descriptors, signal handlers, etc.) between the two calls

Process Creation

fork tasks: see page 41 of the Uresh Vahalia book
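A minimal sketch of the classic fork/exec/wait pattern (the book's task list is more detailed): keeping fork and exec separate lets the child adjust its environment, here redirecting stdout, before the new program starts.

#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {                            /* child */
        int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(fd, STDOUT_FILENO);               /* tweak fds between fork and exec */
        close(fd);
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);                            /* reached only if exec failed */
    }
    waitpid(pid, NULL, 0);                     /* parent reaps the child */
    return 0;
}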

fork Optimization

copy-on-write

pages of the data and stack segments are marked read-only and copy-on-write

an attempt to write to such a page, by either the parent or the child, causes a page fault

the page fault handler makes a private copy of the page for the faulting process

vfork (BSD only)

The parent process is blocked after vfork

Only the address map is copied

The child runs in the parent's address space

The child should exit or exec shortly after vfork

When the child exits or execs, the parent is awakened
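A minimal vfork sketch: until the child execs or exits, the parent stays blocked and the child borrows the parent's address space, so the child must do nothing but exec or _exit.

#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = vfork();
    if (pid == 0) {
        /* child: runs in the parent's address space; only exec or _exit */
        execlp("echo", "echo", "hello from the vfork child", (char *)NULL);
        _exit(127);        /* _exit, not exit: don't flush the shared stdio state */
    }
    waitpid(pid, NULL, 0); /* parent resumes once the child execs or exits */
    return 0;
}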

Invoking a New Program

Format of an a.out file (assuming neither shared memory nor shared libraries are supported)

Header: 32 bytes

sizes of text, initialized data, and uninitialized data segments

entry point: the 1st instruction to execute

magic number

Text segment

Data segment(s?)

Symbol Table

exec tasks:

Access permission check

suid & sgid handling

Copy exec args & environment variables to kernel space

Allocate swap space for data and stack segments

Free old address space and swap space

Allocate address map

Set up address space(text, data, stack segment)

Copy exec args and environment variables from kernel space to the stack segment

Initialize signal handler information

Initialize hardware context

Process Termination

Awaiting Process Termination

Zombie Processes

When a process becomes a zombie, only its proc structure is retained

The proc structure is freed by the parent's wait call

The init process calls wait for orphaned zombies

If the child is dead but the parent stays alive and never calls wait, the child remains a zombie until the parent exits (init then reaps it) or the system reboots
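A small demonstration of the life of a zombie: while the parent sleeps, ps would show the child in state Z; the wait call then frees the proc structure and yields the exit status.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0)
        _exit(42);                  /* child dies; only its proc entry remains */

    sleep(5);                       /* child is a zombie here ("Z" in ps) */

    int status;
    waitpid(pid, &status, 0);       /* reap: the proc structure is freed */
    if (WIFEXITED(status))
        printf("child exit status: %d\n", WEXITSTATUS(status));
    return 0;
}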

Part 2 - Threads

Introduction

Motivation

fork is an expensive system call

A process must use interprocess communication to communicate with another process

Multiple Threads and Processors

Concurrency and Parallelism

Parallelism: the actual degree of parallelism achieved, limited by the number of processors

Concurrency: the application's maximum parallelism given unlimited resources

System concurrency

Provided by the kernel; provides true parallelism

Because some kernel resources must be allocated to each thread, the maximum concurrency is limited

User concurrency

Provided by a user library; provides a natural programming model for concurrent applications

Cannot provide true parallelism by itself

Many systems provide a dual-concurrency model that combines system and user concurrency

Fundamental Abstraction

Process = a set of threads + a collection of resources

A thread = a single execution sequence

Kernel Threads

Created and destroyed as needed internally by the kernel

Need not be associated with a user process

Resource

Shared kernel code and global data

Private kernel stack

Private space for storing the register context, which for an ordinary user process would be stored in the PCB in the u area

Lightweight Processes (LWP) - Kernel-supported user thread

Each LWP is supported by a kernel thread (Fig 3-4)

Resource

Shared user code & data segments, kernel code and global data

Private kernel stack & space for the kernel register context

Additional space for storing user state, mainly the user register context

Pros

True parallelism

Cons

Most LWP operations, such as creation, destruction, and synchronization, require system calls

Synchronization between LWPs is expensive

If a thread tries to acquire a lock that is currently unavailable, it blocks

Blocking and resuming an LWP requires kernel involvement, which is as expensive as a system call

Each LWP consumes kernel space, which limits the maximum number of LWPs

A system has a single LWP implementation, which must be general enough to support all applications -> too general and heavyweight for most of them

Switching between LWPs is still expensive

A user can monopolize kernel resources by creating many LWPs

User Threads

A user-level library package such as Mach's C-threads or POSIX pthreads provides functions such as:

creation, synchronization, scheduling, and destruction of user threads

User threads can be implemented on top of processes or LWPs

The kernel is unaware of user threads and schedules only in terms of processes or LWPs (Fig 3-5 (a) & (b))

User threads are possible because

a process can have multiple execution contexts and can switch among them using setjmp, longjmp, and signals

the execution context can be saved and restored at user level, as Condor does

Resource

Private user stack, user-level register context, and other state info like signal mask

Synchronization at user level eliminates thread switches at kernel level

When a thread needs to block on a lock, the thread library puts it on the corresponding queue and schedules another user thread

Blocking system calls

A user thread makes a blocking system call ->

the library replaces the call with its non-blocking equivalent ->

another thread gets scheduled ->

the non-blocking system call completes -> an interrupt occurs ->

the blocked thread gets scheduled again

Pros

Performance (Table 3-1)

A virtually unlimited number of threads -> the number of threads can match the application's concurrency level -> a better programming model

A system can have several thread libraries, each suited to a different purpose

Cons

No true parallelism without support from kernel thread

No protection among threads within a process or an LWP -> threads must be programmed cooperatively

Information barrier between library and kernel

A thread (on one LWP) holding a lock can be scheduled out by the kernel in favor of another thread (on another LWP) that is waiting for that lock

A high-priority thread (on one LWP) can be scheduled out by the kernel in favor of a lower-priority thread (on another LWP)

Lightweight Process Design - Issues to Consider

Semantics of fork

Should fork copy all LWPs in the parent process or only the calling LWP?

Only the calling LWP

What if the child process (= the calling LWP) tries to acquire a lock held by a thread on another LWP?

All LWPs

If an LWP is blocked in a system call, the status of that LWP in the child process becomes undefined

Many systems provide both implementations and let the programmer choose the appropriate one

Other System Calls

File operation must be synchronized

The user ID can be changed by any of the threads, so the kernel needs to check permissions on every system call

Signal Delivery and Handling

To which thread is a signal delivered?

to every thread, to a master thread, to an arbitrary thread, or to a thread designated to handle each signal

Signal handler per process or per thread?

Better to share signal handlers among threads

Sigmask per process or per thread?

Better to have individual sigmasks so that each thread executing a critical section may set up its own

Visibility

An LWP should not be visible outside the process but should be visible among the threads within it

Stack Growth

The kernel should not try to extend a thread's stack, because the user thread library will extend it (a sketch follows this list)

A thread specifies the size of its stack ->

the library allocates the stack in the heap area -> write-protects the page just beyond the limit ->

stack overflow -> the kernel delivers SIGSEGV ->

the library extends the stack
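A sketch of how a library might set up such a stack, assuming POSIX mmap/mprotect: a PROT_NONE guard page sits just beyond the limit, so overflowing into it raises the SIGSEGV the library catches to extend the stack.

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* hypothetical library helper: allocate a thread stack plus a guard page */
static void *alloc_stack(size_t stack_size) {
    long page = sysconf(_SC_PAGESIZE);
    /* stacks grow downward, so the guard page sits at the low end */
    char *base = mmap(NULL, stack_size + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    mprotect(base, page, PROT_NONE);  /* overflow into it -> SIGSEGV */
    return base + page;               /* usable region starts above the guard */
}

int main(void) {
    void *stack = alloc_stack(64 * 1024);
    printf("usable stack starts at %p\n", stack);
    return 0;
}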

User-Level Threads Libraries

The Programming Interface

The library should provide as many functions as possible so that kernel interaction can be minimized
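POSIX pthreads is one such library; a minimal example of the creation and destruction functions it provides.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    printf("thread %d running\n", *(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t tid[4];
    int id[4];
    for (int i = 0; i < 4; i++) {
        id[i] = i;
        pthread_create(&tid[i], NULL, worker, &id[i]);  /* creation */
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);                     /* destruction */
    return 0;
}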

Implementing Threads Libraries

How to bind user thread to LWP

Bind each thread to an LWP: one-to-one between LWPs and a subset of threads

Multiplex threads on a set of LWPs

Mix the two methods, giving higher priority to bound threads

Scheduling Algorithm (Fig 3-7)

Threads move among 'blocked', 'ready', and 'running'

As many threads as there are LWPs are selected from the ready queue and become 'running'

'running' threads are currently attached to LWPs

It is the kernel's job to decide which LWP to run

Scheduler Activations

Combines the advantages of kernel and user threads

Basic principle: close integration between user threads and the kernel

A clear division of labor:

Kernel: processor allocation

Library: schedules threads by multiplexing them on the processors given by the kernel

Library -> Kernel: information that affects processor allocation

requests for processors

relinquishing of processors

Kernel -> Library: information that affects thread scheduling

allocation and deallocation of processors

status changes of threads, which affect scheduling

upcall: a call from the kernel into the library

scheduler activation: a virtual processor given to the library by the kernel

Logically similar to an LWP: has user & kernel stacks

Main difference from an LWP: a one-to-one relationship with processors

Handling of blocking operation of threads

A thread running in the kernel blocks ->

The kernel preempts the processor (= activation) ->

The kernel creates a new activation and makes an upcall to the library ->

The library saves the context of the deactivated thread from the old activation ->

The library schedules another thread on the new activation ->

The blocking operation completes ->

The kernel creates a new activation (or preempts an activation from the process) ->

The kernel notifies the library that the blocking operation has completed (and of the preemption of the activation) ->

The library puts the thread that was blocked (and the thread of the preempted activation) on the ready list ->

The library selects a thread from the ready list to run and attaches it to the new activation

Multithreading in Solaris and SVR4

My fig inserted in page 65

Kernel Threads

The kernel is organized as a set of preemptible kernel threads

Some kernel threads run LWPs, while others execute internal kernel functions

Threads can be prioritized

Lightweight Process Implementation

Each LWP is bound to a kernel thread for its entire lifetime

The traditional proc structure and u area of the process model must be reorganized

proc: holds all per-process data, including the process-specific part of the traditional u area

lwp data: per-LWP data

user level register context: meaningful only when the LWP is not running

system call arguments, results, and error code

signal handling information

resource usage and profiling data, virtual time alarms, user time and CPU usage

pointer to the kernel thread attached

pointer to the proc structure

All LWPs share the signal handlers, but each may have an individual sigmask

Traps--synchronous signals such as a segfault--are delivered to the LWP that caused them

Interrupts--asynchronous signals--can be delivered to any LWP that has not masked the signal

User Threads

The thread library hides the details of LWPs, and most application writers deal solely with user threads

The thread library creates a pool of LWPs and multiplexes all user threads on them. The size of the pool depends upon

the number of processors

the number of user threads

The user can override the defaults

specify the number of LWPs to create

request that an LWP be dedicated to a specific user thread

Hence bound & unbound threads

User Thread Implementation

Per-thread information

thread id to facilitate communication--for example by using signals--between threads within a process

saved register state

user stack

signal mask

process-relative priority

thread-local storage: e.g., a private errno for each thread

Synchronization of threads in different processes

by placing a synchronization variable such as a mutex in shared memory, or

by putting the synchronization variable in a file and mmap-ing the file
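A sketch of the second option, assuming POSIX process-shared mutexes: the mutex lives in a file that both processes mmap (error handling omitted; only the first process should initialize).

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* back the mutex with a file so unrelated processes can map it */
    int fd = open("/tmp/lockfile", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, sizeof(pthread_mutex_t));
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);

    pthread_mutexattr_t attr;                  /* first process only: */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m, &attr);

    pthread_mutex_lock(m);
    /* ... critical section shared across processes ... */
    pthread_mutex_unlock(m);
    return 0;
}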

Interrupt Handling

The traditional way of handling interrupts is not appropriate

The interrupt handler raises the interrupt priority level high enough to protect the global data it is accessing from other interrupt handlers ->

on a multiprocessor this blocks many interrupts, even though the probability of the data being accessed by those would-be handlers is extremely small

=> Solaris approach: ordinary synchronization primitives such as semaphores

Create a pool of kernel threads, each of which handles the interrupts of one level

Use ordinary synchronization variables to prevent global data from being accessed by another handler while one handler is using it

System Call Handling

fork: copies every thread of the parent process into the child process

fork1: copies only the calling thread

Part 3 - Synchronization

Race Conditions

A special kind of bug that occurs only under certain timing conditions

Semaphore

A semaphore is an object with:
private int value; // can be initialized, incremented, and decremented, but never becomes negative
Semaphore(int init_value);
public up(); // atomic operation to increase the value
public down(); // atomic operation to decrease the value if value > 0; otherwise the caller is blocked

Bounded Buffer Problem

Implementation with binary and integer semaphores
Binary semaphore mutex for accessing the buffer
Integer semaphores for the empty (initially 10 = buffer size) and full (initially 0) conditions
Producer
empty.down();
mutex.down(); // exchanging these 2 lines causes deadlock
insert;
mutex.up();
full.up();
Consumer
full.down();
mutex.down(); // exchanging these 2 lines causes deadlock
delete;
mutex.up();
empty.up();
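The same structure as a runnable POSIX sketch (unnamed semaphores, a buffer of size 10, trivial items).

#include <pthread.h>
#include <semaphore.h>

#define N 10
static int buf[N], in = 0, out = 0;
static sem_t empty, full, mutex;       /* counting, counting, binary */

static void *producer(void *arg) {
    for (int i = 0; i < 100; i++) {
        sem_wait(&empty);              /* down empty before mutex, or deadlock */
        sem_wait(&mutex);
        buf[in] = i; in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full);
    }
    return arg;
}

static void *consumer(void *arg) {
    for (int i = 0; i < 100; i++) {
        sem_wait(&full);
        sem_wait(&mutex);
        int item = buf[out]; out = (out + 1) % N; (void)item;
        sem_post(&mutex);
        sem_post(&empty);
    }
    return arg;
}

int main(void) {
    sem_init(&empty, 0, N);            /* N empty slots to start with */
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}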

Dining Philosophers

Implementation: deadlock possible
Forks: mutex semaphores
Philosophers
take_forks(i): fork[i].down() -> fork[(i+1) % N].down()
put_forks(i): fork[i].up() -> fork[(i+1) % N].up()
Solutions for deadlock
Lower-numbered fork first: safe but not as concurrent as possible
The situation where only one philosopher is eating while the others wait for forks is possible
Odd-numbered fork first: better than the above
Safety & liveness: as live as possible while not violating safety
Safety: no 2 adjacent philosophers are eating at the same time
Liveness: no philosopher stays hungry unless one of her neighbors is eating
take_forks(i) {
mutex.down()
state[i] = HUNGRY
if (neither neighbor is EATING) {
state[i] = EATING
mayEat[i].up() // mayEat[i] is a semaphore initialized to 0
}
mutex.up()
mayEat[i].down()
}
put_forks(i) {
mutex.down()
state[i] = THINKING
if (left neighbor is HUNGRY & her other neighbor is not EATING) {
state[i-1] = EATING
mayEat[i-1].up()
}
if (right neighbor is HUNGRY & her other neighbor is not EATING) {
state[i+1] = EATING
mayEat[i+1].up()
}
mutex.up()
} // indices are mod N

Monitors

Semaphores are rather low-level and error-prone: they require careful programming, otherwise race conditions or deadlocks result
Monitor
Can be thought of as an object wrapped by mutex.down() & mutex.up()
Condition variable instead of integer semaphore
Differences between a condition variable & an (integer) semaphore
up() vs. signal()
up: increases the semaphore value -> excess up operations are remembered
signal: wakes up a process waiting on the condition variable; no effect if no process is waiting
down() vs. wait()
down: blocks the caller only if value == 0
wait: always blocks the caller until the next signal(), and also releases the mutex
Bounded Buffer - in Java syntax
synchronized insert(object item) {
if (buffer is full)
nonfull.wait();
addElement(item);
nonempty.signal();
}
synchronized delete() {
if (buffer is empty)
nonempty.wait();
deleteElement();
nonfull.signal();
}
Notes on signal
When a waiter is awakened, the condition it was waiting for is guaranteed to hold ->
no process may sneak in between signal() and the return from wait() ->
the signaling mechanism must be absolutely reliable & only one waiter may be awakened & the signaler must block immediately ->
when the waiter leaves the monitor, the signaler is awakened, or gets to contend for the mutex released by the waiter
Inefficient: unnecessary context switches signaler -> waiter -> signaler
Notify
Wakes up a waiter, but the signaler does NOT block ->
other processes can sneak in between the signal and the return from the wait call ->
the waiter must check whether the condition is still true after returning from the wait call
the 'if' in the above code snippet should be replaced by 'while'
Monitor in Java
No monitor modifier for the entire object; instead, each method can be tagged 'synchronized'
Only one implicit condition variable per object -> notifyAll instead of notify
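The notify discipline in POSIX terms, as a sketch: pthread_cond_wait atomically releases the mutex while waiting, and the waiter rechecks its condition in a while loop because other threads can sneak in before it reacquires the lock.

#include <pthread.h>

#define N 10
static int buf[N], count = 0, in = 0, out = 0;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonfull  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

void insert(int item) {
    pthread_mutex_lock(&m);
    while (count == N)                     /* while, not if: recheck on wakeup */
        pthread_cond_wait(&nonfull, &m);   /* releases m while waiting */
    buf[in] = item; in = (in + 1) % N; count++;
    pthread_cond_signal(&nonempty);        /* signaler does not block (notify) */
    pthread_mutex_unlock(&m);
}

int delete(void) {
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&nonempty, &m);
    int item = buf[out]; out = (out + 1) % N; count--;
    pthread_cond_signal(&nonfull);
    pthread_mutex_unlock(&m);
    return item;
}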

Messages: another paradigm for synchronization

No more hassle with race conditions, but a new problem: how to make processes cooperate
The deadlock problem is still there
Naming: direct or indirect
Synchronization: usually the sender is non-blocking and the receiver is blocking, but both block in a rendezvous
Buffering
Message size

Deadlock

Three ways of dealing with deadlock
Detection and recovery
Prevention
Avoidance

Detection

If there is any cycle in the wait-for graph, then every node in those cycles, and every node that (directly or indirectly) waits for those nodes, is involved in the deadlock

Allocation Model

The resource allocator has multiple instances of each resource type

Processes may request multiple resources per type

Algorithm

for (i = 0; i < number of processes; i++) {

find a process that hasn't finished yet but can get everything it needs

if no such process can be found, return deadlock

mark the found process 'done'

number of available resources of each type += number of resources of that type held by the process just marked 'done'

}

return no deadlock
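A compact C sketch of this loop, with hypothetical arrays: alloc[p][r] is what process p holds, request[p][r] what it is still waiting for, avail[r] what is free (avail is modified, so pass a scratch copy).

#include <stdbool.h>

#define P 5   /* processes */
#define R 3   /* resource types */

/* returns true if the system is deadlocked */
bool deadlocked(int alloc[P][R], int request[P][R], int avail[R]) {
    bool done[P] = { false };
    for (int finished = 0; finished < P; ) {
        int p;
        for (p = 0; p < P; p++) {          /* find an unfinished process */
            if (done[p]) continue;         /* ... that can get all it needs */
            bool can = true;
            for (int r = 0; r < R; r++)
                if (request[p][r] > avail[r]) { can = false; break; }
            if (can) break;
        }
        if (p == P)
            return true;                   /* nobody can proceed -> deadlock */
        done[p] = true;
        finished++;
        for (int r = 0; r < R; r++)        /* it finishes, returning resources */
            avail[r] += alloc[p][r];
    }
    return false;                          /* everyone can finish */
}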

Recovery

How to recover?

Reboot

Kill a process

Take resources away from a process

May need periodic checkpoints & a log -> roll back

Which process should be rolled back? The youngest, with aging!

When should we check for deadlock?

When a process becomes blocked because its resource request can't be granted immediately

When CPU utilization drops

When a process has been blocked on a resource too long

Periodically

Prevention

Conditions for deadlock

Mutual exclusion

No preemption

Hold and wait

Cycle - the preceding conditions are impractical to get rid of, so this condition should be attacked

Hierarchical allocation

When a process requests resources, the requested resources must all have numbers strictly greater than the number of any resource currently held by the process

The even-number-first dining philosophers algorithm is an instance of this: the forks are numbered 0 or 1!
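Hierarchical allocation in miniature, assuming two pthread mutexes: number the locks and have every thread take the lower-numbered one first, so no cycle of waiters can form.

#include <pthread.h>

static pthread_mutex_t lock0 = PTHREAD_MUTEX_INITIALIZER;  /* number 0 */
static pthread_mutex_t lock1 = PTHREAD_MUTEX_INITIALIZER;  /* number 1 */

/* every caller acquires in ascending order: 0 before 1, never 1 then 0 */
void use_both(void) {
    pthread_mutex_lock(&lock0);
    pthread_mutex_lock(&lock1);
    /* ... use both resources ... */
    pthread_mutex_unlock(&lock1);
    pthread_mutex_unlock(&lock0);
}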

Avoidance

Each process declares its maximum resource request -- credit line

When a request for a resource comes in

Mentally grant the resource to the process

Assume that all processes suddenly request their entire remaining credit lines

Run the deadlock detection algorithm

If deadlock is detected, granting the resource is unsafe, so do not grant it
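A sketch of this check in C, reusing P, R, and deadlocked() from the detection sketch above; credit[p][r] is process p's declared maximum (its credit line).

/* is it safe to grant 'amount' of resource r to process p? */
bool safe_to_grant(int alloc[P][R], int credit[P][R], int avail[R],
                   int p, int r, int amount) {
    int req[P][R], av[R];
    alloc[p][r] += amount;                 /* 1. mentally grant the request */
    for (int j = 0; j < R; j++)
        av[j] = avail[j] - (j == r ? amount : 0);
    for (int i = 0; i < P; i++)            /* 2. assume everyone demands the */
        for (int j = 0; j < R; j++)        /*    rest of their credit line   */
            req[i][j] = credit[i][j] - alloc[i][j];
    bool unsafe = deadlocked(alloc, req, av);  /* 3. run detection */
    alloc[p][r] -= amount;                 /* undo the mental grant */
    return !unsafe;
}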

Implementation of Synchronization

Implementing Monitors using semaphore

Implementing Semaphores using critical section

Implementing Critical Sections using a hardware support
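A sketch of the bottom two layers in C11, using atomic_flag as the hardware test-and-set: a spinlock provides the critical section, and the semaphore is built on top (busy-waiting here; a real kernel would enqueue and sleep the caller in down instead of spinning).

#include <stdatomic.h>

/* critical section from hardware test-and-set */
static atomic_flag guard = ATOMIC_FLAG_INIT;
static void cs_enter(void) { while (atomic_flag_test_and_set(&guard)) /* spin */; }
static void cs_leave(void) { atomic_flag_clear(&guard); }

/* semaphore from the critical section */
static int value = 1;

void up(void) {
    cs_enter();
    value++;
    cs_leave();
}

void down(void) {
    for (;;) {
        cs_enter();
        if (value > 0) { value--; cs_leave(); return; }
        cs_leave();                        /* release and retry (busy wait) */
    }
}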

Short-term Scheduling

Metrics

Waiting time - the amount of time the job is runnable and not running

Penalty ratio - elapsed time / ideal time

FCFS

Favors long bursts, i.e., CPU-bound jobs

A short job's penalty ratio is too big

Favoring long jobs leads to the convoy effect

Convoy effect

short jobs wait behind a long job

CPU busy & I/O devices idle <-> CPU idle & I/O busy

Shortest Job First

Optimal with respect to average waiting time

We don't know how long the job will run -> an approximation algorithm is needed

The length of the next burst is guessed from history

guess = ( guess + previousBurst ) / 2
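As a one-line worked example, this is an exponential average with weight 1/2, so old history decays geometrically: starting from a guess of 10, bursts of 8, 4, 2 give guesses 9, 6.5, 4.25.

/* exponential average with weight 1/2 */
double next_guess(double guess, double previous_burst) {
    return (guess + previous_burst) / 2.0;
}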

Shortest Remaining Job First

Preemptive version of SJF

min(shortest job in the ready queue, remaining time for the currently running job to finish)

Round-robin & Processor Sharing

Round-robin becomes processor sharing as the quantum approaches zero

Priority scheduling

Both preemptive and nonpreemptive scheduling are possible

Priority can be given externally or calculated internally

Multi-Level Feedback Queue

Short jobs are favored

Long jobs get longer quanta and run with fewer context switches

Analysis

Queuing theory: arrival rate vs service rate

Part 4 - Other Papers

Lightweight RPC
When the client and server are on the same machine
use shared memory instead of messages to pass arguments between the RPC client and server
inject the client thread into the server's address space: a very lightweight context switch
MESA Monitor
Pretty much like Java monitor
Communicating Sequential Processes
Message passing and rendezvous as fundamental programming primitives