**COOPERATIVE TASK MANAGEMENT without MANUAL STACK MANAGEMENT**
==============================================================

# 0. Takeaways

- threaded model vs. event-driven model
- want to get the sweet spot:
  + easy stack management (i.e., automatic stack management)
  + easy to reason about concurrency issues
- make both kinds of code work together, in the face of software evolution
- solution: it is all about *adapters* (or wrappers)

# 1. THE BASICS

# What are threads?

- general-purpose solution for managing concurrency
- multiple independent execution streams
- shared state (files, memory, ...)
- preemptive scheduling:
  + interleaved on a uniprocessor
  + overlapped on a multiprocessor
- synchronization (locks, condition variables, monitors) over shared state
- automatic stack management (hence, easy for the programmer)

# Why are threads hard?

- shared data --> must coordinate access, hence potential for:
  + misuse: forget a lock --> corrupted data
  + deadlock
- hard to debug: data dependencies, timing dependencies
- threads break abstraction: a module's locking and blocking behavior leaks into its callers, so modules can no longer be designed independently
- achieving good performance is hard:
  + simple locking yields low concurrency
  + fine-grained locking increases complexity
  + OSes limit performance (scheduling, context switches)

# Alternative: "event-driven" or "cooperative task" style

- organize the program as a collection of event handlers
- one execution stream: no CPU concurrency
- register interest in events (callbacks)
- an event loop waits for events and invokes the handlers (see the sketch after this section)
- no preemption: event handlers yield only at specific yield points
- handlers are generally short-lived
- manual stack management: rip the code for a given task into event handlers that run to completion without blocking

# Bad things about "event-driven"

- long-running handlers --> non-responsive program
- can't maintain local state across events (a handler must return)
- no CPU concurrency

# Good things about "event-driven"

- easier to debug than threads: timing dependencies are related only to events, not to internal scheduling
- problems are easier to track down: e.g., a slow response to an event vs. corrupted memory
- faster than threads on a single CPU:
  + no overhead of locking and context switching (but threads provide true concurrency)
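A minimal sketch of the event-driven structure above, in C. All names here (`post_event`, `handler_t`, the fixed-size queue) are illustrative assumptions for the demo, not from the paper:

```c
#include <stdio.h>

/* Illustrative event loop: handlers are registered as callbacks and
 * the single-threaded loop dispatches them one at a time, so no two
 * handlers ever run concurrently. */

typedef void (*handler_t)(void *arg);

struct event {
    handler_t fn;   /* callback to run when the event is dispatched */
    void     *arg;  /* state carried to the callback */
};

#define MAX_EVENTS 64
static struct event queue[MAX_EVENTS];
static int head, tail;

/* Register interest: enqueue a callback for the loop to invoke later. */
static void post_event(handler_t fn, void *arg)
{
    queue[tail++ % MAX_EVENTS] = (struct event){ fn, arg };
}

static void send_reply(void *arg)
{
    printf("reply: %s\n", (const char *)arg);
}

static void on_message(void *arg)
{
    printf("got message: %s\n", (const char *)arg);
    /* A handler must not block; instead it schedules follow-up work
     * as another event and returns quickly. */
    post_event(send_reply, "done");
}

int main(void)
{
    post_event(on_message, "hello");
    /* The event loop: one execution stream; each handler runs to
     * completion before the next one is picked up. */
    while (head != tail) {
        struct event e = queue[head++ % MAX_EVENTS];
        e.fn(e.arg);
    }
    return 0;
}
```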
# 2. A DEEPER COMPARISON

# 2.1 Task management

- Threaded model:
  + tasks access *shared state*
  + preemptive scheduling
- Cooperative:
  + no concurrent access to shared state (hence easier to debug, ...)
  + non-preemptive: a task yields control to other tasks only at well-defined points (e.g., when doing IO)
    ==> inappropriate for exploiting multiprocessor parallelism
  + global state invariants:
    ~ only need to be restored when a task explicitly yields,
    ~ and can be assumed to hold when a task resumes
  + wrappers around IO library functions:
    ~ instead of blocking, a wrapper initiates the IO and yields control to another task
    ~ the wrapper must arrange for its task to become schedulable when the IO is done

# 2.2 Stack management

- Threaded model:
  + automatic stack management
- Cooperative:
  + manual stack management
    > a single task is broken into several procedures
  + the program is a collection of event handlers
  + e.g., a task involves: 1) receive a message, 2) read a disk block, 3) reply
    ~ receipt of the message is an event: one procedure handles that event and initiates the disk IO
    ~ receipt of the disk IO result is an event: another procedure handles that event and constructs the reply message

# 2.3 IO management

- whether the system supports synchronous (blocking) or asynchronous (non-blocking) IO
- threaded code typically uses synchronous IO; event-driven code typically uses asynchronous IO

# 2.4 Conflict management

- Threaded model:
  + because of shared state, needs synchronization mechanisms such as locks, semaphores, monitors
  + hence, small atomic operations
- Cooperative:
  + large atomic operations, because a task yields only at well-defined points
    ==> all code executed between two yield points runs atomically
    ==> easy to maintain complex invariants
- Serial (no yielding):
  + no worries about shared state, because a task runs until it ends
  + no explicit mechanism needed

# 2.5 Data partitioning

- partitioning state among tasks reduces the amount of sharing, and with it the need for conflict management

# 3. Stack management: pros and cons of the two (in depth)

# 3.1 Automatic vs. manual

- Automatic (procedure-oriented):
  + express each complete task as a single procedure in the source language
  + when the task blocks (on an IO), its state is stored on the procedure's stack
- Manual:
  + rip the code of any given task into event handlers that run to completion without blocking
  + event handlers are invoked by the event scheduler in response to events (e.g., IO completion)
  + steps to initiate an IO:
    1. Event handler E1 schedules the IO operation, but does not wait for the reply.
    2. E1 creates and registers a task-specific object called a continuation C with the event scheduler. C contains:
       ~ a reference to the event handler E2 to be called when the requested IO is done
       ~ a return address, i.e., where E1 left off in the task
    3. E1 returns control to the event scheduler.
    4. When the IO operation completes, E2 runs and is passed the continuation; it knows where to return (from the info in C) to continue the work.
  + example: a procedure P may do an IO --> it needs to yield while the IO is in flight
    > rip it into P1 and P2
    > use of a continuation: an explicit mechanism for resuming the task (see the paper's example, and the C sketch below)

    P1: look up the item in memory
        if found, return it
        if not:
            create a continuation C that
                > records P2 to be called when the IO-completion event fires
                > saves the return address (and any needed locals) in C
            schedule the IO (the event)
            register C with the event scheduler
            return

    P2: (called by the event scheduler when the IO completes)
        recover the saved variables from the continuation
        process the result (e.g., insert the data read from disk into memory)
        return to the caller of P1 -- how? the return address is in the continuation
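To make the ripping concrete, here is a self-contained C sketch of P1/P2. The continuation struct, the one-slot stand-in for the event scheduler, and all names are illustrative assumptions, not the paper's code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Stack ripping, sketched: the state that would have lived on P's
 * stack is moved to a heap-allocated continuation so it survives
 * across the yield point. */

struct continuation {
    void (*next)(struct continuation *c); /* handler to run when the IO completes (P2) */
    char  key[32];                        /* "local variable" saved across the yield */
    char  result[64];                     /* filled in by the completed IO */
};

static struct continuation *pending_io;   /* one-slot stand-in for the event scheduler */

/* P2: invoked by the event loop when the disk IO completes. */
static void lookup_done(struct continuation *c)
{
    /* Recover the variables that used to be locals of P. */
    printf("P2: key=%s, read from disk: %s\n", c->key, c->result);
    free(c);
}

/* P1: starts the lookup; instead of blocking on the IO, it registers
 * a continuation and returns to the event loop. */
static void lookup_start(const char *key)
{
    struct continuation *c = malloc(sizeof *c);
    c->next = lookup_done;                      /* where to resume */
    snprintf(c->key, sizeof c->key, "%s", key); /* save locals on the heap */
    pending_io = c;                             /* "schedule" the IO */
    printf("P1: IO scheduled, returning to event loop\n");
}

int main(void)
{
    lookup_start("item42");
    /* Event loop: pretend the disk IO has completed, then dispatch
     * the registered continuation. */
    if (pending_io) {
        struct continuation *c = pending_io;
        pending_io = NULL;
        snprintf(c->result, sizeof c->result, "value-for-%s", c->key);
        c->next(c);
    }
    return 0;
}
```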
# 3.2 Stack ripping

- a cumbersome task for the programmer:
  + deconstruct the language's stack frame
  + reconstruct it on the heap (the continuation)
    ==> reduces the readability of the code
- gets worse with more IO calls per procedure, and even worse with loops around a yield point
- hard to debug: if something goes wrong in E2:
  + the call stack is shorter: it shows only the current event handler
  + ==> must trace through the continuations by hand (bad if *tail calls* are used???)
- Why is it painful? In summary, the programmer deals by hand with things the compiler handles under automatic stack management:
  + function scoping: one conceptual function is ripped in two
  + automatic variables: variables once allocated on the stack must now be moved to the heap to survive across yield points
  + control structures: loops and conditionals that span a yield point must be rebuilt by hand across the handlers
  + debugging stack: must be manually recovered by tracing through continuations
- worse with software evolution:
  + a function F may turn from *compute-only* into potentially yielding
    ==> ALL functions along every path of the call graph from F to the root may have to be ripped in two

# 3.3 Hidden concurrency assumptions

- stack ripping (manual stack management) has one huge advantage: yield points are *explicitly* visible in the code
- automatic stack management hides the potential concurrency
  ==> hard to know which calls may yield and where local state must be re-evaluated
  ==> worse with software evolution: a function call that does not yield today may yield tomorrow
- for example:

    foo() {
        ...
        bar()
        ...
    }

  bar() may yield for IO, but under automatic stack management we cannot tell; if bar() yields, then after the call returns, foo must re-evaluate its local state
- solutions:
  + static checking: leverage the compiler
    ~ say foo calls bar
    ~ previously bar did not yield, hence bar was tagged *atomic*
    ~ but now bar can block on IO, hence it is tagged *yielding*
    ~ check: any basic block that calls a yielding function must itself be tagged yielding; otherwise, issue a warning
    GOOD: detected at compile time
  + dynamic checking: a counter at run time, simpler than static checking (a sketch appears at the end of these notes)
    Each block of code that depends on atomicity begins with a call to StartAtomic() and ends with EndAtomic():
    ~ StartAtomic(): increments the counter
    ~ EndAtomic(): decrements the counter
    When any function tries to block on IO, yield() asserts that the counter is zero...
- Question: when do we need to rip? When the local state of the calling function depends on the yielding behavior of the called function.

# 4. Hybrid approach: it's all about adapters

- use adapters to connect the two code styles
- enables a project written in one style to incorporate legacy code written in the other
- "threads" are scheduled preemptively
- "fibers" are scheduled cooperatively, i.e., no preemption
  + so a fiber ~ an event handler
- multiple fibers run on a single thread
- the scheduler runs on the MainFiber (which schedules both kinds of code)
- code written with automatic stack management that expects to block for IO runs on another fiber, and yields control back to the MainFiber when it blocks
- compute-only functions can run on any fiber (full example in section 4 of the paper)

Questions:
- isn't this approach too complex? what is the cost of building the adapters?
- do the extra context switches have major overhead?
- for every cross call, you need to build an adapter...

Requirements for a Thread --> Event call:
- the event-style callee may do an IO, but it schedules that IO and returns immediately
- the thread-style caller may block waiting for the IO

Requirements for an Event --> Thread call:
- the thread-style callee may block
- the event-style caller must return immediately...

The adapter makes cross-calling transparent (see the sketch below).
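A minimal sketch of the fiber idea, using POSIX ucontext as a stand-in for the paper's Win32 fibers. The MainFiber, `task_main`, `blocking_read`, and the one-fiber "scheduler" are all my illustrative assumptions:

```c
#include <stdio.h>
#include <ucontext.h>

/* MainFiber runs the event scheduler; thread-style code that expects
 * blocking IO runs on its own fiber and yields back to MainFiber
 * instead of really blocking. */

static ucontext_t main_fiber, task_fiber;
static char task_stack[64 * 1024];

/* Adapter for a "blocking" IO call: instead of blocking, switch back
 * to MainFiber; we are resumed when the IO completes. */
static void blocking_read(void)
{
    printf("task: issuing IO, yielding to MainFiber\n");
    swapcontext(&task_fiber, &main_fiber);   /* cooperative yield */
    printf("task: resumed, IO result available\n");
}

/* Thread-style (automatic stack management) code: its locals live on
 * the fiber's stack across the yield point, so no ripping is needed. */
static void task_main(void)
{
    int local = 42;                 /* survives across blocking_read() */
    blocking_read();
    printf("task: local is still %d after resume\n", local);
}

int main(void)
{
    /* Create the task fiber with its own stack. */
    getcontext(&task_fiber);
    task_fiber.uc_stack.ss_sp = task_stack;
    task_fiber.uc_stack.ss_size = sizeof task_stack;
    task_fiber.uc_link = &main_fiber;        /* return here when the task ends */
    makecontext(&task_fiber, task_main, 0);

    /* MainFiber: run the task until it yields for IO. */
    swapcontext(&main_fiber, &task_fiber);

    /* Event scheduler: pretend the IO has now completed, then resume
     * the task fiber exactly where it left off. */
    swapcontext(&main_fiber, &task_fiber);
    return 0;
}
```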
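Finally, as promised in section 3.3, a sketch of the dynamic atomicity check. The StartAtomic/EndAtomic names come from the notes above; the counter-plus-assert implementation is my assumption:

```c
#include <assert.h>
#include <stdio.h>

static int atomic_depth;   /* how many atomic blocks are currently open */

static void StartAtomic(void) { atomic_depth++; }
static void EndAtomic(void)   { atomic_depth--; }

/* yield() is called by any IO wrapper that is about to give up
 * control; yielding inside an atomic block is a bug, caught here
 * at run time. */
static void yield(void)
{
    assert(atomic_depth == 0 && "yield inside an atomic block");
    /* ... switch to the scheduler ... */
}

static void read_block(void)
{
    yield();   /* IO wrappers yield instead of blocking */
}

int main(void)
{
    StartAtomic();
    /* Code here relies on not losing control before EndAtomic();
     * calling read_block() in this region would trip the assert. */
    EndAtomic();

    read_block();  /* fine: no atomic block is open */
    printf("ok\n");
    return 0;
}
```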