My notes (2nd revision)
========================

- Trade-off:
	+ normal execution is slower (store/jump interception overhead)
	+ but cross-domain RPC is fast (no context switch)
		~ good when cross-domain calls are frequent
- Tradeoff between performance and level of distrust
	+ code in trusted domains can run at full speed
	+ only distrusted code incurs execution overhead
- Sandbox
- What types of apps can benefit from software-based fault isolation?
	+ apps where cross-domain calls happen frequently
	+ apps where only a small amount of code is distrusted:
		~ they spend most of their execution time in trusted code
	+ Example 1: vnode, forwarding every file system call to a user-level fs
	+ Example 2: user-programmable high-performance I/O
		~ user-level code manipulates incoming data
	+ Example 3: application-specific message handlers
		(recall downloaded code in Exokernel)
		
- Why is dynamic checking (instrumentation) needed?
	+ some instructions cannot be statically verified:
		~ jumps through registers (used to implement procedure return)
		~ stores that use a register to hold their target address

- Then what can be statically verified?
	+ control-transfer instructions such as program-counter-relative branches
	+ stores to static variables using the immediate addressing mode
		~ the target address is encoded in the instruction itself
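The sketch below is a toy version of the static/dynamic split described above. It uses a made-up instruction representation: `Instr` and its `op`/`mode`/`target` fields are hypothetical and do not model any real ISA or tool. Only stores and jumps whose target sits in a register need a run-time check. Immediate and PC-relative targets can be checked once, when the module is loaded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction (not a real ISA encoding)."""
    op: str                       # "store", "jump" or "other"
    mode: str                     # "imm" (immediate), "pcrel" (PC-relative) or "reg"
    target: Optional[int] = None  # absolute target, known only for "imm"/"pcrel"


def needs_dynamic_check(ins: Instr) -> bool:
    """Stores/jumps through a register: the target is unknown until run time."""
    return ins.op in ("store", "jump") and ins.mode == "reg"


def statically_verified(ins: Instr, seg_lo: int, seg_hi: int) -> bool:
    """For imm/pcrel stores/jumps the target is in the instruction: check it once."""
    if ins.op not in ("store", "jump"):
        return True               # loads/ALU ops are not checked in this model
    if ins.mode == "reg" or ins.target is None:
        return False              # cannot be verified here; must be instrumented
    return seg_lo <= ins.target < seg_hi
```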
	
- What is the system model?
	+ divide the address space into segments s.t. all virtual addresses within
	  a segment share a *unique pattern* of upper bits, called the segment id
	+ a distrusted module has two segments: code and data
	+ REQUIREMENTS for a distrusted module:
		~ it jumps only to its own code segment
		~ it writes only to its own data segment

- What is segment matching?
	+ insert a piece of checking code before every unsafe instruction
	  (then how do you find unsafe instructions? Compiler technique)
	+ Steps (4 instructions):
		~ move target address to dedicated register
		~ extract the *segment identifier* of target address
		~ compare it to segment ID of distrusted code
		~ trap if no match, otherwise go ahead
	==> feature:
		+ big overhead (4 extra instructions per unsafe store/jump)
		+ but can pinpoint where the fault occurs ==> useful for debugging
		+ requires 4 dedicated registers:
			code, data, shift amount, segment ID
		+ if not enough registers are available --> need to spill, or may fail
	Question: if we don't need information for debugging, how can we make it faster?
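Below is a Python simulation of the 4-step segment-matching check. It assumes 32-bit addresses whose top 8 bits are the segment ID. `SHIFT = 24` is an assumption, since the notes do not fix a segment size. A dict stands in for memory, and the hypothetical `SegmentFault` plays the role of the hardware trap.

```python
SHIFT = 24  # assumption: top 8 bits of a 32-bit address = segment ID


class SegmentFault(Exception):
    """Stands in for the hardware trap raised by the check sequence."""


def checked_store(memory: dict, addr: int, value: int, domain_seg_id: int) -> None:
    dedicated = addr                  # 1. move target address into a dedicated register
    seg_id = dedicated >> SHIFT       # 2. extract its segment identifier (shift)
    if seg_id != domain_seg_id:       # 3. compare with the module's segment ID
        # 4. trap: we know exactly which store and which address went wrong
        raise SegmentFault(f"store to {addr:#010x} outside segment {domain_seg_id:#04x}")
    memory[dedicated] = value         # the store itself goes through the dedicated register
```

The error message carrying the faulting address is exactly the debugging information that sandboxing gives up.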

- What is sandboxing?
	+ guarantee that the target address of every store/jump always lies in
	  the distrusted module's segment
	+ loses some debugging info (a bad address is silently redirected into
	  the module's own segment instead of trapping), but faster than segment matching
	+ How: 2 instructions and 5 dedicated registers
		~ clear the segment ID bits of the target address
		~ set the segment ID of the target address to that of the distrusted module
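The two sandboxing instructions can be sketched as one AND and one OR. This assumes the same layout as before: 32-bit addresses with an 8-bit segment ID on top, where `SHIFT = 24` is an assumption. Unlike segment matching, nothing is compared and nothing traps.

```python
SHIFT = 24                         # assumption: top 8 bits of a 32-bit address = segment ID
OFFSET_MASK = (1 << SHIFT) - 1     # the in-segment offset bits


def sandbox(addr: int, domain_seg_id: int) -> int:
    """Force addr into the fault domain's segment with two bit operations."""
    segment_base = domain_seg_id << SHIFT   # in the real scheme this sits in a dedicated register
    cleared = addr & OFFSET_MASK            # 1. clear the segment ID bits (AND)
    return cleared | segment_base           # 2. set the domain's segment ID (OR)
```

A correct module never notices the sandbox, because its addresses already carry the right segment ID. A buggy or malicious module can only scribble over its own segment, which is why the debugging information is lost.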
		
- What can a malicious distrusted module do?
	+ suppose a malicious module in process P makes a syscall, say close(fd1)
	+ the OS sees an ordinary syscall from P and, hence, allows it
	+ but other trusted code in P may still be using fd1

- So how can we deal with that?
	+ Solution 1: make the OS aware of the distrusted module; on each syscall,
	  check the PC to see whether the call comes from distrusted code
		~ not backward compatible, not portable
		~ requires changing the OS
	+ Solution 2: if we cannot change the OS, we change the fault domain
		~ add trusted arbitration code that determines whether a particular
		  system call requested by some other fault domain is allowed
		~ i.e., turn each syscall in the distrusted module into a
		  cross-domain RPC to that arbitration code
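Solution 2 can be sketched as trusted arbitration code that receives the rewritten `close(fd)` call. The `Arbiter` class and its policy are hypothetical: the module may only close descriptors it opened itself. The actual system call is left as a comment so the sketch has no side effects.

```python
import errno


class Arbiter:
    """Hypothetical trusted arbitration code, living outside the fault domain."""

    def __init__(self, domain_fds):
        # Assumed policy: the module may only touch descriptors it opened itself.
        self.domain_fds = set(domain_fds)

    def close(self, fd: int) -> int:
        """Target of the cross-domain RPC that replaced the module's close(fd)."""
        if fd not in self.domain_fds:
            return -errno.EBADF     # refuse: fd belongs to trusted code in the same process
        self.domain_fds.remove(fd)
        # A real arbiter would now issue the actual close(fd) system call.
        return 0
```

This stops the `close(fd1)` attack from the previous bullet: fd1 was opened by trusted code, so it is not in the module's set.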

- Sharing between fault domains?
	+ see below
	
- Section 3.6? Why is a verifier needed? What does it verify?

- Vs. Nooks?
	+ SFI just provides fault isolation
	+ does not deal specifically with OS and driver code (general-purpose)
	+ does not deal with recovery

- Where is the performance overhead?
	+ in the inserted sandboxing code

- Compare to normal RPC
	+ no context switch needed (fault domains are in the same address space)
	+ no TLB misses ...
	+ no marshalling, no extra copies ...
	
**EFFICIENT SOFTWARE-BASED FAULT ISOLATION**
============================================

# 0. Takeaway
--------------
- provide extensibility by loading extension software modules into an
  application, using isolation to keep them safe
- but there is the risk of a malicious module

Question: how to improve reliability and provide fault isolation from 
distrusted module?
- Alternative 1: use hardware, i.e., place each distrusted module in a separate
address space, and make cross-domain calls using RPC.
	+ provides fault isolation
	+ but slow, because of context-switch overhead
	(e.g., vnode: forwarding every file system call to a user-level fs
	slows things down)
	+ good if cross-domain communication is infrequent
- Alternative 2: software based fault isolation (proposed in this paper)
	+ fault isolation within a single address space
	+ load code and data of distrusted module into fault domain, 
	  a separate portion of the application address space
	+ modify the object code of the distrusted module to prevent it from jumping
	  or writing to addresses outside the fault domain
	==> hence intercept every store/jump instruction

Features:
	+ get rid of context switch overhead
	+ but increases execution time (because stores/jumps are intercepted)
	+ hence good where cross-domain communication is frequent
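Here is a back-of-the-envelope cost model for this trade-off. It is my own simplification, not a formula from the paper. Let I be the number of instructions executed inside the distrusted module and f the fraction of them that are unsafe stores/jumps. Let k be the extra instructions per check (2 for sandboxing, 4 for segment matching) and t the time per instruction. Let N be the number of cross-domain calls, and C_hw, C_sfi the cost of one hardware-isolated RPC and one SFI cross-domain call. Work common to both designs is left out.

```latex
\begin{aligned}
T_{\mathrm{hw}}  &= N\,C_{\mathrm{hw}} \\
T_{\mathrm{sfi}} &= N\,C_{\mathrm{sfi}} + f\,k\,I\,t \\
T_{\mathrm{sfi}} < T_{\mathrm{hw}} &\iff f\,k\,I\,t < N\,(C_{\mathrm{hw}} - C_{\mathrm{sfi}})
\end{aligned}
```

Dividing by I: SFI pays off when the call rate N/I exceeds f k t / (C_hw - C_sfi). In other words, it wins when cross-domain calls are frequent relative to the work done inside the module.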

# 1. How do they do it? Through a couple of binary instrumentation techniques

But the general idea is:
- place the distrusted code into a separate segment within the application
  address space
- intercept every store/jump instruction (loads are left unchecked) and make
  sure that it does not try to access an address outside the segment
  (by checking the segment identifier)
	+ slow, because it is done in software

*SEGMENT MATCHING*  
------------------
- to pinpoint unsafe instructions (i.e., instructions whose target cannot be
  verified to be within the correct segment)
- immediate addressing is easy to verify, because the address is encoded
  in the instruction
- jumps through registers are harder: they cannot be verified statically
- 4 instructions:
	+ move target address to dedicated register
	+ extract the *segment identifier* of target address
	+ compare it to segment ID of distrusted code
	+ trap if not match, otherwise go ahead

==> feature:
	+ big overhead (4 extra instructions per store/jump)
	+ but can pinpoint where the fault occurs ==> useful for debugging
Question: if we don't need information for debugging, how can we make it faster?

*ADDRESS SANDBOXING*
--------------------
- needs only 2 instructions:
	+ clear segment ID of target address
	+ set segment ID of target address to that of the distrusted module

*Some Optimizations*
--------------------
- if register-plus-offset mode is used, sandbox only the base register,
  saving the instruction that computes the real target address
	+ requires guard zones at the top and bottom of each segment, large
	  enough to cover the largest immediate offset, so that a sandboxed
	  register plus offset can at worst land in a guard zone (never in
	  another segment's usable memory)
	+ E.g. store value, offset(reg)
		~ without the optimization: 3 sandboxing instructions, one to sum
		  reg + offset into the dedicated register and two to set its segment ID
		~ with it: only the 2 that set the segment ID of reg
- treat the stack pointer as a dedicated register:
	+ sandbox the stack pointer when *it is set*, not when it is used
	+ hence optimize for the common case (the stack pointer is used far more
	  often than it is set)
- not sandboxing sequence of loops
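The following sketch checks the guard-zone argument numerically. It assumes 16 MB segments (`SHIFT = 24`) and signed 16-bit immediate offsets, giving 32 KB guard zones; both sizes are illustrative assumptions. Only the base register is sandboxed and the offset is added afterwards. Every possible result either stays in the module's segment or falls into a guard zone. A guard zone must be kept inaccessible, so such an access faults.

```python
SHIFT = 24                # assumption: 8-bit segment IDs, 16 MB segments
SEG_SIZE = 1 << SHIFT
GUARD = 1 << 15           # assumption: signed 16-bit offsets, so |offset| <= 32 KB


def sandbox(reg: int, seg_id: int) -> int:
    """The usual 2-instruction sandbox, applied to the base register only."""
    return (reg & (SEG_SIZE - 1)) | (seg_id << SHIFT)


def effective_address(base: int, offset: int, seg_id: int) -> int:
    """'store value, offset(base)': sandbox base, then let the hardware add offset."""
    if not -GUARD <= offset < GUARD:
        raise ValueError("offset must fit in a signed 16-bit immediate")
    return sandbox(base, seg_id) + offset


def in_guard_zone(addr: int) -> bool:
    """Guard zones: the lowest and highest GUARD bytes of every segment."""
    off = addr & (SEG_SIZE - 1)
    return off < GUARD or off >= SEG_SIZE - GUARD


def safe(addr: int, seg_id: int) -> bool:
    """Either inside the domain's own segment, or in some guard zone (would fault)."""
    return (addr >> SHIFT) == seg_id or in_guard_zone(addr)
```

The worst cases are the extreme combinations: a base sandboxed to the very bottom or top of the segment, with the most negative or most positive offset. The test below tries exactly those.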

# 2. What if a distrusted module wants to make a syscall?
- Alternative 1: modify the OS to know about fault domains
	+ use the PC to determine the fault domain, hence restrict its access
- Alternative 2: use cross-fault-domain RPC

# 3. What about sharing?
- read sharing is straightforward: loads are not intercepted,
  hence a distrusted module can read shared data in the single address space
- what about write sharing? Use *LAZY POINTER SWIZZLING*
	+ the shared memory region is mapped into every address-space segment
	that needs access, by modifying the hardware page tables
	+ shared region is mapped to same offset in each segment
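Lazy pointer swizzling can be simulated with a toy page table in which the shared window of every segment aliases the same "physical" memory. The offsets, sizes and physical base below are hypothetical. Trusted code hands the module a pointer into its own segment. The module's sandboxed store replaces only the segment ID, keeps the offset, and so lands on an alias of the same shared data.

```python
SHIFT = 24                      # assumption: 8-bit segment IDs, 16 MB segments
SEG_MASK = (1 << SHIFT) - 1
SHARED_OFF = 0x400000           # hypothetical offset of the shared region in every segment
SHARED_LEN = 0x1000             # hypothetical size of the shared region (one 4 KB page)
SHARED_PHYS = 1 << 32           # hypothetical "physical" base, outside the 32-bit virtual range


def translate(vaddr: int) -> int:
    """Toy page table: the shared window of *every* segment maps to the same frames."""
    off = vaddr & SEG_MASK
    if SHARED_OFF <= off < SHARED_OFF + SHARED_LEN:
        return SHARED_PHYS + (off - SHARED_OFF)
    return vaddr                # private memory: identity-mapped in this toy


def sandbox(addr: int, seg_id: int) -> int:
    return (addr & SEG_MASK) | (seg_id << SHIFT)


def domain_store(phys: dict, vaddr: int, value: int, seg_id: int) -> None:
    """A store executed by distrusted code: sandboxed, then translated by the 'MMU'."""
    phys[translate(sandbox(vaddr, seg_id))] = value


# Trusted code (segment 0x10) hands a pointer into its shared window to domain 0x12.
memory = {}
p = (0x10 << SHIFT) | (SHARED_OFF + 0x10)
domain_store(memory, p, 42, seg_id=0x12)   # sandboxed to 0x12400010, an alias of p
```

The pointer is never rewritten when it is passed across domains. It is "swizzled" lazily by the ordinary sandboxing code at the moment of use.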

- Why do it this way?
	+ easy sharing, no need to modify the sandboxing code
	+ as distrusted code accesses shared memory, the sandboxing code translates
	the shared address into the corresponding address in the fault domain's
	data segment (by replacing the segment ID)
- In systems that do not allow virtual address aliasing
==> use shared segment matching:
	+ use a register as a bitmap indicating which segments the fault domain can access
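Shared segment matching with a bitmap register can be sketched as below. A bitmap with one bit per segment only fits in a register if there are few segments. The sketch therefore assumes 4-bit segment IDs (`SHIFT = 28`, 16 segments), which is an illustrative choice. `SegmentFault` again stands in for the trap.

```python
SHIFT = 28   # assumption: 4-bit segment IDs (16 segments), so the bitmap fits in a register


class SegmentFault(Exception):
    """Stands in for the hardware trap."""


def shared_segment_match(addr: int, access_bitmap: int) -> int:
    """Return addr if its segment is allowed by the bitmap, else trap."""
    seg_id = addr >> SHIFT                 # extract segment ID, as in plain segment matching
    bit = (access_bitmap >> seg_id) & 1    # test bit seg_id of the bitmap register
    if bit == 0:
        raise SegmentFault(f"segment {seg_id:#x} not accessible")
    return addr
```

Unlike sandboxing, this needs a compare and a trap. It works without aliasing, because the module stores to the shared segment's real address.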

# 4. How did they implement it?
2 ways:
- use a compiler
- use binary patching
	+ compiler specific
	+ not efficient (really?)

# 5. RPC implementation: cross-domain communication
Challenge: how to make it fast?
- since jumps outside the domain are redirected by sandboxing,
  how can a module call out?
==> use a *jump table*; each entry is a control-transfer instruction to a legal
entry point outside the domain
	+ jump-table entries use immediate addressing, so they can be verified
	  statically and bypass the sandbox
	+ the jump table is stored in the (read-only) code segment
		==> can only be modified by a trusted module
- stubs run outside caller and callee domain
	+ copy argument
		~ arguments are copied directly (like LRPC), hence a single copy
		~ because stubs are trusted, they copy directly (i.e., bypass sandboxing)
		~ no marshalling and so forth
	+ manage machine state
		~ must protect any registers that are both used later by the caller
		  and potentially modified by the callee
		  (compiler interprocedural analysis, e.g. GMOD/GREF)
		~ save only registers designated by the architecture to be preserved
		  across calls
		~ no need to save a register if no instruction in the callee fault
		  domain modifies it
	+ switch execution stack
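Here is a simulation of the jump table and a trusted stub, under several assumptions. The register names follow a MIPS-like convention (s0-s7 preserved across calls, t-registers caller-saved). `make_stub`, `add_all` and `JUMP_TABLE` are hypothetical names. The stub saves only the preserved registers the callee can modify. It switches to the callee's stack, copies the arguments once, and restores state on the way out.

```python
from typing import Callable, Dict, FrozenSet, Sequence

# Assumption: MIPS-like convention, s0-s7 preserved across calls, t0-t9 caller-saved.
CALLEE_SAVED: FrozenSet[str] = frozenset(f"s{i}" for i in range(8))

Regs = Dict[str, int]
Callee = Callable[[Regs, list], int]


def registers_to_save(callee_modifies: FrozenSet[str]) -> FrozenSet[str]:
    """Preserved registers that some instruction in the callee domain may modify."""
    return CALLEE_SAVED & callee_modifies


def make_stub(callee: Callee, callee_modifies: FrozenSet[str], callee_stack_top: int):
    """Hypothetical trusted stub that runs outside both fault domains."""
    save_set = registers_to_save(callee_modifies)   # computed once, when the stub is built

    def stub(regs: Regs, args: Sequence[int]) -> int:
        saved = {r: regs[r] for r in save_set if r in regs}  # protect caller state
        caller_sp = regs["sp"]
        regs["sp"] = callee_stack_top               # switch execution stack
        try:
            return callee(regs, list(args))         # one direct copy of args, no marshalling
        finally:
            regs.update(saved)                      # restore preserved registers
            regs["sp"] = caller_sp                  # back to the caller's stack

    return stub


# Example callee in another fault domain: clobbers s0 (preserved) and t0 (scratch).
seen_sp = []


def add_all(regs: Regs, args: list) -> int:
    seen_sp.append(regs["sp"])
    regs["s0"] = regs["t0"] = 999
    return sum(args)


# The jump table: built by the trusted loader and kept read-only (a tuple here).
# In the real scheme the caller reaches an entry only through a direct, immediate jump.
JUMP_TABLE = (make_stub(add_all, frozenset({"s0", "t0"}), 0x12FFF000),)
```

Calling `JUMP_TABLE[0](regs, (1, 2, 3))` returns 6. Afterwards `s0` and `sp` are back to the caller's values, while `t0` stays clobbered because restoring scratch registers is the caller's job.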

- What about error, failure?
	+ what types? address violations
	+ what if a call into a fault domain never returns (say, an infinite loop)?
		~ hard to deal with; a timer is suggested