Remzi's notes:
==============

* What is a MUST belief?
  - A belief that is directly implied by the code
  - E.g.: p->r implies p is NOT NULL

* What is a MAY belief?
  - MAY beliefs arise where we observe code features that suggest a belief, but the pattern may instead be a coincidence
  - E.g.: a call to "A" followed by a call to "B" suggests the programmer may believe A and B must be paired with each other, but it could be coincidence

* How are MUST beliefs useful?
  - We can directly look for contradictions among MUST beliefs
  - Any contradiction implies a bug

* How are MAY beliefs useful?
  - MAY beliefs can be coincidence, so we use statistics to rank the results
  - How:
    + assume all MAY beliefs are MUST beliefs
    + look for violations (errors) of these beliefs
    + rank each error by the probability of its beliefs
    + if we observe a particular belief in 999 out of 1000 cases, it is probably a valid belief
    + if a belief is observed only once, it is probably a coincidence

* vs. formal verification?
  - cheaper
  - the rules are in the code itself
  - formal verification: the spec itself may not be completely correct

* vs. dynamic systems, like Eraser?
  - Eraser sees only a limited number of paths; it flags errors only when a path is actually executed
  - the two approaches can complement each other

* What is the "internal consistency" approach? When is it useful? When can it lead to false conclusions about the code?
  - Propagate MUST beliefs at one code location to related locations
  - Any belief conflict is an error
  - Detail: general approach
    1) define a rule template T, e.g.: no <a> after <b>
    2) define valid slot instances, e.g.: A, B
    3) define code actions that imply beliefs, e.g.: p->a ==> p must be not NULL; (p == null) ==> p may be NULL or not NULL
    4) define rules for combining beliefs
    5) define rules for belief propagation
  - Example: if (card == NULL) { print card->info }
    (the check implies card may be NULL; the dereference inside the branch implies card must be non-NULL: contradiction)
  - When useful: cases like the example above
  - False conclusions when:
    + code can be internally consistent but still do the wrong thing

* What is the "statistical analysis" approach? When is it useful? When can it lead to false conclusions about the code?
  - What: deal with MAY beliefs, since they can be coincidence + rank the results
  - When it is useful:
    + when we have enough data

* There are two keys to internal consistency: "inferring MUST beliefs" and "relating code".
  - What is relating code?
    + Code related at an implementation level:
      • an execution path from action a to action b allows us to cross-check a's beliefs with b's
      • we can also cross-check their assumed execution contexts and fault models, e.g., if b can fail, a can as well
    + Code related abstractly:
      • if a and b are implementations of the same abstract routine, interface, or abstract data type:
        ~ a and b must assume the same execution context and fault model
        ~ the same argument restrictions (for a routine)
        ~ the same error behavior (for a routine)
  - Why is it useful?
    + Code can be grouped into equivalence classes of related code that must share the same belief set. We can then propagate beliefs held by one member to all members of the equivalence class. This gives us a powerful lever: a single location that holds a valid MUST belief lets us find errors in any code we can propagate that belief to.
    + Example: if a and b are abstractly related, then these are contradictions:
      • a exits with interrupts disabled, b with them enabled
      • a checks its first argument p against null, b dereferences p directly
      • a returns positive integers to signal errors, b returns negative integers
  - In what ways can we infer MUST beliefs? Two ways:
    1. direct observation
      • setting a pointer to null --> a state change
      • checking if p is null --> an observation of state
      • after changing state, the programmer must believe the change took effect
    2. implied pre- and post-conditions
      • division by z implies a belief that z is non-zero
      • deallocation of a pointer p implies:
        ~ p was dynamically allocated (pre-condition)
        ~ p will not be used after the deallocation (post-condition)

* Statistical analysis leads to a ranked set of possible bugs. How does a programmer/developer interact with such results?
  3 ways:
  - augment with code trustworthiness
    • code with few errors is a more reliable source of examples of correct practice than code with many
  - inversion property ???
  - exploit the non-spurious principle ???
  - NOTE: I am not sure this is a correct answer...

* What are latent specifications, and why are they useful?
  - They are features designed to communicate intent to other programmers, and idioms that implicitly denote restrictions and context.
  - They can be leveraged to filter results and to suppress false positives.
  - Examples:
    + naming conventions: lock, unlock, alloc, free, release, assert, fatal, panic, ...
      ~ useful to flag potentially interesting functions
    + at a slightly higher level, most code has cross-cutting idioms that encode meaning:
      ~ error paths are commonly signaled with the return of a null pointer or a negative (or positive) integer
      ~ these help detect failure paths at callers, and error paths at callees
    + executable specifications: debugging assertions, panic, BUG
      ~ help suppress error messages on such paths
    + specifications can be completely transparent but shared across all code in a given domain:
      ~ null pointers should not be dereferenced
      ~ circular locking is bad

* All of these analyses generate "false positives". What is a false positive? Why do they occur?
  + FP: a case that is not really a bug, but the checker still flags it
  + Why FPs occur:
    ~ macros
    ~ panic/BUG paths

* Section 6, internal NULL consistency
  - check-then-use (more serious)
  - use-then-check
  - redundant checks
  - challenges: limit the scope
    + a 10-line window rule for contradiction errors. Why? From experience...

* What does the security checker the authors built do?
  - It checks for unsafe dereferences in kernel code
  - Checking whether a pointer believed to be a user pointer is dereferenced by kernel code
  - How: 3 rules:
    1. Any pointer that is dereferenced is believed to be a safe kernel pointer.
    2. Any pointer that is sent to a "paranoid" routine is believed to be a "tainted" user pointer.
    3.
Any pointer that is believed to be both a user pointer and a kernel pointer is an error.
  - False positives:
    + kernel backdoors that check whether they were called from user code or kernel code; in the latter case they can safely dereference pointers, but such uses would be flagged by a naive checker
    + Solution:
      ~ latent specifications: boolean flags such as to_user, from_user, ... to identify which context we are in
      ~ a list of dereferencing functions

* Why is it important in an OS to do this type of check?
  - An unsafe dereference can crash the system
  - Worse, it can give a malicious party control of the OS

* What is "inferring failure"?
  - If the software checks a return value (e.g., against NULL) most of the time, but does not check it in some cases, then those unchecked cases are probably errors
  - "We use a more robust approach by deriving what routines programmers believe can plausibly fail. Missing checks on almost-always checked routines are good candidates for errors. Similarly, checks on almost-never checked routines frequently signal misunderstood interfaces."
  - Example: the worst error
    + seg_alloc() returns a valid pointer on success, but an integer error code on failure (not a NULL pointer)
    + hence, most callers check IS_ERROR(shp = seg_alloc(..))
    + but there is one case where this check is not followed:
      ~ it checks if shp == NULL, which never happens
      ~ hence the failure path is never executed
      ~ hence a chance of physical memory corruption and other bad things...

* Static analysis by its very nature goes over all code. Why is this good? When can it be bad?
  - Good: it looks at all code, and finds bugs
  - Bad: it has no idea which code is likely to run, so it is hard to say which bugs are important

* This type of analysis often doesn't appeal to those in the programming languages community. Can you guess why?
  - No formal proof: it is just a solution that "kind of works".
    You may find some bugs, but there is no proof that other bugs do not exist, nor any proof that you will find all bugs...

* This paper is really based on one awesome insight, the thing you should remember long after you've forgotten these details. What is it?
  - The code itself can tell you what it should do
  - No other specification is needed
  - i.e., the code itself is the specification
  - Potential question: "the code itself is the specification" -- can you give some examples to support that claim?

BUGS AS DEVIANT BEHAVIOR
========================
*link* http://wwwse.inf.tu-dresden.de/presentations/lec6.pdf

# 0. Take away:
- finding bugs manually is hard:
  + too many lines of code
  + what rules to check for?
  + manually finding rules is hard -- so don't
- hence, we need:
  + automation
  + no a priori knowledge
- high level:
  + extract what the code *believes* without a priori knowledge, and flag contradictions as errors
    REMZI: the code is its own specification
  + use contradictions to find lies: cross-examine; any contradiction is an error
  + use common behavior
- so how do they do it?
  + from source code, extract 2 kinds of beliefs, then check for contradictions
  + MUST beliefs: inferred from acts that imply beliefs the code *must* hold
    > dereference of pointer p --> p must be not null
    > unlock(l) --> l must have been locked
    ==> then check using *internal consistency*: infer beliefs at different locations, flag an error on contradiction
  + MAY beliefs: could be coincidental
    ==> use statistics, and rank the results
- general approach:
  1) define a rule template T, e.g.: no <a> after <b>
  2) define valid slot instances, e.g.: A, B
  3) define code actions that imply beliefs, e.g.: p->a ==> p must be not NULL; (p == null) ==> p may be NULL or not NULL
  4) define rules for combining beliefs
  5) define rules for belief propagation

*Some examples* (details here)
- null pointer checking
- security holes
- ...

*Strengths*
- automatic
- no need to understand the system (in a deep way)
- scalable (why? because this is static analysis; more explanation...)
  > inter- and intra-procedural
  > interface implementations

*Weaknesses*
- noise (from coincidences and imperfect analysis)
  Solutions:
  1) large samples: but what if you don't have enough samples?
  2) rank error messages: inspect the most credible first
  3) human intervention
    > use *latent specifications* (lock, unlock, free, ...) to reduce the slot space
      ==> what if a "latent specification" is not strictly followed?
- false positives, due to:
  + *crash* paths:
      if (!idle)
          panic(..)        // machine reboots, so the dereference below is safe
      idle->processor = cpu;
  + concurrency
  ==> solution: use latent specifications and extend the compiler to handle special cases
- depends on templates, but the templates are quite limited
  ==> hence it is hard to find deep errors (unlike model checking or formal verification)