fault --> error --> failure

detection
survival/tolerance
  Q: can you think of ways to tolerate each of these?
    buf-overf W    Q: ...
    buf-overf R    Q: ...
    dangl. ptr
    double free
    mem. corrup.
    uninit. read

F.O. & Rx: when an error happens, start recovery
  How to detect?
  How to recover?

================
How to detect?
  J.K. algorithm ...
  problem: huge slow-down
  Q: any idea how to speed up?
    CCured, memory pool

================
How to recover
  F.O.
    read  <-- return a random, pre-allocated value
    write <-- discard the write
  Q: alternative solution?
    write to a padded area
    return an ERROR code and stop the current function
  Q: what is the disadvantage? (unpredictable)

  implementation: compiler instruments the source code

  Experiment: apache, pine, ...
    (error message, not handled anyway, not an important error)

  Q: when would it (not) work?
    works with a short error propagation chain (sci. comp. does not work)
    availability is much better than restart
    Martin Rinard's example of flight

  advantages of failure oblivious computing:
    i.   fewer addressing exceptions
    ii.  prevents many attacks
    iii. sometimes, when you are lucky, the following execution will not be
         affected, or maybe no visible failure symptom will be seen

====================
Rx (the story; targets improving on F.O.)
  (the story starts from Peter's environment ...)

  Motivation: predictable, no extra bug

  environment: mem., scheduler, message
    double free & dangl. ptr <-- delay free
    buf. ov.                 <-- add padding (this does NOT work for static or stack buffers!)
    mem. cor.                <-- change buffer layout
    uninit. read             <-- initialize buffer to all-0
    data race                <-- change process priority and re-execute
    in the worst case: drop the user request

  How to achieve this:
    checkpoint and rollback
    env. wrapper (what cannot be handled: stack)
    Details of ckpt (leverage fork)

  Results: apache (stack-ov) has to drop the request (why?)

  compare with F.O.:
    1. does not change the original semantics!!
       [key idea: only change the environment, not the program!!!!]
    2. does not guarantee getting through either

  the allergy / bug-triggering-environment concept!!!
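A minimal sketch of the Rx loop described above (checkpoint, roll back, re-execute under a changed environment, drop the request in the worst case). This is a toy model, not the Rx implementation: `rx_execute`, `toy_handler`, `SoftwareFailure` and the names in `ENV_CHANGES` are hypothetical, and `copy.deepcopy` merely stands in for the fork-based checkpoint.

```python
import copy

# Environment changes from the notes, tried one at a time (names are hypothetical).
ENV_CHANGES = ["pad_buffers", "delay_free", "zero_fill_alloc", "change_priority"]


class SoftwareFailure(Exception):
    """Stands in for a crash or failed check noticed by an error sensor."""


def rx_execute(handler, state, request):
    """Run handler(state, request, env); on failure roll back and retry.

    Returns (result, env_used, state); result is None if the request is dropped.
    """
    checkpoint = copy.deepcopy(state)        # assumption: stands in for fork()
    try:
        return handler(state, request, frozenset()), frozenset(), state
    except SoftwareFailure:
        pass
    for change in ENV_CHANGES:
        state = copy.deepcopy(checkpoint)    # rollback: undo the failed attempt
        env = frozenset([change])
        try:
            return handler(state, request, env), env, state
        except SoftwareFailure:
            continue
    # Worst case: no environment change helps, so drop the request.
    return None, None, copy.deepcopy(checkpoint)


def toy_handler(state, request, env):
    """Toy server code with a 4-byte buffer; padding adds 8 spare bytes."""
    capacity = 4 + (8 if "pad_buffers" in env else 0)
    state["served"] += 1                     # side effect that rollback must undo
    if len(request) > capacity:
        raise SoftwareFailure("heap buffer overflow")
    state["log"].append(request)
    return "ok:" + request
```

The handler itself is never modified; only the `env` it runs under changes, which is the point of item 1 in the comparison. A request that overflows even the padded buffer is dropped, and the state is left as it was at the checkpoint.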
What are the disadvantages of Rx?
  1. might be slower at recovery than F.O.
  2. certain bugs cannot be recovered
  3. cannot recover from bugs that are too old

===================
More details of Failure Oblivious computing and the J-K algorithm

Detailed algorithm: JK-RL ()

  JK: dynamic bug detection.
    keep a table with the range of every memory object
    for every pointer calculation, make sure that the source and destination
    pointers belong to the same range
    if you have an out-of-bound pointer stored somewhere, that is ok:
    mark it as invalid; dereferencing it would cause an error

  Lam: the problem with JK is: what if an out-of-bound pointer is used again
    to calculate a correct in-bound one?
      a[10]; p = a + 20; q = p - 15;    (q = a + 5 is in bounds again)
    solution: each range has a special area; info. about e.g. p is stored there

- case study
  1. pine: heap overflow
     (copies data to a heap object, inserting `\' to quote characters;
      badly assumed the size increment of the destination heap object)
     original impact: always crash
     new effect: no visual impact (the overflowed part is truncated; however,
                 that part is not displayed anyway)
     perfor: ok (interactive)

  2. apache: stack overflow (only space for keeping 10 captures)
     original effect: child process dies and is restarted; could lead to attack
     new effect: perfect (turns out that the later part does not process
                 more than 10 patterns anyway)
     might get better performance, because there is no need to kill and
     start new processes

  3. sendmail: stack overflow (TODO: what is its cause?? 4.4.1)
     old effect: attacked; or exit at initial step
     new effect: goes through that part; the later length-checking part finds
                 the xx too long and handles it through the
                 `length-too-long' error handling function

  4. midnight commander (file mgmt tool): stack overflow
     new effect: prints an error message saying file path lookup failed
                 (for symbolic links in your tar-ball);
                 continues to handle other commands

  5. mutt (mail user agent): heap overflow
     (UTF-8 to UTF-7 problem again, same problem in squid)
     new impact: mail server returns saying the directory is invalid

Performance
  Q: when do you think the performance would be bad, for what type of workload?

Correctness:
  what type of workload would make sense (mentioned by the author)

====
extension: Emery's work
Triage
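Going back to the J-K / Lam pointer example and the F.O. read/write policy earlier in these notes, a toy model may help make them concrete. This is only a sketch under assumptions: `Heap`, `Ptr` and `copy_bytes` are hypothetical names, each pointer carries its object range directly (a simplification of the J-K table lookup and of Lam's special area), and the manufactured read values simply cycle through 0, 1, 2.

```python
from itertools import cycle


class Heap:
    """Toy flat memory; alloc() hands out adjacent objects (bump allocation)."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.next_free = 0
        self.discarded = 0                    # number of dropped invalid writes
        self._made_up = cycle((0, 1, 2))      # assumed manufactured-value sequence

    def alloc(self, n):
        base = self.next_free
        self.next_free += n
        return Ptr(self, base, (base, base + n))

    def manufacture(self):
        return next(self._made_up)


class Ptr:
    """Pointer that remembers the (base, limit) range of its intended object."""

    def __init__(self, heap, addr, referent):
        self.heap, self.addr, self.referent = heap, addr, referent

    def __add__(self, k):
        # Arithmetic keeps the referent, so an out-of-bounds intermediate
        # (p = a + 20) can legally come back in bounds (q = p - 15).
        return Ptr(self.heap, self.addr + k, self.referent)

    def __sub__(self, k):
        return self + (-k)

    def in_bounds(self):
        base, limit = self.referent
        return base <= self.addr < limit

    def load(self):
        if self.in_bounds():
            return self.heap.mem[self.addr]
        return self.heap.manufacture()        # F.O.: invalid read gets a made-up value

    def store(self, value):
        if self.in_bounds():
            self.heap.mem[self.addr] = value
        else:
            self.heap.discarded += 1          # F.O.: invalid write is discarded


def copy_bytes(dst, data):
    """Naive copy loop with no length check; every store is bounds-checked."""
    for i, byte in enumerate(data):
        (dst + i).store(byte)
```

For example, copying 14 bytes into a 10-byte object with `copy_bytes` keeps the first 10 bytes, discards the last 4, and leaves the adjacent object untouched, which resembles the truncation seen in the pine case study.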