configuration and management

What is system configuration?
------------------------------
- OS
- library
- executable
- configuration file and windows registry

Example of configuration file
  per application: port, ACL list, virtual host, # of worker processes, cache size
  whole system: /etc, .bashrc, crontab, /etc/inittab, /etc/rc.d/rc

Example of windows registry
  200K registry keys
  settings were originally in INI files; later on, changed to a centralized registry format
  HKEY_CLASSES_ROOT (HKCR)
  HKEY_CURRENT_USER (HKCU)
  HKEY_LOCAL_MACHINE (HKLM)

A lot of information is stored in the Registry
  Example: storage location, server url, user account,
           30 day expiration (when did you start),
           how to open a file (which file to call), open-with list
  Q: which applications will be started when windows starts?

Compare centralized configuration file with distributed ones
  + standardized configuration storage
  + entire registry can be backed up easily
  + update atomicity (Vista provides transaction style ...)
  - difficult to back up individual applications
  - a clear target for security attack
  - easy to have one software interfere with others

Q: why do we need so many configuration entries?
  * extensibility, flexibility

Bad impact of configuration
------------------------------
security issue
  e.g. spyware (4--5 spyware per machine;
       hkey_local_machine\software\microsoft\windows\currentversion\run)
bad performance
bad functionality
hard for testing

peerpressure: diagnosis
mirage: testing

==============================================
=====================PeerPressure=============
==============================================

PeerPressure

1. previous work: Strider: find correct snapshot and compare
   Q: what is the problem of Strider? how to get a correct snapshot?
   solution: golden state is in the mass (does this sound familiar to you ;)

steps:
  1. run AppTracer to collect the set of registry entries that are used as
     input for the program (Q: can this be automated?)
  2. canonicalize
  3. get sample from gene bank
  4. statistical analyzer
  5. trial and fix

---- Bayesian ...

Bayesian estimation -_-!

1. intuitively what matters?
   N (# of sample machines)
   m (# of machines in N that have the same value as the suspect)
   c (cardinality of this suspect)
   t (# of suspects)

2. Bayesian (S: suspect is sick, H: suspect is healthy, V: observed value)
   P(S|V) = P(V|S)P(S) / P(V)
          = P(V|S)P(S) / (P(V|H)P(H) + P(V|S)P(S))
   P(V|S) = 1/c      [lack of samples]
   P(S)   = 1/t
   P(H)   = 1 - 1/t
   P(V|H) = m/N      [based on the sample set AND assume all samples are healthy]
            ~~~~~~ adjust to P(V|H) = (m+n)/(N+cn)

Potential problems and sources of false positives
  (1) large root-cause entry cardinality
  (2) what if the machines in gene bank have the same problem
      biased by auto managed/configured machines
      (you will see many samples in GeneBank have similar properties)
  (3) perturbed by overly customized machines (all abnormal)
  (4) more than one entry contributes to the problem

===================
Registry virtualization is used in Vista to mitigate the registry corruption problem

==============================================
================Mirage========================
==============================================

background about upgrade

motivation
~~~~~~~~~~~~~~
90% upgrading once a month or more often
reason for upgrade: bug fix, security, new feature, etc.
don't want to install upgrade
5% upgrade failure

cause of failed upgrade
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* broken dependency (i need library x, you bring lib y) <--- environment related
* important features are removed (API change) <-- environment related
* buggy upgrade
* incompatibility with legacy configurations (configuration file format change,
  e.g., you have to add an include to httpd.conf) <-- environment related

Mirage
~~~~~~

how to do smarter testing and deployment?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.   clustering helps select testing representatives
1.5  user-machine testing scheme
2.   staged deployment (controllable order) is a balance between x and xx

objective of Mirage
~~~~~~~~~~~~~~~~~~~~~~~
* reduce the # of testing
* balance the deployment latency
* reduce redundant report

clustering:
  goal: ideal clustering; sound clustering (machines in one cluster behave the same)
  protocol: several design choices
    (1) send to all clusters in parallel, or sequentially
    (2) if sequentially, what is the order (farthest or closest first?)
    (3) when to start the non-representatives: immediately, or after all
        clusters' reps finish

how to cluster:
  environment includes: OS, lib., executable, environment variable,
                        configuration file / registry
  1. how to get env. resource
     * instrument and intercept (process creation, rd/wr, getenv, ...)
       heuristics to differentiate data file and env file
       - longest common prefix
       - in system dir.
  2. how to get resource fingerprint
     - version, hash, name, customized parser
     - clustering, diameter d (# of differences), keep adding, until ...

user-machine testing
  decide which apps are affected
  collect input, record output, compare output
  Q: what if the patch changes output
     (collect info from passed representative machines)
  Q: what are the problems? (perf., non-determinism DeltaExecution)

experiments:
  1. mysql, bad when using default parser and problem is one line in my.cnf -_-!!
  2. i cannot imagine how they handle registry

problems with the experiments:
  small samples
  customized parser
  parameter (diameter) sensitive
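
To make the diameter sensitivity concrete, here is a minimal sketch of clustering machines by environment fingerprint. The notes leave the stopping rule elided ("keep adding, until ..."), so the greedy rule below (join the first cluster whose members all differ in at most d resources, otherwise open a new cluster) is an assumption for illustration, not Mirage's actual algorithm. The machine names, resource names, and fingerprint values are hypothetical.

```python
from typing import Dict, List

# resource name -> fingerprint (e.g. a version string or a content hash)
Fingerprint = Dict[str, str]


def distance(a: Fingerprint, b: Fingerprint) -> int:
    """Count resources whose fingerprints differ; a missing resource counts as a difference."""
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k) != b.get(k))


def cluster(machines: Dict[str, Fingerprint], d: int) -> List[List[str]]:
    """Greedy clustering (assumed rule): keep every pair in a cluster within d differences."""
    clusters: List[List[str]] = []
    for name in sorted(machines):
        for members in clusters:
            if all(distance(machines[name], machines[m]) <= d for m in members):
                members.append(name)
                break
        else:
            # No existing cluster can take this machine without exceeding diameter d.
            clusters.append([name])
    return clusters


# Hypothetical environments: m1 and m2 differ only in the my.cnf hash.
machines = {
    "m1": {"libc": "2.3", "mysql": "4.1", "my.cnf": "h1"},
    "m2": {"libc": "2.3", "mysql": "4.1", "my.cnf": "h2"},
    "m3": {"libc": "2.5", "mysql": "5.0", "my.cnf": "h3"},
}
clusters_d0 = cluster(machines, 0)
clusters_d1 = cluster(machines, 1)
clusters_d3 = cluster(machines, 3)
```

With d = 0 every machine is its own cluster (three test runs); with d = 1, m1 and m2 share a representative even though their my.cnf differs, which is exactly the case where a one-line configuration problem could be missed.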
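
Going back to the PeerPressure ranking, the Bayesian estimate from the notes can be evaluated directly. This is a small sketch that plugs N, m, c, t into the formulas as written, using the adjusted P(V|H) = (m+n)/(N+cn). The function name is hypothetical, and the notes do not say what value n takes, so n = 1 below is an assumption chosen only for illustration; the sample numbers are made up.

```python
def sick_probability(N: int, m: int, c: int, t: int, n: float = 1.0) -> float:
    """P(S|V) for one suspect entry, following the formulas in the notes.

    N: number of sample machines, m: samples sharing the suspect's value,
    c: cardinality of the suspect entry, t: number of suspects,
    n: term in the adjusted P(V|H) (value assumed here).
    """
    p_s = 1.0 / t                    # P(S)
    p_h = 1.0 - p_s                  # P(H)
    p_v_given_s = 1.0 / c            # P(V|S)
    p_v_given_h = (m + n) / (N + c * n)  # adjusted P(V|H)
    numerator = p_v_given_s * p_s
    return numerator / (p_v_given_h * p_h + numerator)


# Hypothetical suspects on a machine with t = 10 suspects and N = 100 samples.
rare = sick_probability(N=100, m=0, c=2, t=10)     # no sample shares this value
common = sick_probability(N=100, m=95, c=2, t=10)  # almost every sample shares it

# Rank suspects by descending probability of being the root cause.
ranking = sorted({"rare": rare, "common": common}.items(), key=lambda kv: -kv[1])
```

A value that no sample machine shares ranks far above one that nearly all samples share, which matches the intuition that the golden state is in the mass. Raising c for a suspect lowers its score, which is the "large root-cause entry cardinality" false-positive source listed in the notes.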