CS 736: Advanced Operating Systems

CS 736 – Spring 2006

CS 736 – Fall 2006

Midterm 2 Review

Performance

Papers:

i. AFS

ii. Frangipani / Petal

iii. Chubby

General Problems

i. Latency: do things faster

1. E.g. RPC turnaround time

ii. Throughput

1. Handle more requests/operations per second

iii. Scale out:

1. Run on bigger data sets on more machines

2. Handle more data on more computers / faster computers

3. Run well on a cluster with a billion clients

General Solutions

i. Locality

ii. Partitioning – distribute load to multiple servers

1. AFS

2. Petal

3. Frangipani

iii. Replication – more read throughput

1. AFS

2. Petal

iv. Caching

1. AFS

2. Frangipani

3. Chubby

v. Change data structures

1. Logging in Petal, Frangipani

vi. Batching – reduce startup costs

1. Delayed write / group commit in LFS / Frangipani, NFS

vii. Callbacks / leases – reduce server load

1. AFS

2. Frangipani

3. Chubby

viii. Move work to client

1. AFS name translation

ix. Notifications – notify other participant of semantically interesting events

1. AFS callbacks, Frangipani locks

x. Multi-level policy

1. Petal global / physical maps
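
As an illustration of how caching combines with callbacks / leases to reduce server load, here is a minimal sketch. It is not taken from AFS, Frangipani, or Chubby; the class name LeaseCache, the fetch function, and the injectable clock are all hypothetical names chosen for this example.

```python
import time


class LeaseCache:
    """Client-side cache whose entries are trusted only while a lease holds."""

    def __init__(self, fetch, lease_seconds, clock=time.monotonic):
        self._fetch = fetch          # hypothetical function that contacts the server
        self._lease = lease_seconds  # how long a cached value may be used
        self._clock = clock          # injectable so the sketch is easy to test
        self._entries = {}           # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]            # lease still valid: no server traffic
        value = self._fetch(key)     # lease missing or expired: ask the server
        self._entries[key] = (value, now + self._lease)
        return value

    def invalidate(self, key):
        """Callback from the server: the cached copy is stale, so drop it."""
        self._entries.pop(key, None)
```

Repeated reads inside the lease period never reach the server, which is the load reduction the list above refers to. The invalidate method stands in for a server-initiated callback; with leases alone, the expiry time bounds how stale a client can be if that callback is lost.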

Reliability

Terms:

i. Fault = code bug

ii. Error = memory corrupted by that bug

iii. Failure = system misbehavior

Metrics

i. MTTF = mean time to failure == reliability

ii. MTTR = mean time to repair

iii. Availability = MTTF / (MTTF + MTTR), measured in 9s: 0.9, 0.99, 0.999
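
The availability formula above can be worked through numerically. The sketch below is only an illustration; the function names and the sample MTTF / MTTR values (in hours) are made up for the example.

```python
import math


def availability(mttf, mttr):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def nines(a):
    """Number of nines: 0.9 is about 1, 0.99 about 2, 0.999 about 3."""
    return -math.log10(1.0 - a)


def downtime_hours_per_year(a):
    """Expected downtime in a 365-day year for availability a."""
    return (1.0 - a) * 24 * 365


# Hypothetical system: fails on average every 999 hours, takes 1 hour to repair.
base = availability(mttf=999.0, mttr=1.0)

# Halving MTTR and doubling MTTF give the same availability,
# because only the ratio MTTF / MTTR matters in the formula.
faster_repair = availability(mttf=999.0, mttr=0.5)
fewer_failures = availability(mttf=1998.0, mttr=1.0)
```

Because only the ratio of MTTF to MTTR enters the formula, cutting repair time in half buys exactly as much availability as doubling the time between failures.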

General design principles

i. End-to-end design

1. If code has to handle a problem anyway (because middle layers can't completely solve it), that code is a good place to handle the problem completely

ii. Recovery-oriented computing

1. It may be easier to let things fail and recover quickly than to make them perfect

a. Improve MTTR instead of MTTF

iii. Transactions

1. Provide a general-purpose error-handling mechanism: abort the transaction
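
To make the transaction principle concrete, here is a small in-memory sketch of abort as a catch-all error handler. It assumes the state is a plain dictionary and uses a full snapshot for undo; the names transaction and transfer are hypothetical, and real systems would use logging rather than copying the whole state.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(state):
    """Run a block against `state`; on any exception, restore the old contents."""
    snapshot = copy.deepcopy(state)
    try:
        yield state
    except Exception:
        state.clear()
        state.update(snapshot)  # abort: undo every partial change
        raise


def transfer(accounts, src, dst, amount):
    """Move money between accounts; any error leaves both balances untouched."""
    with transaction(accounts) as t:
        t[dst] += amount        # partial update happens before the check
        if t[src] < amount:
            raise ValueError("insufficient funds")
        t[src] -= amount
```

The transfer function never writes cleanup code for the half-finished update: raising an exception is enough, because the abort path restores the snapshot. A failing call such as transfer(accounts, "alice", "bob", 500) on a balance of 100 raises ValueError and leaves the dictionary as it was.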

Failure Models

i. Timing – miss a deadline

ii. Output – produce incorrect output

iii. Omission – skip an output

iv. Crash – skip an output, produce no more output

v. Byzantine

1. Anything can happen, including malicious behaviors

General Approaches

i. Fault Avoidance

1. Prevention: make sure bugs never enter code

2. Removal: Remove bugs from existing programs

3. Work-around: don't execute buggy code

a. E.g. Firewalls

ii. Fault Tolerance

1. Redundancy – execute multiple times

2. Diversity: multiple versions for deterministic bugs. Can also be diverse environment (change how memory allocated, scheduling works)

3. Isolation: confine errors to a single component

4. Modularity: keep components small

5. Error detection: why important if have isolation?

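a. A: Isolation only contains the damage; detection is still needed for availability, so a component that is doing the wrong thing can be noticed and recovered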

6. Recovery

a. Forwards / Backwards

b. Concealing / revealing

7. Where do you provide fault tolerance

a. In the application?

b. In a library

c. In the OS

d. Around a component (e.g. Nooks)

e. In the HW

f. If everything above layer X is identical, can tolerate faults at X or below automatically

g. If have some diversity above X, can tolerate heisenbugs above layer X
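
Redundancy and diversity from the list above are often combined by running several independently written versions and voting on the result. The following is a minimal sketch of that idea, not a description of any system covered in the course; the vote function and the three toy implementations are hypothetical.

```python
from collections import Counter


def vote(replicas, *args):
    """Run every replica and return the answer a strict majority agrees on."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(*args))
        except Exception:
            pass  # a crashed replica simply casts no vote
    if not results:
        raise RuntimeError("all replicas failed")
    answer, count = Counter(results).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return answer


# Three diverse versions of "largest element"; the third has a deterministic bug.
def largest_builtin(xs):
    return max(xs)


def largest_sorted(xs):
    return sorted(xs)[-1]


def largest_buggy(xs):
    return xs[0]


result = vote([largest_builtin, largest_sorted, largest_buggy], [3, 9, 4])
```

Running the same buggy version three times would not help with a deterministic bug, since all copies would agree on the wrong answer. Voting masks the fault only because the versions differ.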

Systems

i. Process Pairs

1. All work done in persistent transactions

2. Process 1 sends request to its pair, tries to do work. On failure, transaction aborts and process 2 retries.

ii. Nooks

1. Isolating device drivers

2. Recovery by restart

iii. Recovery Oriented Computing

1. FIG: inject system call failures to determine bugs in error handling

a. Goal: gracefully handle failures

2. Recursive restartability: allow an application to be partially rebooted

a. Goal: improve MTTR

3. Undo for Operators: log system state & mgmt operations, allow for rollback, repair, replay

a. Goal: improve MTTR of management

iv. QuickSilver

1. Goal: transactions to simplify error handling in a distributed system

2. Optimized for fast cases: read only, volatile data

3. Common TM, LM per system
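
The fault-injection idea listed under Recovery Oriented Computing (making calls fail on purpose to exercise error-handling paths) can be sketched in a few lines. This is an in-process analogue written for illustration, not the FIG tool itself; FaultInjector and save_all are hypothetical names.

```python
import errno


class FaultInjector:
    """Wrap a callable so that chosen invocations fail with an OSError."""

    def __init__(self, func, fail_on, err=errno.ENOSPC):
        self._func = func
        self._fail_on = set(fail_on)  # 1-based call numbers that should fail
        self._err = err
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls in self._fail_on:
            raise OSError(self._err, "injected fault")
        return self._func(*args, **kwargs)


def save_all(records, write):
    """Code under test: write each record, retrying once after a failure."""
    saved = []
    for record in records:
        for attempt in range(2):
            try:
                write(record)
                saved.append(record)
                break
            except OSError:
                if attempt == 1:
                    raise  # second failure in a row: give up
    return saved
```

Wrapping the write function with FaultInjector(sink.append, fail_on={2}) forces the second call to fail, so a test can confirm that the retry path in save_all actually runs and that no record is lost or duplicated.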

Security

Key threats:

i. Privacy – uncontrolled release of sensitive information

ii. Integrity – uncontrolled change to sensitive information

iii. Denial of service – uncontrolled prevention of service

Guard model

i. Guard enforcing access control : impenetrable wall with a door

ii. Authorization check : guard demands something it can check against its database

iii. Protected information : database of tamperproof information

iv. Decision procedure : mechanism for making a decision

v. SIMPLEST solution: complete isolation

1. Reject everything
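
The parts of the guard model above map onto a very small piece of code. The sketch below is an assumption-laden illustration: the Guard class, its access-control-list layout, and the principal and resource names are invented for the example.

```python
class Guard:
    """The only door through the wall: every access goes through access()."""

    def __init__(self, acl):
        # Protected information: {(principal, resource): set of allowed operations}
        self._acl = acl

    def check(self, principal, resource, op):
        """Decision procedure: allow only what the ACL explicitly lists."""
        return op in self._acl.get((principal, resource), set())

    def access(self, principal, resource, op, action):
        """Authorization check, then run the requested action if permitted."""
        if not self.check(principal, resource, op):
            raise PermissionError(f"{principal} may not {op} {resource}")
        return action()


guard = Guard({("alice", "grades"): {"read"}})
isolated = Guard({})  # complete isolation: an empty ACL rejects everything
```

Because the decision procedure denies anything not listed, a guard built with an empty access control list is the "reject everything" case from the list above.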

Systems

i. Needham and Schroeder

1. Secret Key vs. Public key

2. On-line vs off-line

3. QUESTION: When use secret key?

a. When humans need to remember a key

b. When servers have little power

4. QUESTION: When use public key

a. When you can distribute certificates out-of-band (e.g. with browser / OS)

b. When no simple trust relationship

5. QUESTION: How handle authentication between realms / domains?

a. Referrals

b. Certificate chains

ii. SFS

1. Self-certifying names encode hash of key in name

a. Client verifies that server has key corresponding to name

2. Key management (e.g. what keys a client should trust) separated from protocol

3. Client authentication separated from server authentication

4. Perfect forward secrecy of communication if client public key changes regularly

iii. Terra

1. QUESTION: What does it provide?

a. Assurance of what SW running on a remote computer

b. How provided? Each layer acts as AS for next upper layer, signs key + certificate

c. Remote system can verify certificate chain

2. QUESTION: What problems can it solve?

a. Spyware?

b. Gaming?

c. Password sniffing terminals?
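
Returning to the SFS item above, the idea of a self-certifying name (a name that embeds a hash of the server's key) can be sketched as follows. This is a simplified illustration: the use of SHA-256, hex encoding, and the "/sfs/location:hostid" layout are assumptions of this sketch rather than the exact SFS format, and both function names are hypothetical.

```python
import hashlib
import hmac


def self_certifying_name(location, public_key):
    """Build a name that embeds a hash of the server's location and public key."""
    host_id = hashlib.sha256(location.encode() + b"\0" + public_key).hexdigest()
    return f"/sfs/{location}:{host_id}"


def verify_server(name, presented_key):
    """Client-side check: does the key the server presents match the name?"""
    location, _, host_id = name[len("/sfs/"):].partition(":")
    expected = hashlib.sha256(location.encode() + b"\0" + presented_key).hexdigest()
    return hmac.compare_digest(host_id, expected)
```

The check needs no certificate authority, which is how key management stays separate from the protocol: whoever hands the client the name has implicitly handed it the key to trust. This sketch only checks that the presented key matches the name; a real client must also make the server prove it holds the corresponding private key.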
