CS 736 – Spring 2006

CS 736 – Fall 2006

 

Midterm 2 Review

 

  1. Performance
    1. Papers:

                                              i.     AFS

                                             ii.     Frangipani / Petal

                                           iii.     Chubby

    2. General Problems

                                              i.     Latency: do things faster

1.   E.g. RPC turnaround

                                             ii.     Throughput

1.   Handle more requests/operations per second

                                           iii.     Scale out:

1.   run on bigger data sets on more machines

2.   Handle more data on more computers / faster computers

3.   Run well on a cluster with a billion clients

    3. General Solutions

                                              i.     Locality      

                                             ii.     Partitioning – distribute load to multiple servers

1.   AFS

2.   Petal

3.   Frangipani

                                           iii.     Replication – more read throughput

1.   AFS

2.   Petal

                                           iv.     Caching

1.   AFS

2.   Frangipani

3.   Chubby

                                            v.     Change data structures

1.   Logging in Petal, Frangipani

                                           vi.     Batching – reduce startup costs

1.   Delayed write / group commit in LFS / Frangipani, NFS
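A minimal sketch of batching, assuming a hypothetical `GroupCommitLog` class and a caller-supplied `flush_fn` that stands in for one expensive disk write or RPC. It is not taken from LFS, Frangipani, or NFS; it only illustrates how buffering several records amortizes the per-operation startup cost.

```python
class GroupCommitLog:
    """Buffer log records and write them out in groups (illustrative only)."""

    def __init__(self, flush_fn, batch_size=4):
        self.flush_fn = flush_fn      # hypothetical: one expensive write per call
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0              # number of expensive writes performed

    def append(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.sync()

    def sync(self):
        # Force out whatever is buffered, e.g. when a caller needs durability.
        if self.pending:
            self.flush_fn(list(self.pending))
            self.pending.clear()
            self.flushes += 1


writes = []
log = GroupCommitLog(writes.append, batch_size=3)
for i in range(7):
    log.append(i)
log.sync()
print(log.flushes, "flushes for 7 records")
```

The trade-off is the usual one for delayed writes: records still sitting in `pending` are lost if the machine crashes before `sync()`.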

                                         vii.     Callbacks / leases – reduce server load

1.   AFS

2.   Frangipani

3.   Chubby
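A rough sketch of how callbacks and leases cut server load. The `Server` and `Client` classes, the `lease_s` parameter, and the injectable `clock` are all hypothetical, and the sketch mixes both ideas in one toy; it is not the actual AFS, Frangipani, or Chubby protocol. Reads are served from the client cache while the lease is valid, and a write at the server breaks the callback.

```python
import time


class Server:
    def __init__(self):
        self.data = {}
        self.fetches = 0       # how often clients had to contact the server
        self.callbacks = {}    # key -> set of clients holding a cached copy

    def fetch(self, client, key):
        self.fetches += 1
        self.callbacks.setdefault(key, set()).add(client)
        return self.data.get(key)

    def write(self, key, value):
        self.data[key] = value
        # Break callbacks: tell every caching client its copy is stale.
        for client in self.callbacks.pop(key, set()):
            client.invalidate(key)


class Client:
    def __init__(self, server, lease_s=30.0, clock=time.monotonic):
        self.server = server
        self.lease_s = lease_s
        self.clock = clock
        self.cache = {}        # key -> (value, lease expiry time)

    def read(self, key):
        entry = self.cache.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]    # lease still valid: no server traffic
        value = self.server.fetch(self, key)
        self.cache[key] = (value, self.clock() + self.lease_s)
        return value

    def invalidate(self, key):
        self.cache.pop(key, None)
```

The lease expiry bounds how long a client can keep using stale data if the invalidation message is lost, which a pure callback scheme cannot guarantee.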

                                        viii.     Move work to client

1.   AFS name translation

                                           ix.     Notifications – notify other participant of semantically interesting events

1.   AFS callbacks, Petal locks

                                            x.     Multi-level policy

1.   Petal global / physical maps

  2. Reliability
    1. Terms:

                                              i.     Fault = code bug

                                             ii.     Error = memory corrupted by that bug

                                           iii.     Failure = system misbehavior

    2. Metrics

                                              i.     MTTF = mean time to failure == reliability

                                             ii.     MTTR = mean time to repair

                                           iii.     Availability = MTTF / (MTTF + MTTR), measured in 9s: 0.9, 0.99, 0.999
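A small worked example of the availability formula. The MTTF and MTTR values are made-up numbers, and the `nines` helper is an illustrative name. It also shows why shortening repair time can buy as much availability as lengthening time to failure.

```python
import math


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def nines(avail: float) -> float:
    """Number of nines: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1.0 - avail)


# Hypothetical system: fails about every 1000 hours, takes 1 hour to repair.
a = availability(1000.0, 1.0)
print(f"availability = {a:.5f}, nines = {nines(a):.2f}")

# A 10x shorter MTTR gives the same availability as a 10x longer MTTF.
b = availability(1000.0, 0.1)
c = availability(10000.0, 1.0)
print(f"{b:.6f} vs {c:.6f}")
```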

    3. General design principles

                                              i.     End-to-end design

1.   If code has to handle a problem anyway (middle layers can't completely solve a problem), it is a good place to handle the problem completely

                                             ii.     Recovery-oriented computing

1.   It may be easier to let things fail and recover quickly than to make them perfect

a.    Improve MTTR instead of MTTF

                                           iii.     Transactions

1.   Provide a general-purpose error-handling mechanism via abort

    4. Failure Models

                                              i.     Timing – miss a deadline

                                             ii.     Output – produce incorrect output

                                           iii.     Omission – skip an output

                                           iv.     Crash – skip an output, produce no more output

                                            v.     Byzantine

1.   Anything can happen, including malicious behaviors

    5. General Approaches

                                              i.     Fault Avoidance

1.   Prevention: make sure bugs never enter code

2.   Removal: Remove bugs from existing programs

3.   Work-around: don't execute buggy code

a.    E.g. firewalls

                                             ii.     Fault Tolerance

1.   Redundancy – execute multiple times

2.   Diversity: multiple versions for deterministic bugs. Can also be a diverse environment (change how memory is allocated or how scheduling works)

3.   Isolation: confine errors to a single component

4.   Modularity: keep components small

5.   Error detection: why is it important if you have isolation?

a.    A: Needed for availability: an isolated component that is doing the wrong thing still has to be noticed before it can be recovered

6.   Recovery

a.    Forwards / Backwards

b.    Concealing / revealing

7.   Where do you provide fault tolerance

a.    In the application?

b.    In a library

c.    In the OS

d.    Around a component (e.g. Nooks)

e.    In the HW

f.     If everything above layer X is identical, can tolerate faults at X or below automatically

g.    If have some diversity above X, can tolerate heisenbugs above layer X

    6. Systems

                                              i.     Process Pairs

1.   All work done in persistent transactions

2.   Process 1 sends request to its pair, tries to do work. On failure, transaction aborts and process 2 retries.
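The process-pair idea above can be sketched as follows. The `run_transaction` and `process_pair` helpers are hypothetical, and an in-memory dict stands in for persistent transactional state. Real process pairs run as two separate processes; this single-process toy only shows the abort-then-retry structure.

```python
def run_transaction(state, work):
    """Run work against a scratch copy; commit only if it finishes."""
    scratch = dict(state)
    work(scratch)              # if this raises, `state` is left untouched (abort)
    state.clear()
    state.update(scratch)      # commit


def process_pair(state, request, primary, backup):
    """Try the primary; if its transaction aborts, the backup retries."""
    try:
        run_transaction(state, lambda s: primary(s, request))
        return "primary"
    except Exception:
        run_transaction(state, lambda s: backup(s, request))
        return "backup"


def crashing_worker(s, req):
    s["partial"] = True        # partial update that must not survive
    raise RuntimeError("simulated crash")


def healthy_worker(s, req):
    s[req] = "done"


state = {}
print(process_pair(state, "r1", crashing_worker, healthy_worker), state)
```

Because the failed attempt ran inside a transaction, the backup starts from clean state and does not need to know how far the primary got.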

                                             ii.     Nooks

1.   Isolating device drivers

2.   Recovery by restart

                                           iii.     Recovery Oriented Computing

1.   FIG: inject system call failures to determine bugs in error handling

a.    Goal: gracefully handle failures

2.   Recursive restartability: allow an application to be partially rebooted

a.    Goal: improve MTTR

3.   Undo for Operators: log system state & mgmt operations, allow for rollback, repair, replay

a.    Goal: improve MTTR of management
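A toy illustration of the fault-injection idea behind FIG. The `inject_failure` wrapper and `read_with_retry` function are hypothetical names; the real tool interposes on library calls rather than wrapping Python functions. The point is the same: force an error path to run and check that the caller handles it gracefully.

```python
import errno


def inject_failure(func, fail_on_call, exc):
    """Wrap func so that its Nth call (1-based) raises exc instead of running."""
    calls = 0

    def wrapper(*args, **kwargs):
        nonlocal calls
        calls += 1
        if calls == fail_on_call:
            raise exc
        return func(*args, **kwargs)

    return wrapper


def read_with_retry(read_chunk, retries=1):
    """Code under test: should survive a transient read error."""
    for attempt in range(retries + 1):
        try:
            return read_chunk()
        except OSError:
            if attempt == retries:
                raise          # out of retries: surface the error


# Make the first read fail with EIO, then check the retry path recovers.
flaky_read = inject_failure(lambda: b"data", 1, OSError(errno.EIO, "injected"))
print(read_with_retry(flaky_read))
```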

                                           iv.     QuickSilver

1.   Goal: transactions to simplify error handling in a distributed system

2.   Optimized for fast cases: read only, volatile data

3.   Common TM, LM per system

  3. Security
    1. Key threats:

                                              i.     Privacy – uncontrolled release of sensitive information

                                             ii.     Integrity – uncontrolled change to sensitive information

                                           iii.     Denial of service – uncontrolled prevention of service

    2. Guard model

                                              i.     Guard enforcing access control : impenetrable wall with a door

                                             ii.     Authorization check : guard demands something it can check against its database

                                           iii.     Protected information : database of tamperproof information

                                           iv.     Decision procedure : mechanism for making a decision

                                            v.     SIMPLEST solution: complete isolation

1.   Reject everything
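The guard model can be sketched in a few lines. The `Guard` class, its access-control table layout, and the principal and object names are all illustrative assumptions. Every access goes through one check against protected information, and an empty table gives complete isolation.

```python
class Guard:
    """Guard model sketch: every access goes through check()."""

    def __init__(self, acl=None):
        # Protected information: (principal, object) -> set of allowed rights.
        # In a real system this table must be tamperproof.
        self._acl = dict(acl or {})

    def check(self, principal, obj, right):
        # Decision procedure: allow only what the table explicitly grants.
        return right in self._acl.get((principal, obj), frozenset())

    def access(self, principal, obj, right, action):
        # The "door": the action runs only if the authorization check passes.
        if not self.check(principal, obj, right):
            raise PermissionError(f"{principal} may not {right} {obj}")
        return action()


# Complete isolation is the degenerate case: an empty table rejects everything.
isolated = Guard()
guard = Guard({("alice", "grades.txt"): {"read"}})
print(guard.check("alice", "grades.txt", "read"),
      isolated.check("alice", "grades.txt", "read"))
```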

    3. Systems

                                              i.     Needham and Schroeder

1.   Secret Key vs. Public key

2.   On-line vs off-line

3.   QUESTION: When use secret key?

a.    When humans need to remember a key

b.    When servers have little power

4.   QUESTION: When use public key?

a.    When you can distribute certificates out-of-band (e.g. with browser / OS)

b.    When no simple trust relationship

5.   QUESTION: How handle authentication between realms / domains?

a.    Referrals

b.    Certificate chains

                                             ii.     SFS

1.   Self-certifying names encode hash of key in name

a.    Client verifies that server has key corresponding to name

2.   Key management (e.g. what keys a client should trust) separated from protocol

3.   Client authentication separated from server authentication

4.   Perfect forward secrecy of communication if client public key changes regularly
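A simplified sketch of a self-certifying name. The SHA-1 plus base32 encoding, the helper names, and the example host are assumptions made for illustration; the exact inputs SFS hashes differ in detail. The name itself carries a hash of the server's public key, so the client needs no separate certificate authority to check the key it is handed.

```python
import base64
import hashlib


def host_id(location: str, public_key: bytes) -> str:
    """Hash of location + public key, base32-encoded (simplified)."""
    digest = hashlib.sha1(location.encode() + public_key).digest()
    return base64.b32encode(digest).decode().lower()


def self_certifying_path(location: str, public_key: bytes) -> str:
    return f"/sfs/{location}:{host_id(location, public_key)}"


def key_matches_name(path: str, presented_key: bytes) -> bool:
    """Client-side check that the presented public key matches the name.

    A real client must also make the server prove it holds the matching
    private key; that step is omitted here.
    """
    location, _, hid = path[len("/sfs/"):].partition(":")
    return hid == host_id(location, presented_key)


path = self_certifying_path("sfs.example.org", b"hypothetical-public-key")
print(path)
print(key_matches_name(path, b"hypothetical-public-key"))
```

How the client obtained the path in the first place (a symlink, a trusted directory, a bookmark) is the key-management question, which is why it can be kept separate from the protocol.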

                                           iii.     Terra

1.   QUESTION: What does it provide?   

a.    Assurance of what SW is running on a remote computer

b.    How provided? Each layer acts as AS for next upper layer, signs key + certificate

c.    Remote system can verify certificate chain

2.   QUESTION: What problems can it solve?

a.    Spyware?

b.    Gaming?

c.    Password sniffing terminals?
