Performance Summary

CS 736 – Spring 2006

 

Lecture 20: Performance Summary

 

1.   General Problems

a.    Latency: do things faster

                                     i.     E.g. RPC turnaround time

b.    Throughput

                                     i.     Handle more requests/operations per second

c.    Time to completion

                                     i.     How long does it take to compute a fixed workload? E.g. sort a billion values

d.    Scale up: run faster on faster machines – e.g. giant multiprocessors with lots of memory and fast CPUs

                                     i.     Improve speed on faster/more computers

                                    ii.     Run well on a supercomputer

e.    Scale out: run on bigger data sets on more machines

                                     i.     Handle more data on more computers / faster computers

                                    ii.     Run well on a cluster with a billion clients

f.     Predictability

                                     i.     Does computer do as you expect? Is performance predictable, understandable, low variance? If there is a problem, can you understand its source?

g.    Fairness

                                     i.     E.g. proportionally share a resource

h.   Efficiency

                                     i.     Reduce the amount of CPU/bandwidth/storage it takes to do something, even if it isn't the bottleneck

                                    ii.     Frees resources for something else

i.      Overload

                                     i.     How does performance vary with load? Keep it stable (degrade gracefully) rather than collapsing as load grows

2.   General Solutions

a.    Locality

                                     i.     FFS cylinder groups

                                    ii.     LFS logs

b.    Optimize for common case

                                     i.     LRPC

c.    Match underlying functionality

                                     i.     Active Messages: hardware messaging

                                    ii.     Scheduler Activations: scheduling decisions

                                   iii.     Grapevine Naming – shows whether a name is a user or a group

d.    Hints – semantically irrelevant, but useful for performance when correct (sketch below)

                                     i.     Pilot page usage
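
A minimal sketch of the hint idea (the table, keys, and hint values here are made up, not taken from Pilot): the hint is verified before it is used, so a stale hint only costs a fall-back to the slow path and never affects correctness.

/* Sketch: a hint is tried first but never trusted for correctness. */
#include <stdio.h>

#define NSLOTS 64
static int table[NSLOTS];                  /* slot -> key stored there (0 = empty) */

static int slow_lookup(int key)            /* authoritative but expensive search */
{
    for (int i = 0; i < NSLOTS; i++)
        if (table[i] == key)
            return i;
    return -1;
}

static int lookup(int key, int hint)
{
    if (hint >= 0 && hint < NSLOTS && table[hint] == key)
        return hint;                       /* hint was right: fast path */
    return slow_lookup(key);               /* hint was wrong: fall back, still correct */
}

int main(void)
{
    table[17] = 42;
    printf("%d\n", lookup(42, 17));        /* good hint */
    printf("%d\n", lookup(42, 3));         /* stale hint, same answer */
    return 0;
}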

e.    Partitioning – distribute load to multiple servers (sketch below)

                                     i.     Grapevine

                                    ii.     AFS

                                  iii.     Petal

                                  iv.     Frangipani
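
A minimal sketch of partitioning in the spirit of the systems above (the hash function, the names, and the four-server count are illustrative assumptions, not taken from any of the papers): each name is mapped to one server, so requests spread across the set instead of landing on one machine.

/* Sketch: map each name to one of nservers partitions. */
#include <stdio.h>

static unsigned partition(const char *name, unsigned nservers)
{
    unsigned h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;   /* simple string hash */
    return h % nservers;                       /* choose the server that owns this name */
}

int main(void)
{
    const char *names[] = { "birrell.pa", "schroeder.pa", "frangipani.src" };
    for (int i = 0; i < 3; i++)
        printf("%s -> server %u\n", names[i], partition(names[i], 4));
    return 0;
}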

f.     Replication – more read throughput (sketch below)

                                     i.     Grapevine

                                    ii.     AFS

                                  iii.     Petal
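
A minimal sketch of replication for read throughput (the round-robin choice is an illustrative assumption): any copy can serve a read, so read capacity grows with the number of replicas, while a write must still reach every copy.

/* Sketch: spread reads across replicas; writes go to all copies. */
#include <stdio.h>

#define NREPLICAS 3
static int next_replica;                   /* simple round-robin read choice */

static int pick_read_replica(void)
{
    return next_replica++ % NREPLICAS;     /* any replica can answer a read */
}

int main(void)
{
    for (int i = 0; i < 6; i++)
        printf("read %d -> replica %d\n", i, pick_read_replica());
    printf("write -> all %d replicas\n", NREPLICAS);   /* writes cannot be spread this way */
    return 0;
}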

g.    Caching (sketch below)

                                     i.     Grapevine – group membership

                                    ii.     AFS

                                  iii.     NFS
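
A minimal sketch of client-side caching (the table size, the one-character hash, and the fetch_from_server stand-in are all made up): a hit is answered locally and only a miss generates server traffic, which is how the systems above cut server load.

/* Sketch: consult a small local cache before asking the server. */
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8
struct entry { char name[32]; int value; int valid; };
static struct entry cache[CACHE_SLOTS];

static int fetch_from_server(const char *name)   /* stand-in for the expensive remote call */
{
    return (int)strlen(name);
}

static int lookup(const char *name)
{
    unsigned slot = (unsigned char)name[0] % CACHE_SLOTS;
    if (cache[slot].valid && strcmp(cache[slot].name, name) == 0)
        return cache[slot].value;                /* hit: no server traffic */
    int v = fetch_from_server(name);             /* miss: pay the full cost once */
    snprintf(cache[slot].name, sizeof cache[slot].name, "%s", name);
    cache[slot].value = v;
    cache[slot].valid = 1;
    return v;
}

int main(void)
{
    printf("%d\n", lookup("csl.wisc.edu"));      /* miss, goes to the server */
    printf("%d\n", lookup("csl.wisc.edu"));      /* hit, served locally */
    return 0;
}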

h.   Change data structures

                                     i.     Logging in LFS, Petal, Frangipani

                                    ii.     Message lists in Grapevine

                                  iii.     Free block bitmap in FFS

                                   iv.     A-stack / E-stack in LRPC

                                   v.     Group membership in Grapevine

i.      Batching – amortize per-operation startup costs (sketch below)

                                     i.     Delayed writes in LFS, Frangipani, and NFS
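
A minimal sketch of batching (the buffer size and the printed "large write" stand-in are made up): small writes accumulate in memory, and the fixed per-operation cost is paid once per large transfer rather than once per small write, which is the effect delayed writes have in the systems above.

/* Sketch: absorb many small writes, issue one large one per batch. */
#include <stdio.h>
#include <string.h>

static char buf[8192];
static size_t used;

static void flush_batch(void)
{
    if (used > 0)
        printf("one large write of %zu bytes\n", used);   /* stand-in for the real disk/network write */
    used = 0;
}

static void batched_write(const char *data, size_t len)   /* assumes len <= sizeof buf */
{
    if (used + len > sizeof buf)
        flush_batch();                     /* fixed cost paid once per batch */
    memcpy(buf + used, data, len);
    used += len;
}

int main(void)
{
    for (int i = 0; i < 20000; i++)
        batched_write("x", 1);             /* 20000 one-byte writes ... */
    flush_batch();                         /* ... become a handful of large ones */
    return 0;
}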

j.     Randomization for fairness (sketch below)

                                     i.     Lottery Scheduling
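
A minimal sketch of a single lottery draw in the spirit of lottery scheduling (the ticket counts, the 4:2:1 shares, and the use of rand() are illustrative): a ticket is drawn uniformly at random and the client whose range covers it runs, so over many draws each client's CPU share is proportional to the tickets it holds.

/* Sketch: one lottery draw over per-client ticket counts. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int tickets[] = { 400, 200, 100 };     /* shares of 4 : 2 : 1 */
    int nclients = 3, total = 700;

    srand(736);
    for (int draw = 0; draw < 5; draw++) {
        int winner = rand() % total;       /* pick one ticket uniformly at random */
        int sum = 0, c;
        for (c = 0; c < nclients; c++) {
            sum += tickets[c];
            if (winner < sum)              /* first client whose range covers the ticket wins */
                break;
        }
        printf("draw %d: client %d runs\n", draw, c);
    }
    return 0;
}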

k.    Idempotent operations / stateless operation (sketch below)

                                     i.     Message delivery in Grapevine

                                    ii.     NFS – essentially all operations (stateless server, so retries are safe)
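
A minimal sketch of why idempotent, self-contained requests make recovery easy, in the spirit of an NFS read (the request fields and the lossy send_and_wait stand-in are made up): every request carries all the state it needs, so after a timeout the client just resends it and a duplicate execution does no harm.

/* Sketch: retrying an idempotent, self-contained request after timeouts. */
#include <stdio.h>

struct read_req { int file_handle; long offset; long count; };   /* all state lives in the request */

static int send_and_wait(const struct read_req *r)   /* stand-in: pretend two tries time out */
{
    (void)r;
    static int tries;
    return ++tries > 2;
}

int main(void)
{
    struct read_req r = { 7, 8192, 4096 };
    while (!send_and_wait(&r))             /* same offset, same count every time, */
        printf("timeout, resending\n");    /* so a duplicate read is harmless */
    printf("reply received\n");
    return 0;
}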

l.      Callbacks / leases – reduce server load (sketch below)

                                     i.     AFS

                                    ii.     Frangipani
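
A minimal sketch of a lease check in the spirit of callbacks/leases (the 30-second lease length and the revalidation message are illustrative assumptions): while the lease is valid the client answers from its cache and sends the server nothing, which is where the reduction in server load comes from.

/* Sketch: serve from the cache while the lease is valid, revalidate after it expires. */
#include <stdio.h>
#include <time.h>

static time_t lease_expiry;                /* zero means no lease held yet */

static void cached_read(void)
{
    if (time(NULL) < lease_expiry) {
        printf("served from cache, no server traffic\n");
        return;
    }
    printf("lease expired or missing: revalidate with server\n");
    lease_expiry = time(NULL) + 30;        /* assumed 30 s lease granted by the server */
}

int main(void)
{
    cached_read();                         /* first access: talk to the server, get a lease */
    cached_read();                         /* accesses within the lease are purely local */
    return 0;
}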

m. Move work to client

                                     i.     AFS name translation

n.   Early binding

                                     i.     NFS mounting

                                    ii.     LRPC binding / compiler-generated stubs

o.    Asynchronous operation

                                     i.     Active Messages

p.    Notifications – notify other participant of semantically interesting events

                                     i.     Scheduler Activations

q.    Move control from OS to user code

                                     i.     Scheduler activations

                                    ii.     Active Messages

r.     Delay work

                                     i.     LFS – segment cleaning

s.    Multi-level policy

                                     i.     FFS global/local placement

                                    ii.     Petal global / physical maps

3.   Evaluation Techniques

a.    Questions to ask

                                     i.     When should they be used

                                    ii.     What do they show

b.    Micro benchmarks (timing sketch below)

                                     i.     Used to understand performance problems – where is the speedup / slowdown / problem coming from

                                    ii.     E.g.

1.   Null RPC

2.   Contention

3.   Read 8 / 64 / 1000 KB files
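
A minimal sketch of the micro-benchmark pattern behind numbers like those above (the iteration count is arbitrary and null_op stands in for a null RPC or null fork): time many repetitions of the empty operation and divide, so the per-call cost stands out from one-time overheads. A real harness would also keep the compiler from optimizing the empty calls away.

/* Sketch: time many null operations and report the per-call cost. */
#include <stdio.h>
#include <sys/time.h>

static void null_op(void) { }              /* stand-in for a null RPC / null fork */

int main(void)
{
    const long iters = 1000000;
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (long i = 0; i < iters; i++)
        null_op();
    gettimeofday(&end, NULL);

    double usec = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
    printf("%.3f microseconds per call\n", usec / iters);
    return 0;
}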

c.    Synthetic benchmarks

                                     i.     Not fully representative of real workloads:

1.   Andrew Benchmark

2.   Shows higher level performance, more realistic mix of operations

3.   Again, used to understand performance and to indicate potential problems due to workload skew

d.    Record live performance

                                     i.     Shows operational issues, not peak load

1.   CPU utilization

e.    Perform anomalous events, e.g. shut down a server

                                     i.     Show response under duress (e.g. time to reconfigure, time to clean)

f.     Comparisons

                                     i.     Against best research system (LRPC, Scheduler Activations)

                                    ii.     Against industry practice (AFS, FFS, LFS)

                                   iii.     Against tuned industry practices (Petal, Frangipani, Active Messages)

g.    Papers

                                     i.     LRPC:

1.   Null RPC + component timings

2.   Throughput scaling on multiprocessor with simple workload

3.   Compare to Taos RPC

                                    ii.     Scheduler Activations

1.   Micro benchmarks: null fork, signal-wait

2.   Scalability with # of processors on single program

3.   Compare to Unix threads, Topaz FastThreads, and user-level threads

                                  iii.     Active Messages

1.   Null RPC + timings

2.   Utilization as the # of processors scales

3.   Compare to native buffered model

                                  iv.     Lottery Scheduling

1.   Proportionality of sharing under different simple workloads (small # of processes)

                                   v.     FFS

1.   Read / write bandwidth, CPU utilization on simple workloads

2.   Compare to UFS

                                  vi.     LFS

1.   Synthetic, fixed workloads (e.g. uniform vs. hot-and-cold access) to show response to different patterns

2.   Micro benchmarks for create/read/delete with sequential and random access

3.   Usage characteristics from live system

4.   Compare to FFS

                                vii.     AFS

1.   Usage characteristics from live usage

2.   Andrew benchmark – time + scalability as client load increases

3.   Access latency for different size files

4.   Compare to NFS, local

                               viii.     NFS

1.   Compare to local, network disk

2.   Run real programs

                                  ix.     Petal

1.   Compare to local, tuned industry FS

2.   Synthetic read / write workload

3.   Measure latency and scalability with # of servers

4.   Andrew benchmark

                                   x.     Frangipani

1.   Compare to local w/ tuned industry FS

2.   Andrew benchmark

3.   Synthetic read/write microbench

4.   Scaling on microbenchmarks to understand performance