Computer Sciences Dept.

Mary K. Vernon

Professor Emerita of Computer Science
                 
Picture of Mary K. Vernon
Quantitative System Design Interests:
Analytic system performance models that accurately predict measured system performance for alternative system design options, important next-generation commercial system design questions, quantitative design of near-optimal hardware and software, near-optimal enterprise storage systems, network traffic characterization, near-optimal network protocols, state-of-the-art parallel/distributed applications and architectures.

Research Summary

The theme of my research has been the development and application of analytic modeling techniques that enable the design of software, hardware, and communication system architectures with near-optimal performance and known performance properties. The contributions have included (1) customized equations that directly specify an optimal system design, (2) analytic bounds that quantify the opportunity for improvement as well as the target system performance, (3) customized analytic models that reflect the mechanics of the system and that accurately estimate measured system performance for various design options, and (4) application of the models to a wide variety of vital commercially-relevant system design questions, leading to the development of new near-optimal commercialy-adopted system designs that significantly outperform previous solutions.

The simplest possible analytic models that predict measured system performance readily expose all system features that inhibit system performance, including any performance bottlenecks that can be eliminated. Hence, the development of such models generally leads to significant new design insights as well as new near-optimal solutions that are often simpler to implement and significantly outperform previous designs. In addition, these analytic models typically require 1/10th - 1/100th the development time of accurate detailed system simulators, and frequently uncover previously unknown significant errors in a complex system (or simulator) implementation. Hence, such models are also highly valuable for validating complex system and simulator implementations.

The new modeling techniques developed together with students and colleagues include:

  • the Generalized Timed Petri Net (GTPN) which can be used to create models that predict measured system performance for bus arbitration protocols, cache coherence protocols, and other detailed system architectures that include operations that have deterministic durations,
  • Customized Approximate Mean Value Analysis (CMVA) which supports queueing-theoretic modeling at the accuracy of measured system performance for the design of complex high-performance computer/communication architectures/systems, in that the equations are developed "from scratch" to reflect the mechanics of the system operation, thus facilitating accurate extensions to capture the impact of a wide range of complex, nonseparable performance-relevant system features,
  • CMVA approximations for highly accurate modeling of complex system features, such as high variability in service times and the resulting highly bursty arrivals to downstream servers, pipelined interconnect message transfers with correlated blocking due to finite buffer space, in-order delivery of distributed responses to non-blocking memory requests, mean service time correlated with number of parallel requests in service, and requests coalescing with other requests for the same file while waiting for service.
  • A CMVA model for provisioning servers for very low client loss probability (e.g, 0.0001).
  • Customized high-fidelity LogGP models that predict measured system performance for the design of state-of-the-art applications runing on high-performance message-passing parallel architectures, such as an 8192-node Cray XT3/XT4 or a 500-node Condor system,
  • the LoPC model which supports the development of LogP models - without the "g" parameter - for estimating measured system performance in the design of state-of-the-art parallel applications running on high-performance shared-memory architectures, due to incorporating CMVA for accurately estimating the delay in application execution time due to contention (C) at the shared system resources,
  • Deterministic Task Graph Analysis for parallel algorithm design, together with the underlying theory that defines the broad system space for which this modeling technique is accurate,
  • Customized equations for mean response time of alternative parallel system job scheduling policies which compute the exact mean response time at particular points in a high-fidelity system architecture and workload characterization parameter space, and use interpolation functions to accurately estimate mean response time over the rest of the parameter space,
  • Customized lower bounds on server and network bandwidth required for on-demand streaming of popular media content for a variety of alternative settings, such as transmission path data rate, client packet loss rate, maximum client start-up delay, choose-your-own-ending content, heavy-tailed distributions of media content popularity, and so forth,
  • Customized models of Periodic Broadcast streaming protocols that can be used to compute the broadcast schedule that minimizes server bandwidth for a specified maximum client start-up delay, client packet loss recovery rate, and client data path transmission rate,
  • Customized linear programming models for directly determining an optimal content caching and delivery architecture for a specified client workload and near-optimal scalable streaming protocol, and
  • A Customized Model of TCP-Vegas throughput in the presence of packet loss that accurately predicts measured performance over a wide variety of Internet paths.
  • Customized models of differentiated service link scheduling policies for bottleneck links in the Internet.

We have used these techniques to obtain significant new system design insights and new, in many cases near-optimal, designs that significantly outperform previous solutions. Example design insights and new near-optimal system designs include:

  • new, simpler and near-optimal, commercially-adopted round-robin and first-come first-served bus arbitration protocols (pdf),
  • new, simpler and near-optimal hardware cache coherence protocols for a broad range of shared memory workloads, implemented in commercial cache controllers (pdf),
  • commercially-adopted coherence protocol support for a new Queue-on-Lock-Bit (QOLB) synchronization primitive,
  • the CMVA-based discovery and quantification of a previously unknown, significant processor throughput imbalance in commercial high-performance parallel architectures with k-ary n-cube processor/memory interconnections, due to a previously overlooked load-imbalance on the virtual channels used for deadlock-avoidance in Dally and Seitz' wormhole-routing protocol (pdf); this led Lethin and Dally to use a modified CMVA model in the design a new routing protocol for future interconnects,
  • commercially-adopted CMVA models to explore design trade-offs in high-performance shared memory architectures for processors with aggressive instruction-level parallelism (pdf),
  • optimized Cray UNICOS operating system semaphores,
  • a new "Priority Backfill" production parallel system job scheduling policy, implemented in the 1520-processor Origin 2000 system at NCSA for a 5-fold (80%) reduction in mean and 95-percentile job waiting time (pdf),
  • a new near-optimal task scheduling policy for Condor grids,
  • a new near-optimal state-of-the-art ATR parallel stochastic optimization code with a factor of 8 reduction in runtime on a 500-node Condor server farm (pdf),
  • high-performance parallel particle transport applications (and other wavefront computations) optimized for 8192-node (and larger) Cray XT3/XT4 systems (pdf),
  • new near-optimal Hierarchical Stream Mergiing and Bandwidth Skimming multicast streaming protocols with required server bandwidth that grows with the logarithm of the client request rate (pdf, pdf),
  • a new TCP-Madison Internet transport protocol that achieves a target upper-bound on bottleneck link utilization, near-optimal bottleneck link sharing, negligible packet loss and up to ten-fold higher throughput than FastTCP over a wide variety of Internet paths (pdf).

Current research includes the design of commercially-relevant enterprise storage systems, parallel/distributed applications, high performance network transport protocols, and network traffic characterization.

 
Computer Sciences | UW Home