Performance debugging for distributed systems of black boxes
Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener, Patrick Reynolds, and Athicha Muthitacharoen
HP Labs, Palo Alto, Duke University, and MIT
Summary by: Zuyu Zhang
One-line Summary
Overview/Main Points
- Goal of performance debugging
  - Find critical paths in distributed systems
  - Treat components as black boxes
  - Debug performance
- Input: traces of messages
- Output
  - High-impact causal path patterns
  - Delays at each node
- Nesting algorithm
  - Assumes RPC-style request-response (call/return) message pairs
  - Procedure
    - pair call and return messages by call ID, and find each call's potential parents based on logical timestamps
    - score each potential causal nesting, keyed by the tuple ‘A, B, C, delay’
    - choose a unique parent for each child by picking the highest adjusted score; if several candidate parents tie, assign the child to the earliest tied parent
    - generate causal paths and the delays at each node
  - Can be used to detect rare events
- Convolution algorithm
  - Handles free-form messages that carry only timestamps and sender and receiver identifiers
  - Procedure
    - aggregate messages and separate the trace into a set of per-edge traces (one per sender-receiver pair)
    - treat each per-edge trace as a time signal and use signal-processing techniques (convolution) to find causal relationships between signals by identifying spikes, whose offsets give the likely delays
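The nesting procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record types (`Msg`, `Call`), every function name, the use of plain numeric timestamps in place of logical ones, the `width` used to bucket delays, and scoring by raw tuple frequency (instead of the paper's adjusted scores) are hypothetical choices made only for the example.

```python
from collections import Counter, namedtuple

# Hypothetical trace record: kind is "call" or "return".
Msg = namedtuple("Msg", "ts kind call_id src dst")
# A matched call/return pair: caller -> callee, active from start to end.
Call = namedtuple("Call", "call_id caller callee start end")


def pair_calls(msgs):
    """Step 1: match each call message with its return by call ID."""
    calls, returns = {}, {}
    for m in msgs:
        (calls if m.kind == "call" else returns)[m.call_id] = m
    return [Call(cid, c.src, c.dst, c.ts, returns[cid].ts)
            for cid, c in calls.items() if cid in returns]


def candidate_parents(calls):
    """A call B->C may be nested in A->B if it runs inside A->B's interval."""
    return {child.call_id: [p for p in calls
                            if p.call_id != child.call_id
                            and p.callee == child.caller
                            and p.start <= child.start and child.end <= p.end]
            for child in calls}


def nesting_key(parent, child, width):
    """The (A, B, C, delay) tuple used for scoring; delay is bucketed by width."""
    delay = child.start - parent.start
    return (parent.caller, parent.callee, child.callee, round(delay / width))


def assign_parents(calls, width=1.0):
    """Steps 2-3: score nestings by frequency, then pick one parent per child."""
    cands = candidate_parents(calls)
    by_id = {c.call_id: c for c in calls}
    scores = Counter()
    for cid, parents in cands.items():
        for p in parents:
            scores[nesting_key(p, by_id[cid], width)] += 1
    parent_of = {}
    for cid, parents in cands.items():
        if parents:
            child = by_id[cid]
            # Highest score wins; ties go to the earliest-starting parent.
            best = min(parents,
                       key=lambda p: (-scores[nesting_key(p, child, width)], p.start))
            parent_of[cid] = best.call_id
    return parent_of


def causal_paths(calls, parent_of):
    """Step 4: build a tree of (caller, callee, children) from each root call."""
    by_id = {c.call_id: c for c in calls}
    children = {}
    for cid, pid in parent_of.items():
        children.setdefault(pid, []).append(cid)

    def walk(cid):
        c = by_id[cid]
        kids = sorted(children.get(cid, []), key=lambda k: by_id[k].start)
        return (c.caller, c.callee, [walk(k) for k in kids])

    roots = [c.call_id for c in sorted(calls, key=lambda c: c.start)
             if c.call_id not in parent_of]
    return [walk(r) for r in roots]
```

For example, a trace in which a client `C` calls a web server `W`, which calls a database `D` before replying, yields the single causal path C to W to D. Bucketing the delay lets nestings with similar but not identical delays reinforce each other's scores; a larger `width` trades delay resolution for more robust counts.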
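The convolution procedure can likewise be sketched. Here each per-edge trace becomes a signal counting messages per time bin, and the signal on an incoming edge (A to B) is cross-correlated with the signal on an outgoing edge (B to C); cross-correlation is convolution with one signal time-reversed. A spike at lag d suggests that B tends to send to C about d bins after hearing from A. The bin width, the mean-plus-k-standard-deviations spike test, and all function names are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_edge_signals(msgs, bin_width, n_bins):
    """Split a trace of (timestamp, sender, receiver) into one signal per edge.

    signal[i] counts the messages on that edge whose timestamp falls in bin i.
    """
    signals = defaultdict(lambda: [0] * n_bins)
    for ts, src, dst in msgs:
        i = int(ts // bin_width)
        if 0 <= i < n_bins:
            signals[(src, dst)][i] += 1
    return signals


def cross_correlate(x, y, max_lag):
    """corr[d] = sum over t of x[t] * y[t + d]: how often y fires d bins after x."""
    n = len(x)
    return [sum(x[t] * y[t + d] for t in range(n - d)) for d in range(max_lag + 1)]


def find_spikes(corr, k=2.0):
    """Return the lags whose correlation exceeds mean + k * std (a simple spike test)."""
    mu, sigma = mean(corr), pstdev(corr)
    return [d for d, v in enumerate(corr) if v > mu + k * sigma]


# A -> B every 10 time units; B forwards to C 3 units later each time.
msgs = []
for t in range(0, 100, 10):
    msgs.append((t, "A", "B"))
    msgs.append((t + 3, "B", "C"))
sig = per_edge_signals(msgs, bin_width=1, n_bins=100)
corr = cross_correlate(sig[("A", "B")], sig[("B", "C")], max_lag=8)
print(find_spikes(corr))  # the spike at lag 3 recovers B's forwarding delay
```

Because the signals carry no call IDs, this approach needs only timestamps and endpoints, but a causal relationship shows up only if it recurs often enough to stand out above the background correlation.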
Relevance
Flaws