Shekhar Y. Borkar: Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation. IEEE Micro 25(6): 10-16 (2005).
Soft errors: single-event upsets (SEUs)
Special radiation hardened circuits
architectural redundancy
localized ECC
Soft error rate
Silent data corruption (SDC)
Detected unrecoverable error (DUE)
Architectural vulnerability factor (AVF): what effect does a soft error have on a program's actual output?
A branch predictor's AVF = 0 (a corrupted prediction costs performance, not correctness)
ACE (architecturally correct execution) bits: bits in which an error corrupts architectural state
Use Little's law to calculate the average number of ACE bits (a conservative estimate) for a program
unACE sources
no-op instructions
performance-enhancing instructions
predicated-false instructions
dynamically dead code
logically masked values
We translate Little's law as N = B × L, where N is the average number of bits in a processor structure, B is the average bandwidth of bits per cycle into the structure, and L is the average residence time of an individual bit in the structure.
Find AVFs using performance modeling and SPEC benchmarks
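A minimal sketch of how the Little's law note above might turn into an AVF estimate. It assumes a structure's AVF is the average number of ACE bits resident in it (N computed with ACE-only bandwidth and residence time) divided by the structure's total bit count. The function names and the example numbers are hypothetical, chosen only for illustration, and are not taken from any paper or benchmark run.

```python
def littles_law_occupancy(bandwidth_bits_per_cycle: float,
                          residence_cycles: float) -> float:
    """Little's law, N = B * L: average number of bits resident in a structure."""
    return bandwidth_bits_per_cycle * residence_cycles


def estimate_avf(ace_bandwidth_bits_per_cycle: float,
                 ace_residence_cycles: float,
                 total_bits: int) -> float:
    """Estimate AVF as (average resident ACE bits) / (total bits in the structure).

    Assumption: the bandwidth and residence time passed in count ACE bits only.
    """
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    n_ace = littles_law_occupancy(ace_bandwidth_bits_per_cycle,
                                  ace_residence_cycles)
    if n_ace > total_bits:
        # More resident ACE bits than the structure can hold: inputs disagree.
        raise ValueError("ACE occupancy exceeds structure capacity")
    return n_ace / total_bits


if __name__ == "__main__":
    # Hypothetical structure: 64 entries x 32 bits = 2048 bits.
    # Hypothetical workload: 64 ACE bits enter per cycle, each stays 5 cycles.
    print(littles_law_occupancy(64, 5))   # average resident ACE bits
    print(estimate_avf(64, 5, 2048))      # fraction of the structure that is ACE
```

In practice B and L would come from a performance model run over benchmarks, as the last note says; here they are plain inputs so the arithmetic stays visible.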