Paper: Evaluating and Accelerating High-Fidelity Error Injection for HPC
Event Type
Paper
Registration Categories
TP
Tags
Performance
Resiliency
Tools
Time
Wednesday, November 14th, 2:30pm - 3pm
Location
C146
Description
We address two important concerns in analyzing how applications behave in the presence of hardware errors: (1) when it is important to model with high fidelity how hardware faults lead to erroneous values (instruction-level errors), as opposed to using simple bit-flipping models, and (2) how to enable fast high-fidelity error injection campaigns, in particular when error detectors are employed. We present and verify a new nested Monte Carlo methodology for evaluating high-fidelity gate-level fault models and error-detector coverage, which is orders of magnitude faster than current approaches. We use that methodology to demonstrate that, without detectors, simple error models suffice for evaluating errors in 9 HPC benchmarks.
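For readers unfamiliar with the term, a "simple bit-flipping model" can be illustrated as inverting a single bit in the result of an instruction. The sketch below is only an illustration of that general idea on a 64-bit floating-point value; it is not the paper's injector or its nested Monte Carlo methodology, and the names `flip_bit` and `inject_random_flip` are hypothetical.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit; bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 64)")
    # Reinterpret the float as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Invert the selected bit and reinterpret the result as a float.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def inject_random_flip(value: float, rng: random.Random) -> float:
    """Flip one uniformly chosen bit (an assumed, simplified error model)."""
    return flip_bit(value, rng.randrange(64))
```

Flipping the sign bit of 1.0 with `flip_bit(1.0, 63)` gives -1.0, and flipping the lowest exponent bit with `flip_bit(1.0, 52)` gives 0.5. A high-fidelity gate-level model, by contrast, would derive the erroneous value from a fault in the hardware logic rather than from a uniformly chosen output bit.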