Authors:
Abstract: Aggressive technology scaling is expected to make the hardware of HPC systems more susceptible to transient faults. A transient hardware fault may be masked without affecting the program output, cause the program to crash, or lead to silent data corruption (SDC). Fault injection studies can give an overall resiliency profile for an application, but without a good visualization tool it is difficult to summarize and highlight the critical information they produce. In this work, we design SpotSDC, a visualization system for analyzing a program's resilience to SDC. SpotSDC provides an overview of the impact of SDC on an application by highlighting the regions of code that are most susceptible to SDC and have a high impact on the program's output. SpotSDC also lets users visualize how errors propagate through an application's execution.
Best Poster Finalist (BP): no