Presentation
SpotSDC: an Information Visualization System to Analyze Silent Data Corruption
Session: Research Posters
Event Type: Poster
TP
EX
Time: Thursday, November 15th, 8:30am - 5pm
Location: C2/3/4 Ballroom
Description: Aggressive technology scaling trends are expected to make the hardware of HPC systems more susceptible to transient faults. A transient fault in hardware may be masked without affecting the program output, cause the program to crash, or lead to silent data corruption (SDC). While fault injection studies can give an overall resiliency profile for an application, it is difficult to summarize and highlight the critical information they produce without a good visualization tool. In this work, we design SpotSDC, a visualization system for analyzing a program's resilience characteristics with respect to SDC. SpotSDC provides an overview of the impact of SDC on an application by highlighting the regions of code that are most susceptible to SDC and that have a high impact on the program's output. SpotSDC also enables users to visualize how errors propagate through an application's execution.