SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Detection of Silent Data Corruptions in Smooth Particle Hydrodynamics Simulations

Authors: Aurélien Cavelan (University of Basel), Florina M. Ciorba (University of Basel), Ruben M. Cabezón (University of Basel)

Abstract: Soft errors, such as silent data corruptions (SDCs), hinder the correctness of large-scale scientific applications. Ghost replication (GR) is proposed herein as the first SDC detector relying on the fast error propagation inherent to applications that employ the smooth particle hydrodynamics (SPH) method. GR follows a two-step selective replication scheme. First, an algorithm selects which particles to replicate on a different process. Then, a different algorithm detects SDCs by comparing the data of the selected particles with the data of their ghosts. The overhead and scalability of the proposed approach are assessed through a set of strong-scaling experiments conducted on a large HPC system under error-free conditions, using upwards of 3,000 cores. The results show that GR achieves recall and precision similar to those of full replication methods at only a fraction of the cost, with detection rates of 91–99.9%, no false positives, and an overhead of 1–10%.
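The two-step idea in the abstract (select particles, then compare them with their ghosts) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the selection or detection algorithms, so `select_particles` (every `stride`-th particle), `smooth_step` (a neighbor average standing in for an SPH update), and the absolute tolerance `tol` are all assumptions made for illustration. For simplicity the "ghost" side here keeps a full independent copy of the state in the same process, whereas a real scheme would replicate only the selected particles on a different process.

```python
def smooth_step(values, radius=2):
    """Toy stand-in for an SPH update: each value becomes the mean of its neighbors."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def select_particles(n, stride):
    """Hypothetical selection rule (an assumption): every stride-th particle."""
    return list(range(0, n, stride))


def detect_sdc(primary, ghost, selected, tol=1e-12):
    """Return the selected particle ids whose data disagrees with the ghost copy."""
    return [i for i in selected if abs(primary[i] - ghost[i]) > tol]


def run(n=20, steps=3, stride=5, corrupt_index=None):
    """Advance both copies and report (step, flagged ids) at the first mismatch."""
    primary = [float(i % 7) for i in range(n)]
    ghost = list(primary)  # independent copy; stands in for a different process
    selected = select_particles(n, stride)
    if corrupt_index is not None:
        primary[corrupt_index] += 1000.0  # injected corruption on the primary only
    for step in range(1, steps + 1):
        primary = smooth_step(primary)
        ghost = smooth_step(ghost)
        flagged = detect_sdc(primary, ghost, selected)
        if flagged:
            return step, flagged
    return None, []


if __name__ == "__main__":
    print(run(corrupt_index=None))  # error-free run
    print(run(corrupt_index=7))     # corruption next to a selected particle
    print(run(corrupt_index=18))    # corruption farther from any selected particle
```

In this toy, a corrupted particle that was never selected is still caught, because the neighbor averaging spreads the error into a selected particle within one or two steps. That is the propagation property the abstract says GR relies on; the detection rates and overheads reported above come from the authors' experiments, not from this sketch.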

Best Poster Finalist (BP): no

Poster: pdf
Poster summary: PDF
Reproducibility Description Appendix: PDF
