SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS)


Influence of A-Posteriori Subcell Limiting on Fault Frequency in Higher-Order DG Schemes

Authors: Anne Reinarz (Technical University Munich)

Abstract: Soft error rates are increasing as modern architectures rely on increasingly small features operating at low voltages. Because HPC architectures use a large number of components, they are particularly vulnerable to soft errors. Hence, applications that run for long periods on large machines must be designed with algorithmic resilience in mind. In this paper we analyse the inherent resilience of a-posteriori limiting procedures in the context of ExaHyPE, an explicit ADER DG solver for hyperbolic PDEs. The a-posteriori limiter checks element-local high-order DG solutions for physical admissibility and can thus be expected to detect hardware-induced errors as well. Algorithmically, it can be interpreted as element-local checkpointing and restarting of the solver with a more robust finite volume scheme on a fine subgrid. We show that the limiter indeed increases the resilience of the DG algorithm, detecting and correcting in particular those faults that would otherwise lead to a fatal failure.
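The check-then-recompute pattern described in the abstract can be illustrated with a minimal Python sketch. It is not ExaHyPE code: the names `flip_bit`, `is_admissible`, `upwind_step`, and `limited_step` are hypothetical, a first-order upwind update stands in for both the high-order DG step and the finite volume fallback, and the admissibility test is reduced to a finiteness check plus a discrete maximum principle on a single periodic element.

```python
import math
import struct


def flip_bit(value, bit):
    """Flip one bit of a float64 to mimic a soft error (assumed fault model)."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def is_admissible(candidate, previous, tol=1e-12):
    """Toy admissibility test: all values finite and within the old min/max."""
    lo, hi = min(previous) - tol, max(previous) + tol
    return all(math.isfinite(v) and lo <= v <= hi for v in candidate)


def upwind_step(u, c=0.5):
    """Periodic first-order upwind update; a stand-in for a real solver step."""
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]


def limited_step(previous, high_order_step, robust_step, corrupt=None):
    """Compute a candidate, check it a posteriori, recompute on failure.

    `previous` plays the role of the element-local checkpoint: it is kept
    untouched until the candidate has passed the admissibility test.
    Returns the accepted solution and a flag telling whether the fallback ran.
    """
    candidate = high_order_step(previous)
    if corrupt is not None:
        candidate = corrupt(candidate)  # inject a fault into the candidate
    if is_admissible(candidate, previous):
        return candidate, False
    return robust_step(previous), True


def corrupt_exponent(values):
    """Flip the top exponent bit (bit 62) of the second entry."""
    damaged = list(values)
    damaged[1] = flip_bit(damaged[1], 62)
    return damaged


if __name__ == "__main__":
    u = [0.0, 0.25, 1.0, 0.5]
    clean, _ = limited_step(u, upwind_step, upwind_step)
    repaired, fell_back = limited_step(u, upwind_step, upwind_step, corrupt_exponent)
    print("fallback triggered:", fell_back)
    print("repaired equals clean:", repaired == clean)
```

In this toy setting, a flip in a high exponent bit pushes the value far outside the admissible range, so the candidate is rejected and recomputed from the checkpoint. A flip in the lowest mantissa bit changes the value only marginally and passes the check unnoticed, which is in line with the abstract's statement that the limiter catches in particular the faults that would otherwise be fatal.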
