SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Women in HPC: Diversifying the HPC Community


Error Analysis in HPC Applications Using Algorithmic Differentiation

Authors: Harshitha Menon (Lawrence Livermore National Laboratory)

Abstract: Computer applications running on supercomputers are used to solve critical problems. These systems are expected to perform tasks not just quickly but also correctly. Factors that can affect program correctness include faults, reduced precision, lossy data reduction, iteration, and truncation. In the presence of these errors, how do we know whether a program is producing correct results? I have developed a method to understand the impact of these errors on a computer program. The method employs algorithmic differentiation (AD) to analyze the sensitivity of the simulation output to errors in program variables. A tool that we developed based on this method evaluates a given program and identifies vulnerable regions that need to be protected from errors. We use it to selectively protect variables against silent data corruption (SDC). We also use the method to study the floating-point sensitivity of the code and to develop mixed-precision configurations that improve performance without affecting accuracy. Using this tool, we can ensure that simulation applications give correct results in the presence of these errors, so that scientists and policy makers relying on those results can make accurate predictions that can have lasting impact.
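The core idea in the abstract, using AD to measure how strongly the output depends on each program variable, can be illustrated with a small sketch. This is not the author's tool: the `Var` class, the `backward` and `sensitivity_report` helpers, and the toy kernel `y = a*a*b + exp(c)` are all hypothetical names chosen for illustration, and the error model (each variable perturbed by the same relative error, with the first-order effect on the output estimated as |dy/dv · v| · eps) is an assumption.

```python
import math

class Var:
    """A scalar that records how it was computed (hypothetical helper for reverse-mode AD)."""
    def __init__(self, value, parents=(), name=None):
        self.value = float(value)
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0         # d(output)/d(self), filled in by backward()
        self.name = name

    def __add__(self, other):
        other = _lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

def _lift(x):
    """Wrap plain numbers so they can take part in the recorded computation."""
    return x if isinstance(x, Var) else Var(x)

def exp(x):
    v = math.exp(x.value)
    return Var(v, ((x, v),))  # d/dx exp(x) = exp(x)

def backward(output):
    """Reverse sweep: propagate d(output)/d(node) from the output to every input."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children
    visit(output)
    for node in order:
        node.grad = 0.0     # reset so repeated calls do not accumulate
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

def sensitivity_report(output, variables, rel_error):
    """Rank variables by estimated output error (assumed model: |dy/dv * v| * rel_error)."""
    backward(output)
    estimates = {v.name: abs(v.grad * v.value) * rel_error for v in variables}
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)

# Toy kernel standing in for a simulation; the values are arbitrary.
a, b, c = Var(3.0, name="a"), Var(4.0, name="b"), Var(1.0, name="c")
y = a * a * b + exp(c)

# 2**-24 is the unit roundoff of IEEE 754 single precision.
ranking = sensitivity_report(y, [a, b, c], rel_error=2.0 ** -24)
```

In this sketch, variables at the top of `ranking` are the ones whose errors affect the output most, so they would be the first candidates for protection against SDC, while those at the bottom would be candidates for reduced precision. A real analysis would have to cover whole programs, vectors, and control flow, which this toy example does not attempt.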
