Paper: Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed Up Mixed-Precision Iterative Refinement Solvers
Event Type
Paper
Registration Categories
TP
Tags
Algorithms
Applications
Architectures
Compiler Analysis and Optimization
Floating Point
Performance
Precision
Programming Systems
Tools
Time
Wednesday, November 14th, 4pm - 4:30pm
Location
C141/143/149
Description
The use of low-precision arithmetic has been a powerful tool for accelerating numerous scientific computing applications, including artificial intelligence. We present an investigation showing that other HPC applications can harness this power as well, in particular the general HPC problem of solving Ax = b, where A is a large dense matrix and the solution is required in FP64 accuracy. Our approach is based on the mixed-precision (FP16 -> FP64) iterative refinement technique: we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. We show how the use of FP16-TC (tensor core) arithmetic can provide up to a 4X speedup and improve energy consumption by a factor of 5, achieving 74 Gflops/Watt. These gains are due both to the performance boost that the FP16 tensor cores provide and to their accuracy, which outperforms classical FP16.
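To illustrate the technique the abstract describes, below is a minimal sketch of mixed-precision iterative refinement: factorize A once in low precision, then repeatedly solve for a correction against a residual computed in the target FP64 precision. This is not the authors' code; the paper factorizes in FP16 on tensor cores, but since NumPy/SciPy's LAPACK bindings have no FP16 path, FP32 stands in for the low precision here, and all names (mixed_precision_solve, tol, max_iters) are illustrative.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, max_iters=50):
    """Solve Ax = b to FP64 accuracy using a low-precision LU factorization."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Factorize once in low precision (FP32 here; FP16-TC in the paper).
    lu, piv = lu_factor(A64.astype(np.float32))

    # Initial solve in low precision, promoted to the working precision.
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iters):
        # The residual is computed in the high (target) precision.
        r = b64 - A64 @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # The correction solve reuses the cheap low-precision factorization.
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d
    return x

# Usage: a well-conditioned (diagonally dominant) random system.
rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x))  # residual at FP64 accuracy

The economics the abstract points to follow from this structure: the O(n^3) factorization runs entirely in the fast low precision, while each refinement step costs only an O(n^2) matrix-vector product and triangular solves, so the speedup of the low-precision hardware carries over to an FP64-accurate solution.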