Tensor-Optimized Hardware Accelerates Fused Discontinuous Galerkin Simulations, SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Tensor-Optimized Hardware Accelerates Fused Discontinuous Galerkin Simulations

Authors: Alexander Breuer (University of California, San Diego), Alexander Heinecke (Intel Corporation), Yifeng Cui (San Diego Supercomputer Center)

Abstract: In recent years, the compute/memory balance of processors has shifted steadily towards compute. The rise of deep learning, which is built on matrix multiplications, has accelerated this trend, especially for single-precision and lower-precision arithmetic. An important research question is whether this development can be leveraged for traditional HPC. We demonstrate that a high-order discontinuous Galerkin solver for seismic wave propagation can run in single precision without loss of modeling accuracy. Additionally, we extended its kernels to support the Intel Knights Mill CPU, which offers 14 TFLOPS of single-precision deep-learning performance. This allows us to harvest the hardware’s special compute capabilities, even in an application with sparse linear algebra kernels. At cluster level, Knights Mill can match the application performance of the latest top-bin dual-socket Intel Xeon Platinum nodes while consuming less power. Compared to the HPC-focused Knights Landing processor, scenario-dependent speed-ups of up to 1.6× are possible.
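
The abstract does not spell out what "fused" means or how the kernels are built. As a rough illustration only, assuming "fused" refers to advancing several simulations together so that each sparse operator is applied to a dense block of right-hand sides, the following sketch shows why such a kernel has a dense, vectorizable inner loop. The function name `csr_times_fused`, the data layout, and the matrix values are all hypothetical and not taken from the poster.

```python
# Hypothetical sketch: names, layout, and data are illustrative assumptions,
# not the authors' implementation.

def csr_times_fused(values, col_idx, row_ptr, dofs):
    """Multiply a CSR sparse matrix by a dense block of fused simulations.

    dofs[j][s] holds entry j of simulation s, so the innermost loop runs
    over simulations and touches contiguous data for every sparse non-zero.
    """
    n_rows = len(row_ptr) - 1
    n_sims = len(dofs[0])
    out = [[0.0] * n_sims for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]            # one sparse non-zero
            src = dofs[col_idx[k]]   # its row in the fused input block
            dst = out[i]
            for s in range(n_sims):  # dense inner loop over fused simulations
                dst[s] += a * src[s]
    return out


# 3x3 sparse operator [[2, 0, 0], [0, 0, 3], [1, 0, 4]] in CSR form.
values = [2.0, 3.0, 1.0, 4.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]

# Two fused simulations: column 0 and column 1 of the block.
fused_dofs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
result = csr_times_fused(values, col_idx, row_ptr, fused_dofs)
```

Under this assumption, every sparse non-zero is reused across all fused simulations, which is one way a sparse kernel could keep wide vector or tensor units busy.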

Best Poster Finalist (BP): no

Poster: PDF
Poster summary: PDF
Reproducibility Description Appendix: PDF
