Tensor-Optimized Hardware Accelerates Fused Discontinuous Galerkin Simulations
Time: Thursday, November 15th, 8:30am - 5pm
Description: In recent years, the compute/memory balance of processors has shifted steadily towards compute. The rise of deep learning, which is built on matrix multiplications, has accelerated this trend, especially for single-precision and lower-precision arithmetic. An important research question is whether this development can be leveraged for traditional HPC. We demonstrate that a high-order discontinuous Galerkin solver for seismic wave propagation can execute in single precision without loss of modeling accuracy. Additionally, we extended its kernels to support the Intel Knights Mill CPU, which offers 14 TFLOPS of single-precision deep-learning performance. This allows us to harvest the hardware’s special compute capabilities, even in an application with sparse linear algebra kernels. At cluster level, Knights Mill can obtain the same application performance as the latest top-bin dual-socket Intel Xeon Platinum nodes while consuming less power. Compared to the HPC-focused Knights Landing processor, scenario-dependent speed-ups of up to 1.6× are possible.
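As a rough illustration of what running in single precision means numerically, the following minimal sketch applies a small dense operator once in double precision and once with every intermediate result rounded to single precision, then reports the largest relative difference. It is not the solver's kernel: the 3x3 matrix, the vector, and the helper names `to_f32`, `matvec`, and `max_relative_error` are all hypothetical and chosen only for this example. It shows the size of the rounding error in one operator application, not the modeling accuracy of a full simulation.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(matrix, vector, rnd=lambda x: x):
    """Dense matrix-vector product; `rnd` is applied to inputs and after every operation."""
    result = []
    for row in matrix:
        acc = 0.0
        for a, v in zip(row, vector):
            product = rnd(rnd(a) * rnd(v))
            acc = rnd(acc + product)
        result.append(acc)
    return result


def max_relative_error(reference, approx):
    """Largest relative deviation of `approx` from `reference` (nonzero entries only)."""
    return max(abs(r - a) / abs(r) for r, a in zip(reference, approx) if r != 0.0)


# Hypothetical operator and state vector, for illustration only.
MATRIX = [[0.5, -0.25, 0.125], [0.1, 0.2, 0.3], [-0.7, 0.4, 0.9]]
VECTOR = [1.0, 2.0, 3.0]

ref = matvec(MATRIX, VECTOR)                     # double precision
single = matvec(MATRIX, VECTOR, rnd=to_f32)      # emulated single precision
err = max_relative_error(ref, single)
print(f"max relative error (single vs. double): {err:.2e}")
```

Rounding through `struct` keeps the sketch free of third-party dependencies; with NumPy available, casting the arrays to `numpy.float32` would express the same comparison more directly.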