SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Which Architecture Is Better Suited for Matrix-Free Finite-Element Algorithms: Intel Skylake or Nvidia Volta?


Authors: Martin Kronbichler (Technical University Munich), Momme Allalen (Leibniz Supercomputing Centre), Martin Ohlerich (Leibniz Supercomputing Centre), Wolfgang A. Wall (Technical University Munich)

Abstract: This work presents a performance comparison of highly tuned matrix-free finite element kernels from the finite element library on different contemporary computer architectures: NVIDIA V100 and P100 GPUs, an Intel Knights Landing Xeon Phi, and two multi-core Intel CPUs (Broadwell and Skylake). The algorithms are based on fast integration on hexahedra using sum factorization techniques. For small problem sizes, when all data fits into CPU caches, Skylake is very competitive with Volta. For larger sizes, however, the GPU holds an advantage of approximately a factor of three over Skylake, because all architectures then operate in the memory-bandwidth-limited regime, where the GPU's higher memory bandwidth pays off. A detailed performance analysis contrasts the throughput-oriented character of GPUs with the more latency-optimized CPUs for the scenario of high-order finite element computations.

Best Poster Finalist (BP): no


