Authors:
Abstract: This work presents a performance comparison of highly tuned matrix-free finite element kernels from the finite element library on different contemporary computer architectures: NVIDIA V100 and P100 GPUs, an Intel Knights Landing Xeon Phi, and two multi-core Intel CPUs (Broadwell and Skylake). The algorithms are based on fast integration on hexahedra using sum factorization techniques. For small problem sizes, when all data fits into CPU caches, Skylake is very competitive with the V100 (Volta). For larger sizes, however, the GPU holds an advantage of approximately a factor of three over Skylake, because all architectures operate in the memory-bandwidth-limited regime. A detailed performance analysis contrasts the throughput-oriented character of GPUs with the more latency-optimized CPUs for the scenario of high-order finite element computations.
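Illustrative note: the sum factorization idea mentioned in the abstract can be sketched in a few lines. The following is a minimal pure-Python sketch in 2D, not the kernels benchmarked on the poster. The function names, the matrix `S` of 1D shape function values at quadrature points, and the coefficient array `u` are all assumptions made for illustration. The point is that the tensor-product structure lets one pass over both dimensions at once be replaced by two cheaper 1D passes.

```python
def interpolate_naive(S, u):
    """Evaluate v[q1][q2] = sum_{i,j} S[q1][i] * S[q2][j] * u[i][j] directly.

    S is a hypothetical (nq x n) matrix of 1D shape function values at
    quadrature points; u holds the (n x n) tensor-product coefficients.
    The work per quadrature point grows like n^2.
    """
    nq, n = len(S), len(S[0])
    return [[sum(S[q1][i] * S[q2][j] * u[i][j]
                 for i in range(n) for j in range(n))
             for q2 in range(nq)]
            for q1 in range(nq)]


def interpolate_sum_factorized(S, u):
    """Same result, computed as two successive 1D contractions."""
    nq, n = len(S), len(S[0])
    # Pass 1: contract over the first index i, keeping j untouched.
    t = [[sum(S[q1][i] * u[i][j] for i in range(n)) for j in range(n)]
         for q1 in range(nq)]
    # Pass 2: contract over the second index j.
    return [[sum(S[q2][j] * t[q1][j] for j in range(n))
             for q2 in range(nq)]
            for q1 in range(nq)]


if __name__ == "__main__":
    # Linear 1D shape functions sampled at x = 0, 0.5, 1 (illustrative data).
    S = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    u = [[1.0, 2.0], [3.0, 4.0]]
    print(interpolate_sum_factorized(S, u))
```

In d dimensions with n basis functions and a comparable number of quadrature points per direction, the direct evaluation costs on the order of n^(2d) operations per cell, while the factorized form costs on the order of d·n^(d+1). This reduction in arithmetic is consistent with the abstract's observation that all architectures end up in the memory-bandwidth-limited regime for large problems.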
Best Poster Finalist (BP): no