Which Architecture Is Better Suited for Matrix-Free Finite-Element Algorithms: Intel Skylake or Nvidia Volta?
Time: Thursday, November 15th, 8:30am - 5pm
Description: This work presents a performance comparison of highly tuned matrix-free finite element kernels from the finite element library on different contemporary computer architectures: NVIDIA V100 and P100 GPUs, an Intel Knights Landing Xeon Phi, and two multi-core Intel CPUs (Broadwell and Skylake). The algorithms are based on fast integration on hexahedra using sum factorization techniques. For small problem sizes, when all data fits into CPU caches, Skylake is very competitive with Volta. For larger sizes, however, the GPU holds an advantage of approximately a factor of three over Skylake, because all architectures operate in the memory-bandwidth-limited regime. A detailed performance analysis contrasts the throughput-oriented character of GPUs with the more latency-optimized CPUs for the scenario of high-order finite element computations.
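As an illustration of the sum factorization idea mentioned in the description, here is a minimal sketch in plain Python. It is not the library's tuned kernel. It assumes a single hexahedral cell, a tensor-product basis, and one 1D matrix S whose entry S[q][i] is the value of the i-th 1D basis function at the q-th 1D quadrature point. The function names (apply_1d, evaluate_sum_factorized, evaluate_naive) are hypothetical and chosen only for this example. The sketch interpolates nodal coefficients to quadrature points one direction at a time and compares the result with the direct triple sum.

```python
def apply_1d(S, u, axis):
    """Contract the 1D matrix S with one axis of the 3D nested-list tensor u."""
    dims = [len(u), len(u[0]), len(u[0][0])]
    out_dims = list(dims)
    out_dims[axis] = len(S)  # this axis now runs over quadrature points
    out = [[[0.0] * out_dims[2] for _ in range(out_dims[1])]
           for _ in range(out_dims[0])]
    for a in range(out_dims[0]):
        for b in range(out_dims[1]):
            for c in range(out_dims[2]):
                idx = [a, b, c]
                q = idx[axis]  # quadrature index along the contracted axis
                acc = 0.0
                for i in range(dims[axis]):
                    idx[axis] = i
                    acc += S[q][i] * u[idx[0]][idx[1]][idx[2]]
                out[a][b][c] = acc
    return out


def evaluate_sum_factorized(S, coeffs):
    """Interpolate coeffs[i][j][k] to quadrature points with three 1D sweeps."""
    v = apply_1d(S, coeffs, 2)
    v = apply_1d(S, v, 1)
    return apply_1d(S, v, 0)


def evaluate_naive(S, coeffs):
    """Reference version: the full triple sum at every quadrature point."""
    n, m = len(coeffs), len(S)
    out = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for q0 in range(m):
        for q1 in range(m):
            for q2 in range(m):
                acc = 0.0
                for i in range(n):
                    for j in range(n):
                        for k in range(n):
                            acc += (S[q0][i] * S[q1][j] * S[q2][k]
                                    * coeffs[i][j][k])
                out[q0][q1][q2] = acc
    return out
```

With n basis functions and n quadrature points per direction, the naive version performs on the order of n^6 multiply-adds per cell, while the three sweeps need on the order of n^4. This reduction in arithmetic is one reason why such kernels tend to be limited by memory bandwidth rather than by floating-point throughput at larger problem sizes.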