Software Prefetching for Unstructured Mesh Applications

SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

IA^3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms

Authors: Ioan Hadade (Oxford Thermofluids Institute, University of Oxford)

Abstract: Applications that exhibit regular memory access patterns usually benefit transparently from hardware prefetchers that bring data into the fast on-chip cache just before it is required, thereby avoiding expensive cache misses. In contrast, unstructured mesh applications contain irregular access patterns that are often more difficult to identify in hardware. An alternative for such workloads is software prefetching, where special non-blocking instructions load data into the cache hierarchy. However, the literature currently offers few examples of how to incorporate such software prefetches into existing applications with positive results.
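To make the idea of a software prefetch concrete, the following is a minimal sketch in C and is not taken from the paper. It assumes a GCC or Clang compiler, whose `__builtin_prefetch` builtin emits a non-blocking prefetch where the target supports one. The function `gather_sum` and its array names are hypothetical; they mimic an edge-based mesh loop that reads node data through an index array.

```c
#include <stddef.h>

/* Hypothetical kernel (illustration only): sums node values gathered
   through an index array, mimicking the indirect accesses of an
   edge-based unstructured mesh loop. */
double gather_sum(const double *node_val, const int *edge_to_node,
                  size_t n_edges, size_t dist)
{
    double sum = 0.0;
    for (size_t e = 0; e < n_edges; ++e) {
        /* Request the node value needed `dist` iterations ahead.
           Arguments: address, 0 = read access, 3 = high temporal locality.
           The bounds check avoids reading the index array past its end. */
        if (e + dist < n_edges)
            __builtin_prefetch(&node_val[edge_to_node[e + dist]], 0, 3);

        sum += node_val[edge_to_node[e]];
    }
    return sum;
}
```

The prefetch only hints at future use and does not change the result, so the function returns the same sum for any value of `dist`. Only the run time is expected to vary.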

This paper addresses this gap by demonstrating the utility and implementation of software prefetching in an unstructured finite volume CFD code whose size and complexity are representative of an industrial application, across a number of processors. We present the benefits of auto-tuning for finding the optimal prefetch distance across different computational kernels and architectures, and demonstrate the importance of choosing the right prefetch destination among the available cache levels for best performance. We discuss the impact of the data layout on the number of prefetch instructions required in kernels with indirect access patterns and show how to integrate them on top of existing optimizations such as vectorization. Through this we show significant full-application speed-ups on a range of processors: 15% on the Intel Xeon Skylake CPU, 1.99X on the in-order Intel Xeon Phi Knights Corner architecture, and 33% on the out-of-order Knights Landing many-core processor.
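One simple way to auto-tune the prefetch distance is to time a kernel for several candidate distances and keep the fastest. The C sketch below illustrates that idea under stated assumptions and is not the paper's tuning procedure. It again relies on the GCC/Clang `__builtin_prefetch` builtin, and the names `kernel` and `tune_distance` are hypothetical. The third argument of the builtin is a compile-time locality hint from 0 to 3; how a compiler maps it to a particular cache level depends on the target, so the value used here is only an example.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical indirect-access kernel with a tunable prefetch distance.
   Locality 3 asks for the line to be kept in all cache levels possible;
   lower constants (2, 1, 0) indicate decreasing temporal locality. */
static double kernel(const double *val, const int *idx, size_t n, size_t dist)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + dist < n)
            __builtin_prefetch(&val[idx[i + dist]], 0, 3);
        sum += val[idx[i]];
    }
    return sum;
}

/* Time the kernel for each candidate distance and return the fastest.
   Assumes n_cands >= 1. Uses CPU time from clock(), which is coarse;
   a real tuner would likely repeat runs and use a finer timer. */
size_t tune_distance(const double *val, const int *idx, size_t n,
                     const size_t *cands, size_t n_cands, int reps)
{
    size_t best = cands[0];
    double best_t = -1.0;
    volatile double sink = 0.0; /* keeps the compiler from dropping the work */

    for (size_t c = 0; c < n_cands; ++c) {
        clock_t t0 = clock();
        for (int r = 0; r < reps; ++r)
            sink += kernel(val, idx, n, cands[c]);
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;

        if (best_t < 0.0 || t < best_t) {
            best_t = t;
            best = cands[c];
        }
    }
    (void)sink;
    return best;
}
```

Because the best distance depends on the kernel and the processor, a tuner of this kind would be run once per kernel and per architecture, with arrays large enough that the data does not already fit in cache.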
