Compiler Optimization for Heterogeneous Locality and Homogeneous Parallelism in OpenCL and LLVM
Authors: Dorit Nuzman (Intel Corporation)
Abstract: Heterogeneous platforms may include accelerators such as Digital Signal Processors (DSPs) that use software-controlled scratch-pad memories instead of, or in addition to, standard hardware-cached memory. Using scratch-pads efficiently typically requires tiling and pipelining loops, which makes memory locality, rather than parallelism, the primary optimization objective. In contrast, achieving high performance on CPUs and GPUs typically requires making data-level parallelism the primary objective, at some cost to locality. In this lightning talk, we show how OpenCL and LLVM can be used to achieve both target-dependent locality and target-independent parallelism. This approach facilitates the development of optimized software for DSP accelerators while also enabling its efficient execution on standard servers. Building on the work of Tian et al., our approach relies on automatic compiler optimization and uses only OpenCL, including its device-side enqueue capability and the SPIR-V format.
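The abstract contrasts a locality-first schedule (tiling plus pipelining through a scratch-pad) with a parallelism-first one. The plain-C sketch below illustrates that contrast on a trivial element-wise kernel, under stated assumptions. It is not the author's implementation and uses no OpenCL API. The scratch-pad is modeled as a local array, and the DMA engine is modeled by a hypothetical `dma_copy` helper built on `memcpy`, so the copy here is synchronous; on real hardware, the prefetch of tile t+1 would overlap the computation on tile t. `TILE`, `scale_tiled`, and `scale_flat` are likewise hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define TILE 4 /* hypothetical scratch-pad tile size, in elements */

/* Stand-in for a DMA transfer into scratch-pad memory.
 * On a DSP this would be asynchronous; here it is a plain memcpy. */
static void dma_copy(float *dst, const float *src, size_t n) {
    memcpy(dst, src, n * sizeof *dst);
}

/* Locality-first schedule: stream `in` through two scratch-pad buffers
 * (double buffering). The next tile is fetched into one buffer while the
 * current tile is computed from the other. Write-back is omitted for brevity. */
void scale_tiled(const float *in, float *out, size_t n, float a) {
    float spm[2][TILE]; /* the two scratch-pad buffers */
    size_t ntiles = (n + TILE - 1) / TILE;
    if (ntiles == 0)
        return;

    /* Prologue: fetch the first tile. */
    dma_copy(spm[0], in, n < TILE ? n : TILE);

    for (size_t t = 0; t < ntiles; t++) {
        size_t base = t * TILE;
        size_t len = (n - base) < TILE ? (n - base) : TILE;
        const float *cur = spm[t % 2];

        /* Pipeline stage 1: start fetching tile t+1 into the other buffer. */
        if (t + 1 < ntiles) {
            size_t nbase = base + TILE;
            size_t nlen = (n - nbase) < TILE ? (n - nbase) : TILE;
            dma_copy(spm[(t + 1) % 2], in + nbase, nlen);
        }

        /* Pipeline stage 2: compute on tile t from scratch-pad memory. */
        for (size_t i = 0; i < len; i++)
            out[base + i] = a * cur[i];
    }
}

/* Parallelism-first schedule: every iteration is independent, so each i
 * could map to one OpenCL work-item on a CPU or GPU. */
void scale_flat(const float *in, float *out, size_t n, float a) {
    for (size_t i = 0; i < n; i++)
        out[i] = a * in[i];
}
```

Both functions compute the same result. Only the schedule differs: `scale_tiled` orders work around what fits in the scratch-pad, while `scale_flat` exposes maximal independent iterations. This is the kind of target-dependent choice the talk proposes to leave to the compiler.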
Venue: LLVM-HPC2018, The Fifth Workshop on the LLVM Compiler Infrastructure in HPC