Compiler Optimization for Heterogeneous Locality and Homogeneous Parallelism in OpenCL and LLVM
TimeMonday, November 12th5pm - 5:07pm
DescriptionHeterogeneous platforms may include accelerators such as Digital Signal Processors (DSP’s) that employ SW-controlled scratch-pad memories instead of, or in addition to standard HW-cached memory. Controlling scratch-pads efficiently typically requires tiling and pipelining loops, thereby optimizing for memory locality rather than parallelism as a primary objective. On the other hand, achieving high performance on CPU’s and GPU’s typically requires optimizing for data-level parallelism as a primary objective, compromising locality. In this lightning talk, we show how OpenCL and LLVM can be used to achieve both target-dependent locality and target-independent parallelism. Such an approach facilitates the development of optimized software for DSP accelerators while enabling its efficient execution on standard servers. Following the work of Tian et al., our approach leverages automatic compiler optimization and relies purely on OpenCL, including its device-side enqueue capability and SPIR-V format.