Is Data Placement Optimization Still Relevant on Newer GPUs?
Parallel Programming Languages, Libraries, and Models
TimeMonday, November 12th3:30pm - 4pm
DescriptionModern supercomputers often use Graphic Processing Units (or GPUs) to meet the evergrowing demands for energy efficient high performance computing. GPUs have a complex memory architecture with various types of memories and caches, such as global memory, shared memory, constant memory, and texture memory. Data placement optimization, i.e. optimizing the placement of data among these different memories, has a significant impact on the performance of HPC applications running on early generations of GPUs. However, newer generations of GPUs have new memory features. They also implement the same high-level memory hierarchy differently.
In this paper, we design a set of experiments to explore the relevance of data placement optimizations on several generations of NVIDIA GPUs, including Kepler, Maxwell, Pascal, and Volta. Our experiments include a set of memory microbenchmarks, CUDA kernels and a proxy application. The experiments are configured to include different CUDA thread blocks, data input sizes, and data placement choices. The results show that newer generations of GPUs are less sensitive to data placement optimization compared to older ones, mostly due to improvements to caches of the global memory.