Data Placement Optimization in GPU Memory Hierarchy Using Predictive Modeling
Parallel Programming Languages, Libraries, and Models
Time: Sunday, November 11th, 12:14pm - 12:30pm
Description: Modern supercomputers often use Graphics Processing Units (GPUs) to meet the ever-growing demands of high performance computing (HPC). GPUs typically have a complex memory architecture with various types of memories and caches, such as global memory, shared memory, constant memory, and texture memory. The placement of data in these memories has a tremendous impact on the performance of HPC applications, and identifying the optimal placement is non-trivial.
In this paper, we propose a machine learning-based approach to determine the class of GPU memory that minimizes GPU kernel execution time. The machine learning process uses a set of performance counters obtained from profiling runs and combines them with relevant hardware features to generate trained models. We evaluate our approach on a set of benchmarks across several generations of NVIDIA GPUs, including Kepler, Maxwell, Pascal, and Volta. The results show that the trained models achieve prediction accuracies above 90%.
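The abstract does not say which learning algorithm is used. To illustrate only the general shape of the idea (profiling-derived features in, best memory class out), here is a minimal Python sketch built around a k-nearest-neighbour classifier. The algorithm choice, the three feature names (`data_reuse`, `read_only_fraction`, `uniform_access_fraction`), and the training rows are hypothetical stand-ins. They are synthetic toy values, not measurements or features from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Candidate memory classes named in the abstract.
MEMORY_CLASSES = ("global", "shared", "constant", "texture")


@dataclass
class Sample:
    # Hypothetical features: [data_reuse, read_only_fraction, uniform_access_fraction].
    # A real model would use profiled performance counters plus hardware features.
    features: list
    best_memory: str  # label: memory class that gave the lowest kernel time

    def __post_init__(self):
        if self.best_memory not in MEMORY_CLASSES:
            raise ValueError(f"unknown memory class: {self.best_memory}")


def column_stats(rows):
    """Per-column mean and population standard deviation, used for z-score scaling."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # avoid divide-by-zero
        for c, m in zip(cols, means)
    ]
    return means, stds


class KNNPlacementModel:
    """Hypothetical k-nearest-neighbour stand-in for the paper's trained models."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, samples):
        self.means, self.stds = column_stats([s.features for s in samples])
        self.train = [(self._scale(s.features), s.best_memory) for s in samples]
        return self

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def predict(self, features):
        q = self._scale(features)
        nearest = sorted(self.train, key=lambda t: math.dist(q, t[0]))[: self.k]
        # Majority vote among the k closest training kernels.
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(model, samples):
    hits = sum(model.predict(s.features) == s.best_memory for s in samples)
    return hits / len(samples)


# Synthetic toy rows for illustration only (not data from the paper).
TRAIN = [
    Sample([1.0, 0.0, 0.0], "global"),
    Sample([1.2, 0.0, 0.1], "global"),
    Sample([0.9, 0.0, 0.0], "global"),
    Sample([8.0, 0.0, 0.0], "shared"),
    Sample([9.0, 0.0, 0.1], "shared"),
    Sample([7.5, 0.0, 0.0], "shared"),
    Sample([2.0, 1.0, 1.0], "constant"),
    Sample([2.2, 1.0, 0.9], "constant"),
    Sample([1.8, 1.0, 1.0], "constant"),
    Sample([2.0, 1.0, 0.0], "texture"),
    Sample([2.5, 1.0, 0.1], "texture"),
    Sample([1.9, 1.0, 0.0], "texture"),
]

# Held-out toy kernels used to compute prediction accuracy.
TEST = [
    Sample([1.1, 0.0, 0.0], "global"),
    Sample([8.5, 0.0, 0.05], "shared"),
    Sample([2.1, 1.0, 0.95], "constant"),
    Sample([2.2, 1.0, 0.05], "texture"),
]

model = KNNPlacementModel(k=3).fit(TRAIN)
print(model.predict([8.5, 0.0, 0.05]))
print(f"held-out accuracy: {accuracy(model, TEST):.2f}")
```

In a real setting, each training label would come from timing the kernel with its data placed in each candidate memory class and keeping the fastest. The accuracy printed here reflects only the toy data above.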