A Study of OpenMP Device Offloading in LLVM: Correctness and Consistency
Time: Monday, November 12th, 5:07pm - 5:14pm
Description: To leverage widely available accelerators, OpenMP has introduced device constructs. Device constructs simplify the development of heterogeneous parallel programs and can improve performance. Many compilers, including Clang, already support device constructs, but little documentation exists on how they are implemented. Without these implementation details, it is cumbersome to understand the root cause of concurrency bugs and performance issues encountered on accelerators. In this paper, we conduct a study of Clang to analyze its implementation of device constructs for GPUs. We manually analyze the Parallel Thread Execution (PTX) code generated for each OpenMP construct to determine the relationship between the construct and the PTX instructions. Based on this analysis, we evaluate the correctness of these constructs and discuss potential concurrency bugs caused by incorrect usage of device constructs, such as data races, stale data, and atomicity violations. We also discuss three inconsistencies observed in Clang that may mislead programmers writing OpenMP programs. Our work can help programmers gain a better understanding of device offloading and avoid hidden pitfalls when using Clang and OpenMP.