Challenges of Performance Portability for Fortran Unstructured Mesh Codes
Time: Sunday, November 11th, 2:56pm - 2:58pm
Description: What pathways exist for Fortran performance portability to exascale? Fortran-based codes present different challenges for performance portability and productivity. Ideally, we want to develop one codebase that can run on many different HPC architectures. In reality, each architecture has its own idiosyncrasies, requiring architecture-specific code. Therefore, we strive to write code that is as portable as possible to minimize development and maintenance effort. This project investigates how different approaches to parallel optimization impact performance portability for unstructured mesh Fortran codes. In addition, it explores the productivity challenges caused by limitations in software tool and compiler support that are unique to Fortran. For this study, we use the Truchas software, a casting manufacturing simulation code, and develop initial ports of its computational kernels for OpenMP CPU, OpenMP offload GPU, and CUDA. Because there is no CUDA Fortran compiler compatible with Truchas, kernels must be rewritten in CUDA C and called from Fortran through a C function interface. Meanwhile, only the IBM xlf compiler is supported for OpenMP offload to GPU at this moment, and it is still immature. In addition to the difficulties that Fortran brings, the unstructured mesh uses more complex data access patterns. From an analysis of the Truchas gradient calculation kernel, we show some success for performance and portability, along with some issues unique to Fortran codes using unstructured meshes. Through this study, we hope to encourage users and vendors to focus on productive pathways to developing Fortran applications for exascale architectures.