Description
Tuning and understanding the performance characteristics of computational fluid dynamics (CFD) codes on many-core, NUMA architectures is challenging. One must determine how programming choices impact algorithm performance and how best to utilize the available memory caches, high-bandwidth memory, and inter- and intra-node communication. Once collected, performance data must be translated into actionable code improvements. In addition, performance engineering experiments must be organized and tracked to quantify the benefit of any attempted tuning.
In this poster, we examine and tune two CFD applications running on the Intel® Xeon Phi™ partition of a Cray® XC40/50 using TAU Commander and ParaTools ThreadSpotter. TAU Commander implements a streamlined, managed performance engineering workflow and uses profiling and aggregate summary statistics to highlight the source regions that limit scalability. ParaTools ThreadSpotter analyzes an application while it runs and ranks individual performance problems. Its report annotates source code lines with actionable recommendations and quantified performance metrics.