SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Women in HPC: Diversifying the HPC Community

Study of Performance Variability on Dragonfly Systems

Authors: Xin Wang (Illinois Institute of Technology)

Abstract: Dragonfly networks are being widely adopted in high-performance computing systems. On these networks, however, interference caused by resource sharing can lead to significant network congestion and performance variability. On a shared network, different job placement policies lead to different traffic distributions. The contiguous job placement policy localizes communication by assigning adjacent compute nodes to the same job. The random job placement policy, on the other hand, balances network traffic by placing application processes sparsely across the network so that the message load is distributed uniformly. Localized communication and balanced network traffic have opposite advantages and drawbacks: localizing communication reduces the number of hops per message transfer at the cost of potential network congestion, while balancing network traffic reduces potential local congestion at the cost of more hops per message transfer.
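To make the hops-versus-congestion trade-off concrete, here is a minimal static sketch. It is not the study's simulator, and every parameter and function name in it is a hypothetical assumption: a small balanced Dragonfly (9 groups of 4 routers, 2 nodes and 2 global links per router, all-to-all links inside a group, one global link per group pair assigned to routers consecutively), minimal local-global-local routing, and an all-to-all exchange among a 16-node job. It counts inter-router hops and per-link message traversals only; queuing, routing adaptivity, and timing are ignored.

```python
import random
from collections import Counter
from itertools import permutations

# Hypothetical balanced Dragonfly (assumed parameters, not taken from the study):
# G groups, A routers per group, P nodes per router, H global links per router.
G, A, P, H = 9, 4, 2, 2            # A * H == G - 1: exactly one global link per group pair
NUM_NODES = G * A * P              # 72 compute nodes


def locate(node):
    """Map a node id to (group, router index within its group)."""
    router = node // P
    return router // A, router % A


def gateway(group, other):
    """Router in `group` owning the global link to `other` (assumed consecutive assignment)."""
    k = other if other < group else other - 1   # position of `other` among the remaining groups
    return k // H


def route_links(src, dst):
    """Undirected links used by a minimal (local-global-local) route from src to dst."""
    (gs, rs), (gd, rd) = locate(src), locate(dst)
    if gs == gd:
        return [] if rs == rd else [("local", gs, min(rs, rd), max(rs, rd))]
    links = []
    gw_s, gw_d = gateway(gs, gd), gateway(gd, gs)
    if rs != gw_s:                                   # hop to the source group's gateway
        links.append(("local", gs, min(rs, gw_s), max(rs, gw_s)))
    links.append(("global", min(gs, gd), max(gs, gd)))
    if gw_d != rd:                                   # hop from the destination gateway
        links.append(("local", gd, min(gw_d, rd), max(gw_d, rd)))
    return links


def evaluate(nodes):
    """Return (average hops, max link load) for an all-to-all exchange among `nodes`.

    Link load = number of messages crossing a link, summed over both directions.
    """
    load = Counter()
    total_hops = 0
    pairs = list(permutations(nodes, 2))
    for src, dst in pairs:
        links = route_links(src, dst)
        total_hops += len(links)
        load.update(links)
    return total_hops / len(pairs), max(load.values(), default=0)


def contiguous(job_size):
    """Contiguous placement: adjacent node ids, filling routers and groups in order."""
    return list(range(job_size))


def random_placement(job_size, seed=0):
    """Random placement: job_size distinct nodes drawn uniformly from the whole machine."""
    return random.Random(seed).sample(range(NUM_NODES), job_size)


if __name__ == "__main__":
    for name, nodes in [("contiguous", contiguous(16)), ("random", random_placement(16))]:
        hops, max_load = evaluate(nodes)
        print(f"{name:10s} avg hops = {hops:.2f}  max link load = {max_load}")
```

In this toy setting the contiguous 16-node job spans only groups 0 and 1, so it averages about 1.73 router hops per message, but all 128 inter-group messages share the single global link between those two groups. The random placement spreads the same job over most groups, which raises the average hop count while lowering the peak load on any one link.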

In this study, we first use trace-based simulations to present a comparative analysis of the trade-off between localizing communication and balancing network traffic. By introducing background traffic to model external network interference, we show that localized communication can help reduce the application performance variation caused by network sharing. We then introduce an online simulation framework that improves performance and scalability, and discuss validating the simulation observations on performance variability against a production Dragonfly system.
