Cooperative Rendezvous Protocols for Improved Performance and Overlap
State of the Practice
Time: Wednesday, November 14th, 10:30am - 11am
Description: With the emergence of larger multi-/many-core clusters, the performance of large-message communication is becoming more important. MPI libraries use different Rendezvous protocols to perform large-message communication. However, existing Rendezvous protocols neither consider the overall communication pattern nor make optimal use of the Sender and Receiver CPUs. In this work, we propose a cooperative Rendezvous protocol that can provide up to 2x improvement in intra-node bandwidth and latency for large messages. We also propose a scheme that dynamically chooses the best Rendezvous protocol for each message based on the communication pattern. Finally, we show how these improvements can increase the overlap of computation with intra-node and inter-node communication and lead to application-level benefits. We evaluate the proposed designs on three architectures (Intel Xeon, Knights Landing, and OpenPOWER) with different HPC applications and obtain benefits of up to 19% with Graph500, 16% with CoMD, and 10% with MiniGhost.
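To make the "cooperative" idea concrete, here is a minimal Python sketch under simplifying assumptions. Instead of letting only the sender (push) or only the receiver (pull) copy a large intra-node message, the copy is split so that both CPUs contribute. The labels `RPUT`/`RGET`, the functions `choose_protocol` and `cooperative_copy`, and the idle-based selection rule are all hypothetical illustrations, not the authors' design or any MPI library's API. Threads stand in for the sender and receiver processes; because of CPython's GIL the sketch shows only how the work is divided, not the bandwidth gain a native shared-memory implementation would aim for.

```python
import threading
from enum import Enum


class Protocol(Enum):
    """Hypothetical labels for rendezvous variants (illustration only)."""
    RPUT = "sender pushes the whole message"
    RGET = "receiver pulls the whole message"
    COOPERATIVE = "sender and receiver each copy part of the message"


def choose_protocol(sender_idle: bool, receiver_idle: bool) -> Protocol:
    """Hypothetical heuristic, NOT the paper's policy: let the idle side
    do the copy, split it when both are idle, and default to RGET when
    neither side is idle."""
    if sender_idle and receiver_idle:
        return Protocol.COOPERATIVE
    if sender_idle:
        return Protocol.RPUT
    return Protocol.RGET


def cooperative_copy(src: bytes, dst: bytearray) -> None:
    """Split one large copy: a 'sender' thread copies the first half while
    the calling ('receiver') thread copies the second half."""
    if len(dst) != len(src):
        raise ValueError("destination must match source length")
    half = len(src) // 2
    src_view, dst_view = memoryview(src), memoryview(dst)

    def sender_part() -> None:
        dst_view[:half] = src_view[:half]  # sender's share

    sender = threading.Thread(target=sender_part)
    sender.start()
    dst_view[half:] = src_view[half:]  # receiver's share, done concurrently
    sender.join()  # message is complete only after both halves land


if __name__ == "__main__":
    msg = bytes(range(256)) * 4096  # 1 MiB payload
    buf = bytearray(len(msg))
    cooperative_copy(msg, buf)
    print(buf == msg, choose_protocol(True, True).name)  # prints: True COOPERATIVE
```

The split point here is a fixed 50/50 for simplicity; a real design would presumably weigh how much spare CPU time each side has before dividing the work.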