Presentation
Framework for Scalable Intra-Node Collective Operations Using Shared Memory
Authors
Event Type: Paper
TP
Architectures
MPI
Networks
Performance
Programming Systems
State of the Practice
Time: Wednesday, November 14th, 11am - 11:30am
Location: C140/142
Description: Collective operations are used in MPI programs to express common communication patterns, collective computations, or synchronizations. In many collectives, such as barrier or allreduce, the intra-node component of the collective is on the critical path, because the inter-node communication cannot start until the intra-node component has completed. Thus, as the number of cores per node increases, optimizations that leverage intra-node shared memory become increasingly important.
In this paper, we focus on the performance benefit of optimizing intra-node collectives using shared memory. We optimize several collectives, using the primitives in broadcast and reduce as building blocks for the other collectives. A comparison of our implementation on top of MPICH shows significant speedups over the original MPICH implementation, MVAPICH, and OpenMPI, among others.
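To illustrate how reduce and broadcast can serve as building blocks for another collective, the following is a minimal sketch of an allreduce composed from the two over a shared buffer. It is not the paper's implementation: Python threads stand in for the MPI ranks on one node, ordinary lists stand in for the shared-memory segment, and the names `shared_memory_allreduce`, `slots`, and `result` are hypothetical.

```python
import threading


def shared_memory_allreduce(values):
    """Sum one contribution per simulated rank; return what each rank sees.

    Assumption: threads model the ranks on a single node and Python lists
    model the node's shared-memory segment.
    """
    n = len(values)
    slots = [None] * n          # one shared slot per rank for its contribution
    result = [None]             # shared cell the root publishes the sum into
    seen = [None] * n           # value each rank reads back at the end
    sync = threading.Barrier(n)

    def rank(r):
        slots[r] = values[r]    # each rank writes its own contribution
        sync.wait()             # all contributions are now visible
        if r == 0:
            result[0] = sum(slots)   # reduce: the root combines every slot
        sync.wait()             # the root's write is now visible to all
        seen[r] = result[0]     # broadcast: every rank reads the shared result

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling `shared_memory_allreduce([1, 2, 3, 4])` gives every simulated rank the same sum. In a real MPI setting, the inter-node step would run only after this intra-node phase finishes, which is why the abstract places it on the critical path.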