High Performance Middlewares for Next Generation Architectures: Challenges and Solutions
Time: Wednesday, November 14th, 8:30am - 5pm
Description: The emergence of modern multi-/many-core architectures and high-performance interconnects has fueled the growth of large-scale supercomputing clusters. Due to this unprecedented growth in scale and compute density, high performance computing (HPC) middlewares now face many new challenges in extracting the best performance from such systems. In this work, we study four such challenges: a) launching and bootstrapping jobs on very large scale clusters, b) contention in collective communication, c) point-to-point communication protocols, and d) scalable fault tolerance and recovery, and we propose efficient solutions for them. The proposed solutions have been implemented in MVAPICH2, a popular MPI and PGAS runtime used by scientists and HPC clusters around the world.