Description

The emergence of modern multi-/many-core processors has put more emphasis on optimizing intra-node communication. Existing designs in MPI libraries are built on the concept of distributed address spaces and therefore incur the overhead of intermediate memory copies to stage data between processes. This can lead to severe performance degradation, especially on emerging many-core architectures such as Intel Skylake and IBM OpenPOWER. This work proposes high-performance "shared address-space"-based MPI point-to-point and collective communication designs using XPMEM. We first characterize the bottlenecks associated with XPMEM-based communication and propose new designs for efficient MPI large-message communication. We then propose novel collective designs that are contention-free and offer true zero-copy reduction operations. The proposed designs are evaluated on different multi-/many-core architectures using various micro-benchmarks and application kernels such as MiniAMR and AlexNet DNN training on CNTK. The proposed designs show significant performance improvement over the state-of-the-art designs available in MPI libraries.
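
To make the contrast between staged copies and direct access concrete, the following is a minimal single-process sketch, not the authors' implementation and not XPMEM code. It assumes a sum reduction over equally sized buffers; `memoryview` objects stand in for peer buffers mapped into the local address space, and the function names (`staged_reduce`, `direct_partitioned_reduce`, `chunk_bounds`) are hypothetical. The per-rank partitioning of the output is one plausible way to avoid write contention; the abstract itself does not specify the scheme.

```python
from array import array


def staged_reduce(buffers):
    """Copy-based style: each contribution is copied to a staging buffer first."""
    n = len(buffers[0])
    result = array("d", [0.0] * n)
    copied = 0  # number of elements moved through intermediate copies
    for buf in buffers:
        staging = array("d", buf)  # intermediate copy (the overhead in question)
        copied += len(staging)
        for i in range(n):
            result[i] += staging[i]
    return result, copied


def chunk_bounds(n, parts, idx):
    """Half-open [start, end) range of chunk `idx` when n items are split into `parts`."""
    base, extra = divmod(n, parts)
    start = idx * base + min(idx, extra)
    return start, start + base + (1 if idx < extra else 0)


def direct_partitioned_reduce(buffers):
    """Shared-address-space style: read peers' data in place, no staging copies.

    Each simulated rank reduces only its own chunk of the output, so no two
    ranks ever write to the same region (assumed partitioning, for illustration).
    """
    n = len(buffers[0])
    p = len(buffers)
    views = [memoryview(b) for b in buffers]  # stand-in for attached peer buffers
    result = array("d", [0.0] * n)
    for rank in range(p):  # each iteration plays the role of one process
        lo, hi = chunk_bounds(n, p, rank)
        for i in range(lo, hi):
            acc = 0.0
            for v in views:
                acc += v[i]  # direct load from the peer's buffer
            result[i] = acc
    return result, 0  # no elements passed through a staging buffer


if __name__ == "__main__":
    data = [array("d", [float(r + i) for i in range(10)]) for r in range(4)]
    staged, staged_copies = staged_reduce(data)
    direct, direct_copies = direct_partitioned_reduce(data)
    print(list(staged) == list(direct), staged_copies, direct_copies)
```

Both functions compute the same result; the difference is that the staged variant moves every input element through an extra buffer, while the partitioned variant reads inputs where they live and gives each rank a disjoint slice of the output to write.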