Authors: Gilad Shainer (Mellanox Technologies), Jeff Kuehn (Los Alamos National Laboratory), Pavel Shamis (ARM Ltd), Dhabaleswar Panda (Ohio State University), Brad Benton (Advanced Micro Devices Inc)
Abstract: In order to exploit the capabilities of HPC systems, communication software needs to scale to millions of cores and support applications with adequate functionality to express their parallelism. UCX is a collaboration between industry, national laboratories, and academia that consolidates multiple technologies into a unified open-source framework. The UCX project is managed by the UCF consortium (http://www.ucfconsortium.org) and includes members from LANL, ORNL, ANL, Ohio State University, AMD, ARM, IBM, Mellanox, NVIDIA and more. The session will serve as the UCX community meeting and will introduce the latest developments and the UCX specification to HPC developers and the broader user community.
Long Description: Modern HPC systems include extreme numbers of compute elements connected by extremely low-latency interconnection networks. In order to exploit the capabilities of these architectures and to meet their scalability demands, communication software needs to scale and to support applications with adequate functionality to express their parallelism. Moreover, communication software should add as little overhead as possible so as not to compromise the native performance of the interconnection network. These requirements make the design of high-performance communication software extremely intricate: it must combine a minimal memory footprint, low instruction counts, and light cache activity while meeting stringent performance targets.
High-level programming models for communication (e.g., MPI, UPC, SHMEM) can be built on top of middleware, such as Portals, GASNet, UCCS, and ARMCI, or on lower-level network-specific interfaces, often provided by the vendor. While the former offer high-level communication abstractions and portability across different systems, the latter offer proximity to the hardware and minimize the overhead of multiple software layers. UCX, a communication framework for high-performance computing systems, is an effort to combine the advantages of both.
UCX is a collaboration between industry, national labs, and academia that consolidates technologies from different organizations into a unified open-source framework. The UCX consortium includes members from Los Alamos National Laboratory, Oak Ridge National Laboratory, Argonne National Laboratory, Ohio State University, AMD, ARM, IBM, Mellanox Technologies, NVIDIA and more. UCX is cross-platform: it supports network technologies such as InfiniBand and Cray Gemini/Aries, as well as shared memory, on x86-64, POWER, ARM, and GPU architectures. The UCX API exposes both high- and low-level interfaces, providing accessibility for exploratory programming-model implementations as well as access to hardware-specific optimizations for performance tuning. To satisfy this design goal, UCX unifies three separate APIs: UCP for high-level usability, UCT for low-level optimizations, and UCS as a utility glue layer. UCX has been integrated upstream into Open MPI and OpenSHMEM, is used with MPICH and other projects, and is already deployed on some of the world's leading large-scale HPC systems.
The session will serve as the UCX community meeting at SC'18, and will introduce the latest developments and the UCX specification to the broader HPC developer and user community. It will open a dialog on future plans for UCX and review the latest performance results, measured by different organizations across a range of compute architectures. The session will also review the operations of the UCF consortium.