Authors: Gilad Shainer (Mellanox Technologies), Daniel Gruner (University of Toronto)
Abstract: The latest revolution in HPC is In-Network Computing. It grows out of the co-design approach, a collaborative effort to reach exascale performance by taking a holistic, system-level approach to fundamental performance improvements. The CPU-centric approach has reached the limits of its scalability in several respects, and In-Network Computing, acting as a “distributed co-processor”, can handle and accelerate various data algorithms, such as reductions. The session will cover the latest developments in the InfiniBand roadmap and In-Network Computing technologies, and will include discussions and presentations from the first HDR 200G InfiniBand sites.
Long Description: In the past, smart interconnect development focused on offloading network functions from the CPU to the network. With the new co-design efforts, the new generation of smart interconnects will also offload data algorithms, which will be managed within the network. This allows users to run these algorithms while the data is being transferred within the system interconnect, rather than waiting for the data to reach the CPU. This technology is referred to as In-Network Computing, and it is the leading approach to achieving performance and scalability for exascale systems. In-Network Computing transforms the data center interconnect into a “distributed CPU” and “distributed memory”, overcoming performance walls and enabling faster and more scalable data analysis.
The current EDR 100G and HDR 200G InfiniBand In-Network Computing technology includes several elements: the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), a technology that was developed by Oak Ridge National Laboratory and Mellanox and received the R&D100 award; smart Tag Matching and rendezvous protocol offloads; and more. These technologies are in use at some of the recent large-scale supercomputers around the world, including platforms at the top of the TOP500 list.
The session will discuss the InfiniBand In-Network Computing technology and test results from DoE systems, from Canada’s fastest InfiniBand Dragonfly-based supercomputer at the University of Toronto, and more.
At SC’18 we will see the first HDR 200G InfiniBand HPC and AI supercomputers, which demonstrate the need for faster data speeds and mark the very first systems operating at 200 gigabits per second. The session will include the very first reports from these sites. At the time of submission, we cannot mention these sites or include specific names in the session proposal.
As the need for faster data speeds accelerates, the InfiniBand Trade Association has been working to set the goals for future speeds, and this topic will also be covered in the session.