Multi-Level Memory and Storage for HPC and Data Analytics

Authors: Hans-Christian Hoppe (Intel Corporation), Michèle Weiland (University of Edinburgh), Kathryn Mohror (Lawrence Livermore National Laboratory)

Abstract: Recent progress in storage-class memory (SCM) technologies, combined with emerging workloads that mix HPC simulations with big data processing or machine learning, creates a perfect opportunity for innovative HPC systems embracing SCM. These systems will substantially improve delivered performance, scalability and energy efficiency for data-oriented HPC codes as well as “mixed” applications, and can play a large role in scaling the “I/O wall” when moving to exascale-class systems.

This BoF brings together technology providers, application and system SW developers, and system operators to discuss the evolving SCM landscape in the context of specific use cases, emerging technologies, and actual success stories.

Long Description: High performance computing has always placed very demanding requirements on memory and storage subsystem performance, and the trend towards ever larger systems, together with the combination of simulations with machine learning and other data analytics techniques, increases the data volumes involved. At the same time, new memory and storage technologies are becoming available which bridge the traditional performance gap between random-access memory and (mainly) block-access storage, and which provide persistence of data. One common term used to refer to these is storage-class memory (SCM). In addition, advances in device integration indicate that concepts for compute in memory or compute near memory may be implemented in working silicon in the not too distant future.

The trends mentioned above now create a “perfect storm” for innovative HPC and high performance data analytics (HPDA) systems, which can combine multiple memory and storage levels to improve delivered performance, scalability and energy efficiency, address the much-publicized “I/O wall” and meet the needs of the growing HPDA market. Compute in/near memory can also be a critical element for substantially reducing the energy used by calculation operations. In addition to the device and system architecture layers, system SW will play a large role here, ensuring data and compute locality and managing data placement and migration.

Specific examples of the memory technologies alluded to above include Intel/Micron 3D XPoint(TM) and 3D stacked flash devices with up to 96 layers (Samsung, Toshiba-WD …), which are available as products, as well as alternative technologies such as ReRAM, PCM and MRAM, for which plans to bring them to market have been announced. A key innovation of the non-flash “true SCM” technologies is the ability to randomly address small units of information, down to cache lines and bytes; this enables new usage modes (like making a large amount of persistent storage look like DRAM to a CPU) and avoids costly serialization and block access management in the I/O stacks.
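
As a rough illustration of the byte-addressable usage mode described above, the following minimal Python sketch updates a single 8-byte value in place through a memory mapping, without serializing a whole record or rewriting a block. It rests on several assumptions that are not part of the session material: an ordinary temporary file stands in for an SCM region (on a real system this would typically be a file on a DAX-enabled file system), and the names RECORD, update_in_place, read_in_place and demo are hypothetical helpers chosen for this example.

```python
import mmap
import os
import struct
import tempfile

# Assumption: each slot holds one 8-byte little-endian double.
RECORD = struct.Struct("<d")


def update_in_place(path, index, value):
    """Store one double at slot `index` through a memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:
            # Only the 8 bytes of this slot are written; no block is
            # read, modified and rewritten by application code.
            RECORD.pack_into(region, index * RECORD.size, value)
            # Ask the OS to write the mapping back to the backing file.
            region.flush()


def read_in_place(path, index):
    """Load one double from slot `index` through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            return RECORD.unpack_from(region, index * RECORD.size)[0]


def demo():
    """Create a zero-filled stand-in region, update one slot, read back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "region.bin")
        with open(path, "wb") as f:
            f.write(bytes(RECORD.size * 1024))  # 1024 empty slots
        update_in_place(path, 513, 3.25)
        return read_in_place(path, 513), read_in_place(path, 0)


if __name__ == "__main__":
    print(demo())
```

On a conventional file system the mapping is still backed by the page cache and block storage, so this sketch shows only the programming model (load/store access to persistent data), not the performance characteristics of actual SCM hardware.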

This session takes a holistic approach and brings together technology providers (such as Intel, Micron and NetApp), application and system SW developers (ECMWF, LLNL, BSC, ARM/Allinea), and system operators (EPCC, NERSC, TITECH). In the presentation section of the BoF, they will discuss the evolving memory/storage landscape in the context of specific use cases, available and emerging product technologies, and actual success stories. The points of view of technology and memory/storage system vendors, SW developers, end users and operators of large HPC/HPDA infrastructures will all be reflected.

In the discussion phase of the BoF, the presenters will first answer audience questions and then drive a conversation with the BoF participants on technology, SW interfaces, architecture of middleware layers, and use cases.
