SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Meeting HPC Container Challenges as a Community

Authors: CJ Newburn (Nvidia Corporation), Andrew Younge (Sandia National Laboratories), Scot Schultz (Mellanox Technologies)

Abstract: Containers increase the likelihood that users can run the latest version of HPC applications in their data center, because they make it easier for administrators to install those applications and for developers to deploy them. But we need to build and leverage an active community to solve the remaining challenges.

The BoF presents, prioritizes, and gathers further input on top issues and budding solutions around containerization of HPC applications. Lightning talks nominated in ongoing community discussions describe the community's top issues, frame challenges, highlight progress and plans. Led discussions expand on the seeded issues, enumerate additional problems, and draw the audience into our ongoing community-wide forum.

Long Description: Containers increase the likelihood that users will be able to run the latest version of HPC applications in their data center, because administrators can install and maintain them with less effort, and because developers can deploy and support them more reliably and broadly. Containerization is a major force for good in HPC, particularly as HPC and the cloud crash together.

But we haven't yet arrived. Several challenges remain. This BoF will present, prioritize and gather further input on top issues, current efforts, and solutions around containerization of HPC applications. After a brief overview of what's happening with HPC containers and why they are such a strong attraction, there will be a set of lightning talks on top issues that have come out of ongoing community-wide discussions. Those quick introductions will frame the challenges, highlight progress and sketch out plans. With the foundation of a few such seminal issues, the remainder of the time will be spent in led discussions that expand on the seeded issues, enumerate additional problems, and gather further input on challenges and solutions.

To jump-start this effort, the BoF serves to share progress in ongoing community-wide collaborations that involve representatives of container technologies like Singularity and Docker, OSVs, network vendors, government labs, academic data centers, and OEMs. We already have a list of challenges that we're working through, and we have monthly working meetings to prioritize issues and organize progress.

One of the central goals of this BoF is to solidify and expand an existing framework for gathering requirements, usage models, challenges, and technical solutions related to HPC container technologies. The framework is led by the community rather than driven by individual projects or efforts. This expanding collaboration will help set an agenda and prioritize collaborative work that improves the use of containers in HPC and in related areas such as visualization, deep learning, and data science. Because forming such a community takes careful curation of key vested members from among users, developers, facilities, and vendors, this BoF is distinct from other proposed BoFs on HPC containers and warrants its own session. It would complement BoFs that focus on cloud, or that cover specific cloud and container offerings, by instead building a community around challenges and technical solutions, independent of brands and offerings.

At ISC, a full-day workshop on this topic drew 50-75 active participants. Past cloud and container-related BoFs at Supercomputing were not focused on communal efforts as this one is. The online discussion of HPC Containers began with about 25 active participants and is growing. We expect BoF participation to be much higher, particularly as a new outreach to 40+ academic data centers rolls out in late summer.

URL: https://drive.google.com/open?id=1hlq0ZmK7r7-OLfjC0tmgunzuJM15FTzF
