Portable and Reusable Deep Learning Infrastructure with Containers to Accelerate Cancer Studies
Parallel Programming Languages, Libraries, and Models
TimeMonday, November 12th3:30pm - 3:45pm
DescriptionAdvanced programming models, domain specific languages, and scripting toolkits have the potential to greatly accelerate the adoption of high performance computing. These complex software systems, however, are often difficult to install and maintain, especially on exotic high-end systems. We consider deep learning workflows used on petascale systems and redeployment on research clusters using containers. Containers are used to deploy the MPI-based infrastructure, but challenges in efficiency, usability, and complexity must be overcome. In this work, we address these challenges through enhancements to a unified workflow system that manages interaction with the container abstraction, the cluster scheduler, and the programming tools. We also report results from running the application on our system, harnessing 298~TFLOPS (single precision).