Presentation
Containers, Collaboration, and Community: Hands-On Building a Data Science Environment for Users and Admins
Event Type
Tutorial
TUT
Clouds and Distributed Computing
Data Analytics
Data Management
System Administration
Tools
Workflows
Time: Monday, November 12th, 8:30am - 12pm
Location: C146
Description: In this tutorial, we combine best practices and lessons learned from evolving traditional HPC data centers at TACC, CU, and LSU into more integrated data science environments. In the first session, participants will learn best practices in data science and software engineering and apply them while containerizing and scaling an MPI application with multiple container technologies across clouds, clusters, and a sandbox environment that we will provide. They will then use continuous integration and delivery to increase the portability, visibility, and availability of their individual application codes. The first session concludes with participants learning how sysadmins can use the same approaches to increase the update and release velocity of their entire application catalogs while better balancing security and regression-support concerns.
In the second session, participants will learn how to leverage automation and cloud services to better handle version and metadata management. From there, they will customize their own data science environment and gain hands-on experience using cloud services and the Agave Platform by extending their environment to publish and share applications, manage data, orchestrate simulations across both HPC and cloud resources, capture provenance information, and foster collaboration.