SLURM User Group Meeting, SC18 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis


Authors: Morris Jette (SchedMD LLC), Danny Auble (SchedMD LLC)

Abstract: Slurm is an open source workload manager used on many TOP500 systems. It provides a rich set of features, including topology-aware optimized resource allocation, the ability to expand and shrink jobs on demand, the ability to power down idle nodes and restart them as needed, hierarchical bank accounts with fair-share job prioritization, and many resource limits. The meeting will consist of three parts: the Slurm development team will present details about changes in the new version 18.08, describe the Slurm roadmap, and solicit user feedback. Everyone interested in Slurm use and/or development is encouraged to attend.

Long Description: Slurm is a free open source workload manager in widespread use today with a steadily growing customer base. As of the November 2017 TOP500 list, Slurm was used on 6 of the top 10 systems. Slurm is vendor-neutral with about 250 individual contributors from a multitude of computer vendors, national laboratories, and universities. SC is our best venue to gather such a diverse global community.

The Slurm BOF has been held at the previous seven SC conferences with sizable attendance each year (approximately 45, 60, 80, 120, 170, 200, and 165 people in the previous seven meetings). The format has been similar each year: developers present information about recent work to users and gather requirements for future work.

The goals of the Slurm BOF are to inform users about recent developments and plans for future work, and to gather requirements for that work. There is a major release of Slurm every nine months, and the advances in each are substantial. SC is a great venue to keep the user community informed about these developments.

The first presentation will highlight changes in Slurm version 18.08, to be released in August 2018, including support for:

* Heterogeneous job steps with respect to memory size, GPU use, etc.
* Improved integration with Google Cloud
* GPU-centric resource allocations
* Job requests for node features, including exclusive-OR and exclusive-AND operands
* An arbitrary number of backup daemons
* Jobs that create or delete persistent burst buffers while using no compute resources (i.e. zero size)
* Display of more detailed burst buffer status information

A second presentation will highlight changes planned for future releases of Slurm, especially version 19.05 to be released in May 2019.

We will also seek user guidance, through discussion at the BOF and through a survey, to help prioritize future development.

Survey questions:

* Name
* Organization
* Email
* Slurm user (yes or no)
* Computer description (node counts and vendors)
* Typical workload (job sizes and run times)
* Current features that are most important to you
* Additional features desired (priority ordered)
* Interested in participating in Slurm consortium?
* Other comments