Collaboration Toward a Software Stack for System Power Optimization: The HPC PowerStack (SC18 Proceedings)

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Authors: Martin Schulz (Technical University Munich), Tapasya Patki (Lawrence Livermore National Laboratory), Jonathan Eastep (Intel Corporation), Masaaki Kondo (University of Tokyo), Siddhartha Jana (Energy Efficient HPC Working Group)

Abstract: This interactive BoF brings together vendors, labs, and academic researchers to discuss an emerging community effort to develop a software stack for system-wide power optimization. The HPC PowerStack effort is the first to identify what power optimization software actors are needed; how they interoperate to achieve stable, synchronized optimization; and how to glue together existing open source projects to engineer a cost-effective but cohesive, cross-platform power stack implementation.

This BoF disseminates key insights acquired in the project, provides prototyping status updates, invites audience feedback on current directions, brainstorms solutions to open questions, and issues a call to action to join the collaboration.

Long Description: Motivation and relevance:

While several standalone efforts attempt to tackle exascale power challenges, most of the implemented techniques have been designed to meet site-specific needs or optimization goals. Specifications such as PowerAPI and Redfish provide high-level power management interfaces for accessing power knobs. However, these stop short of defining which software components should actually be involved and how they should interoperate with each other in a cohesive and coordinated stack. We believe coordination is critical for avoiding underutilization of system Watts and FLOPS.

This realization led to the formation of the PowerStack Community in 2016. The charter of this community has been to (A) identify the key software actors needed in a system power stack (job schedulers, application-level runtimes, and hardware knobs); (B) define their respective privileges, roles, and responsibilities; (C) define communication protocols for bidirectional control and feedback signals among them to enable scalable coordination at different granularities; (D) layer their access to privileged hardware controls and monitors to ensure stable closed-loop optimization; and (E) study and combine existing ad hoc engineering and development prototypes and build a community that actively participates in development and engineering efforts.

Pre-BoF Activities: In June 2018, a group of 40 senior researchers, developers, and leaders from vendors, labs, and academia around the globe convened in Burghausen (Germany) for a face-to-face seminar. The community, with representatives of all software stack layers, arrived at a consensus on three points: (1) job and application awareness is going to be critical for boosting system-wide optimization, which implies the need to drive interoperation between a job-level runtime and the job scheduler; (2) hierarchical control systems provide a good model for scalable global optimization across the system, so the power stack should be a hierarchical system with bidirectional control and feedback signals flowing between the actors; (3) rather than providing layered access to privileged hardware knobs, today’s systems allow high-level actors like the job scheduler to access them directly, which breaks the hierarchical control-system model. Instead, access should be routed or arbitrated through a job-level power manager in order to ensure a stable control system.
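
The hierarchical model in points (2) and (3) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names (`Node`, `JobPowerManager`, `Scheduler`), the demand-proportional budget split, and the simulated power readings are hypothetical and are not part of any PowerStack design, PowerAPI, or Redfish interface. The sketch only shows the layering idea: control signals (budgets) flow downward, feedback signals (measured power) flow upward, and only the job-level power manager sets node caps.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Leaf of the hierarchy; stands in for a privileged hardware power knob."""
    name: str
    demand_w: float       # power the node would draw if uncapped (simulated)
    cap_w: float = 0.0    # cap most recently set by the job power manager

    def measured_power(self) -> float:
        # Feedback signal: a capped node never draws more than its cap.
        return min(self.demand_w, self.cap_w)


@dataclass
class JobPowerManager:
    """Job-level actor; in this sketch, the only actor that sets node caps."""
    nodes: list
    budget_w: float = 0.0

    def demand(self) -> float:
        return sum(n.demand_w for n in self.nodes)

    def set_budget(self, budget_w: float) -> None:
        # Control signal from above: split the job budget across nodes
        # in proportion to demand (an assumed, deliberately simple policy).
        self.budget_w = budget_w
        total = self.demand()
        for n in self.nodes:
            n.cap_w = budget_w * n.demand_w / total if total else 0.0

    def measured_power(self) -> float:
        # Feedback signal passed up to the scheduler.
        return sum(n.measured_power() for n in self.nodes)


@dataclass
class Scheduler:
    """System-level actor; assigns job budgets and never touches node caps."""
    system_budget_w: float
    jobs: list = field(default_factory=list)

    def rebalance(self) -> None:
        total = sum(j.demand() for j in self.jobs)
        for j in self.jobs:
            share = self.system_budget_w * j.demand() / total if total else 0.0
            # Do not hand a job more than it can use.
            j.set_budget(min(share, j.demand()))

    def system_power(self) -> float:
        return sum(j.measured_power() for j in self.jobs)


job_a = JobPowerManager([Node("a0", 300.0), Node("a1", 100.0)])
job_b = JobPowerManager([Node("b0", 200.0), Node("b1", 200.0)])
sched = Scheduler(system_budget_w=600.0, jobs=[job_a, job_b])
sched.rebalance()
print(sched.system_power(), [n.cap_w for n in job_a.nodes])
```

In this toy setup the scheduler reasons only about job-level budgets, so the arbitration policy inside each job can change without the scheduler needing direct access to hardware knobs.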

BoF Goals: While the seminar above made good progress in aligning the community around a common power stack, there are still open questions in the stack’s design. Some of them will be best answered through prototyping and experience gained from development of current state-of-the-art products. Also, since designing an entire stack from the ground up is a gargantuan effort, it is extremely important that the entire global HPC community be made aware of, and be willing to contribute to, this effort. Hence this BoF.

The goals will be to: (1) make attendees aware of the emerging community effort to design a common power stack and discuss the lessons learned during the past seminar; (2) provide updates on the current and future prototyping efforts that have begun; and (3) align efforts across the community so that the SC18 BoF attendees reach a consensus on sharing R&D resources, avoid duplicating effort, agree on common interfaces, and reap the rewards together as a community.

URL: https://powerstack.lrr.in.tum.de
